This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License, also known as CC BY-NC 4.0.
Access is limited to non-commercial use. Users must provide appropriate attribution when sharing or adapting the model where required by the license. Commercial use, including use in paid products, paid services, or commercial deployment, is not permitted under this license.
Model Card for whisper-large-v3-taiwanese-hakka
This model is a version of openai/whisper-large-v3 fine-tuned on Taiwanese Hakka. During training, the ID of each dialect was supplied as a prompt, in order to test whether adding prompts when fine-tuning Whisper on multiple dialects gives better results.
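Whisper accepts an optional text prompt that sits in the decoder input before the transcription itself, between the `<|startofprev|>` and `<|startoftranscript|>` special tokens. The sketch below shows where a dialect ID would land in that decoder prefix when the language is Chinese and timestamps are off. `decoder_prefix` and `LANGUAGE_TOKENS` are hypothetical names. The card does not describe how the prompt was laid out in the training labels, so read this as an illustration of Whisper's prompt format, not as the authors' training code.

```python
# Hypothetical illustration: the decoder prefix Whisper sees when a dialect ID
# is passed as a prompt. Real tokenization splits the prompt text into several
# BPE tokens; it is kept as a single string here for readability.

LANGUAGE_TOKENS = {"chinese": "<|zh|>", "english": "<|en|>"}


def decoder_prefix(dialect_id: str, language: str = "chinese",
                   timestamps: bool = False) -> list:
    """Return the prompted decoder prefix as a list of token strings."""
    prefix = [
        "<|startofprev|>",          # start of the prompt / previous-context slot
        " " + dialect_id.strip(),   # the dialect ID used as prompt text
        "<|startoftranscript|>",    # the transcription proper begins here
        LANGUAGE_TOKENS[language.lower()],
        "<|transcribe|>",
    ]
    if not timestamps:
        prefix.append("<|notimestamps|>")
    return prefix


print(decoder_prefix("htia_sixian"))
# ['<|startofprev|>', ' htia_sixian', '<|startoftranscript|>', '<|zh|>', '<|transcribe|>', '<|notimestamps|>']
```

Everything after `<|startoftranscript|>` is the usual Whisper task prefix. The prompt only adds context in front of it.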
Dialects and IDs
- 四縣 (Sixian): htia_sixian
- 海陸 (Hailu): htia_hailu
- 大埔 (Dapu): htia_dapu
- 饒平 (Raoping): htia_raoping
- 詔安 (Zhao'an): htia_zhaoan
- 南四縣 (Southern Sixian): htia_nansixian
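When scripts pick the prompt from user input, a small lookup table avoids typos in the IDs. The sketch below is a hypothetical helper, not part of the model repository. It accepts the Chinese dialect name, a romanized name derived from the ID (for example `Zhao'an` or `Nansixian`), or the ID itself.

```python
# Hypothetical helper mapping dialect names to the prompt IDs listed above.
DIALECT_IDS = {
    "四縣": "htia_sixian",
    "海陸": "htia_hailu",
    "大埔": "htia_dapu",
    "饒平": "htia_raoping",
    "詔安": "htia_zhaoan",
    "南四縣": "htia_nansixian",
}

# Romanized aliases derived from the IDs, e.g. "sixian" -> "htia_sixian".
ROMANIZED = {pid[len("htia_"):]: pid for pid in DIALECT_IDS.values()}


def dialect_id_for(name: str) -> str:
    """Return the prompt ID for a Chinese name, romanized name, or the ID itself."""
    key = name.strip()
    if key in DIALECT_IDS:
        return DIALECT_IDS[key]
    if key in ROMANIZED.values():
        return key
    # Normalize romanizations such as "Zhao'an" or "Nan Sixian".
    normalized = key.lower().replace("'", "").replace(" ", "")
    if normalized in ROMANIZED:
        return ROMANIZED[normalized]
    raise KeyError(f"Unknown Hakka dialect: {name!r}")


print(dialect_id_for("海陸"))     # htia_hailu
print(dialect_id_for("Zhao'an"))  # htia_zhaoan
```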
Training process
The model was trained with the following hyperparameters:
- Batch size: 32
- Epochs: 3
- Warmup steps: 50
- Total steps: 42,549
- Learning rate: 7e-5
- Data augmentation: none
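Because the card gives both the epoch count and the total step count, the per-epoch figures follow by division: 42,549 / 3 = 14,183 optimizer steps per epoch, or roughly 14,183 × 32 = 453,856 training examples per epoch, assuming 32 is the effective batch per optimizer step (the card does not mention gradient accumulation or multiple GPUs). The card also does not name the learning-rate schedule. The sketch assumes linear warmup to 7e-5 over 50 steps followed by linear decay to zero, which is the shape of `get_linear_schedule_with_warmup` in Hugging Face `transformers`.

```python
# Worked numbers for the hyperparameters above. Assumptions (not stated in the
# card): 32 is the effective batch per optimizer step, and the learning rate
# follows linear warmup then linear decay to zero.
BATCH_SIZE = 32
EPOCHS = 3
WARMUP_STEPS = 50
TOTAL_STEPS = 42_549
PEAK_LR = 7e-5

steps_per_epoch = TOTAL_STEPS // EPOCHS                    # 14183
approx_examples_per_epoch = steps_per_epoch * BATCH_SIZE   # 453856, approximate


def learning_rate(step: int) -> float:
    """Learning rate at an optimizer step under the assumed linear schedule."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(steps_per_epoch, approx_examples_per_epoch)
for step in (0, 25, 50, 21_000, TOTAL_STEPS):
    print(step, learning_rate(step))
```

The example count is approximate because the last batch of an epoch may be partial.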
How to use
Access and Authentication
This model is hosted as a gated Hugging Face repository. Before using it:
- Visit the model page and request access.
- Log in with the same Hugging Face account that has been granted access.
- Authenticate your local environment with a Hugging Face access token.
A read token is sufficient for inference.
```bash
pip install -U huggingface_hub
hf auth login
```
Alternatively, you can provide the token through the HF_TOKEN environment variable:
```bash
export HF_TOKEN=hf_xxx
```
Do not hard-code your Hugging Face token in scripts, notebooks, or public repositories.
If you see an error such as `Cannot access gated repo`, make sure that:
- your Hugging Face account has been granted access to this model;
- `hf auth whoami` shows the expected account;
- `HF_HUB_DISABLE_IMPLICIT_TOKEN` is not set.
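The local parts of this checklist can be scripted with the standard library alone. The sketch below is a hypothetical preflight helper under stated assumptions: `HF_TOKEN` takes precedence when set; otherwise the token file is `$HF_TOKEN_PATH`, or `$HF_HOME/token` with `HF_HOME` defaulting to `~/.cache/huggingface`, which we assume matches recent `huggingface_hub` defaults. It cannot tell whether your account was granted access; `hf auth whoami` and the model page remain the authority for that.

```python
# Local preflight for gated-repo access (standard library only).
# Assumptions: HF_TOKEN takes precedence when set; otherwise the token file is
# $HF_TOKEN_PATH, or $HF_HOME/token with HF_HOME defaulting to
# ~/.cache/huggingface. This cannot check whether access was granted.
import os
from pathlib import Path
from typing import List, Mapping, Optional


def find_token_source(env: Mapping[str, str]) -> Optional[str]:
    """Return where a token would be read from, or None if no token is found."""
    if env.get("HF_TOKEN"):
        return "HF_TOKEN environment variable"
    hf_home = Path(env.get("HF_HOME") or Path.home() / ".cache" / "huggingface")
    token_path = Path(env.get("HF_TOKEN_PATH") or hf_home / "token")
    return str(token_path) if token_path.is_file() else None


def auth_problems(env: Mapping[str, str]) -> List[str]:
    """List local causes of a 'Cannot access gated repo' error."""
    problems = []
    if find_token_source(env) is None:
        problems.append("No token found: run `hf auth login` or set HF_TOKEN.")
    if env.get("HF_HUB_DISABLE_IMPLICIT_TOKEN"):
        problems.append("HF_HUB_DISABLE_IMPLICIT_TOKEN is set; unset it.")
    return problems


if __name__ == "__main__":
    for message in auth_problems(os.environ) or ["No local problems found."]:
        print(message)
```

Passing the environment as a mapping, rather than reading `os.environ` inside the functions, keeps the checks easy to test without touching real credentials.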
Run model
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU in half precision when available, otherwise CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "formospeech/whisper-large-v3-taiwanese-hakka"
dialect_id = "htia_sixian"  # one of the dialect IDs listed above

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# Pass the dialect ID as a Whisper prompt, as was done during training.
prompt_ids = torch.from_numpy(processor.get_prompt_ids(dialect_id)).to(device)
generate_kwargs = {"language": "Chinese", "prompt_ids": prompt_ids}

transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
# The pipeline returns a dict; remove the echoed dialect ID from its text.
print(transcription["text"].replace(f" {dialect_id}", ""))
```
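The last line of the example removes only ` {dialect_id}` from the output. If you transcribe several dialects in one run, or the model emits a different ID than the one you prompted with, a slightly more general clean-up may help. `strip_dialect_tags` below is a hypothetical helper, not part of the repository; it removes any of the six IDs listed in this card together with the whitespace immediately before it, mirroring the original `replace` call.

```python
import re

# The six prompt IDs listed in this model card.
DIALECT_IDS = (
    "htia_sixian", "htia_hailu", "htia_dapu",
    "htia_raoping", "htia_zhaoan", "htia_nansixian",
)

# Matches any dialect ID together with the whitespace immediately before it.
_TAG_PATTERN = re.compile(r"\s*(?:" + "|".join(map(re.escape, DIALECT_IDS)) + ")")


def strip_dialect_tags(text: str) -> str:
    """Remove echoed dialect prompt IDs from a transcription string."""
    return _TAG_PATTERN.sub("", text).strip()


print(strip_dialect_tags(" htia_sixian 你好"))  # 你好
```

With the pipeline above, call it as `strip_dialect_tags(transcription["text"])`.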
Base model: openai/whisper-large-v3