You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License, also known as CC BY-NC 4.0.

Access is limited to non-commercial use. Users must provide appropriate attribution when sharing or adapting the model where required by the license. Commercial use, including use in paid products, paid services, or commercial deployment, is not permitted under this license.

Model Card for whisper-large-v3-taiwanese-hakka

This model is a fine-tuned version of openai/whisper-large-v3 for Taiwanese Hakka. During training, the ID of each dialect was given to the model as a prompt. The aim was to test whether adding prompts when fine-tuning Whisper on multiple dialects improves recognition results.
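The sketch below makes the prompting idea concrete. It shows the order of the special tokens Whisper's decoder sees when a text prompt is supplied. It assumes the layout produced by the Hugging Face `transformers` Whisper tokenizer: `<|startofprev|>`, then the prompt text with a leading space, then the usual task tokens. The card does not document the exact training-time format, so treat this as an illustration, not the authors' code. `build_decoder_prefix` is a hypothetical name, and the token strings stand in for integer IDs.

```python
# Illustrative sketch only (hypothetical helper, not the authors' code):
# order of decoder-prefix tokens when a text prompt is supplied to Whisper.
# Token strings are shown instead of integer IDs; in practice the prompt
# text is split into several BPE tokens.
START_OF_PREV = "<|startofprev|>"
START_OF_TRANSCRIPT = "<|startoftranscript|>"


def build_decoder_prefix(dialect_id: str, language_token: str = "<|zh|>",
                         timestamps: bool = False) -> list[str]:
    """Return the prompt tokens followed by the task tokens."""
    # Assumption: the prompt text gets a leading space, as in transformers'
    # WhisperTokenizer.get_prompt_ids.
    prompt = [START_OF_PREV, " " + dialect_id.strip()]
    task = [START_OF_TRANSCRIPT, language_token, "<|transcribe|>"]
    if not timestamps:
        task.append("<|notimestamps|>")
    return prompt + task


print(build_decoder_prefix("htia_sixian"))
```

The leading space on the prompt text may explain why the usage example further down removes `f" {dialect_id}"` from the decoded output.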

Dialects and IDs

Sixian (四縣): htia_sixian
Hailu (海陸): htia_hailu
Dapu (大埔): htia_dapu
Raoping (饒平): htia_raoping
Zhaoan (詔安): htia_zhaoan
Southern Sixian (南四縣): htia_nansixian
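To pick the right prompt, you could resolve a dialect name, in English or Chinese characters, to the ID in the list above. The following is a minimal sketch. `DIALECT_IDS`, `HANZI_TO_KEY`, and `dialect_prompt_id` are hypothetical names for illustration and are not part of the model or any library.

```python
# Hypothetical helper (not part of the model's API): resolve a dialect name
# to the prompt ID listed in this model card.
DIALECT_IDS = {
    "sixian": "htia_sixian",
    "hailu": "htia_hailu",
    "dapu": "htia_dapu",
    "raoping": "htia_raoping",
    "zhaoan": "htia_zhaoan",
    "nansixian": "htia_nansixian",
}

# Chinese dialect names mapped to the English keys above.
HANZI_TO_KEY = {
    "四縣": "sixian",
    "海陸": "hailu",
    "大埔": "dapu",
    "饒平": "raoping",
    "詔安": "zhaoan",
    "南四縣": "nansixian",
}


def dialect_prompt_id(name: str) -> str:
    """Return the prompt ID for a dialect name, or pass a valid ID through."""
    raw = name.strip()
    key = HANZI_TO_KEY.get(raw, raw.lower())
    if key in DIALECT_IDS:
        return DIALECT_IDS[key]
    if key in DIALECT_IDS.values():
        return key  # already a prompt ID such as "htia_dapu"
    raise ValueError(f"Unknown dialect: {name!r}")


print(dialect_prompt_id("Hailu"), dialect_prompt_id("南四縣"))
```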

Training process

The model was trained with the following hyperparameters:

Batch size: 32
Epochs: 3
Warmup Steps: 50
Total Steps: 42549
Learning rate: 7e-5
Data augmentation: No
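As a rough consistency check, the numbers above can be related to each other. The sketch below rests on two assumptions not stated in the card. First, a batch size of 32 is the effective batch per optimizer step, with no gradient accumulation or multi-GPU scaling. Second, the learning rate follows linear warmup then linear decay, the Hugging Face `Trainer` default. Under these assumptions, 42549 steps over 3 epochs means 14183 steps per epoch, or roughly 454k training examples per epoch.

```python
# Back-of-the-envelope check of the listed hyperparameters.
# Assumptions (not stated in the card): batch size 32 is the effective batch
# per optimizer step, and the schedule is linear warmup + linear decay.
import math

BATCH_SIZE = 32
EPOCHS = 3
WARMUP_STEPS = 50
TOTAL_STEPS = 42549
PEAK_LR = 7e-5

steps_per_epoch = TOTAL_STEPS // EPOCHS
# Training-set sizes consistent with this step count
# (the last batch of an epoch may be partial).
min_examples = (steps_per_epoch - 1) * BATCH_SIZE + 1
max_examples = steps_per_epoch * BATCH_SIZE


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - WARMUP_STEPS))


# Sanity check: the peak rate is reached exactly when warmup ends.
assert math.isclose(lr_at(WARMUP_STEPS), PEAK_LR)

print(f"steps/epoch: {steps_per_epoch}, examples/epoch: {min_examples}-{max_examples}")
```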

How to use

Access and Authentication

This model is hosted as a gated Hugging Face repository. Before using it:

Visit the model page and request access.
Log in with the same Hugging Face account that has been granted access.
Authenticate your local environment with a Hugging Face access token.

A read token is sufficient for inference.

pip install -U huggingface_hub
hf auth login

Alternatively, you can provide the token through the HF_TOKEN environment variable:

export HF_TOKEN=hf_xxx

Do not hard-code your Hugging Face token in scripts, notebooks, or public repositories.

If you see an error such as "Cannot access gated repo", make sure that:

your Hugging Face account has been granted access to this model;
hf auth whoami shows the expected account;
HF_HUB_DISABLE_IMPLICIT_TOKEN is not set.
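Only the environment-variable part of this checklist can be verified locally without contacting the Hub. The following is a minimal sketch of such a check. `diagnose_hf_env` is a hypothetical helper. The test that a token "looks valid" assumes that Hugging Face user access tokens start with `hf_`, as in the example above. Account access still has to be confirmed with `hf auth whoami` and on the model page.

```python
# Hypothetical local check for the environment-variable items in the
# troubleshooting list. Assumption: valid user access tokens start with "hf_".
import os
from collections.abc import Mapping


def diagnose_hf_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable warnings about Hugging Face env variables."""
    problems = []
    if env.get("HF_HUB_DISABLE_IMPLICIT_TOKEN"):
        problems.append("HF_HUB_DISABLE_IMPLICIT_TOKEN is set; unset it so your stored token is sent.")
    token = env.get("HF_TOKEN")
    if token is not None and not token.startswith("hf_"):
        problems.append("HF_TOKEN is set but does not look like a Hugging Face access token.")
    return problems


if __name__ == "__main__":
    for message in diagnose_hf_env() or ["No environment problems found; also run `hf auth whoami`."]:
        print(message)
```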

Run model

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU in half precision when available; otherwise fall back to CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "formospeech/whisper-large-v3-taiwanese-hakka"
dialect_id = "htia_sixian"  # one of the dialect IDs listed above

# Load the fine-tuned model and its processor (requires access to the gated repo).
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Long audio is split into 30-second chunks that are decoded in batches of 16.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# Pass the dialect ID as the decoder prompt, as was done during training.
prompt_ids = torch.from_numpy(processor.get_prompt_ids(dialect_id)).to(device)
generate_kwargs = {"language": "Chinese", "prompt_ids": prompt_ids}

transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
# The pipeline returns a dict; remove the dialect ID if it appears in the text.
print(transcription["text"].replace(f" {dialect_id}", ""))
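Removing the ID with `str.replace(f" {dialect_id}", "")` only matches an ID preceded by a space. If the dialect ID can also appear at the very start of the decoded text with no leading space, a slightly more defensive post-processing step may help. The following is a minimal sketch. `strip_dialect_id` is a hypothetical helper, not part of `transformers`.

```python
# Hypothetical post-processing helper: remove the dialect ID (plus one
# preceding whitespace character, if any) wherever it appears, then trim.
import re


def strip_dialect_id(text: str, dialect_id: str) -> str:
    """Return the transcription text without the dialect prompt ID."""
    pattern = r"\s?" + re.escape(dialect_id)
    return re.sub(pattern, "", text).strip()


print(strip_dialect_id(" htia_sixian 你好", "htia_sixian"))
```

In the pipeline example, this would be applied as `strip_dialect_id(transcription["text"], dialect_id)`.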


Model details

Format: Safetensors
Model size: 2B params
Tensor type: F32
Base model: openai/whisper-large-v3