Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/
Yifan Peng
pyf98
AI & ML interests
Multimodal LLMs, Speech-to-Speech, Speech Recognition
Recent Activity
authored a paper about 6 hours ago
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
authored a paper about 6 hours ago
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
authored a paper about 7 hours ago
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence