Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement Paper • 2510.23141 • Published Oct 27, 2025
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation Paper • 2510.06961 • Published Oct 8, 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models Paper • 2505.17496 • Published May 23, 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment Paper • 2507.02768 • Published Jul 3, 2025
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models Paper • 2509.26388 • Published Sep 30, 2025
BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights Paper • 2501.17790 • Published Jan 29, 2025
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models Paper • 2408.07665 • Published Aug 14, 2024
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition Paper • 2506.04652 • Published Jun 5, 2025
Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative Paper • 2508.09294 • Published Aug 12, 2025
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning Paper • 2505.16220 • Published May 22, 2025
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems Paper • 2509.13989 • Published Sep 17, 2025