Preference Alignment Improves Language Model-Based TTS
Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based
systems offer performance competitive with their counterparts. Further optimization can be …
ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration
We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit
ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on …
Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech
A common criticism of current speech recognition benchmarks is their reliance on settings
which do not generalize well to real-world conversational environments, such as read …
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Curating datasets that span multiple languages is challenging. To make the collection more
scalable, researchers often incorporate one or more imperfect classifiers in the process, like …