Preference Alignment Improves Language Model-Based TTS

J Tian, C Zhang, J Shi, H Zhang, J Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based
systems offer competitive performance to their counterparts. Further optimization can be …

ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration

M Someki, K Choi, S Arora, W Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit
ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on …

Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech

W Chen, B Yan, CC Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
A common criticism of current speech recognition benchmarks is the reliance on settings
which do not generalize well to real-world conversational environments, such as read …

Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset

F Samir, EP Ahn, S Prakash, M Soskuthy… - arXiv preprint arXiv …, 2024 - arxiv.org
Curating datasets that span multiple languages is challenging. To make the collection more
scalable, researchers often incorporate one or more imperfect classifiers in the process, like …