Google Scholar

Preference Alignment Improves Language Model-Based TTS

J Tian, C Zhang, J Shi, H Zhang, J Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based
systems offer competitive performance to their counterparts. Further optimization can be …

Cited by 3

ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration

M Someki, K Choi, S Arora, W Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit
ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on …

Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech

W Chen, B Yan, CC Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

A common criticism for current speech recognition benchmarks is the reliance on settings
which do not generalize well to real-world conversational environments, such as read …

Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset

F Samir, EP Ahn, S Prakash, M Soskuthy… - arXiv preprint arXiv …, 2024 - arxiv.org

Curating datasets that span multiple languages is challenging. To make the collection more
scalable, researchers often incorporate one or more imperfect classifiers in the process, like …
