Controllable accented text-to-speech synthesis with fine and coarse-grained intensity rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

An embedded end-to-end voice assistant

L Lazzaroni, F Bellotti, R Berta - Engineering Applications of Artificial …, 2024 - Elsevier
Voice assistants are spreading in various environments, such as houses and cars, bringing
the possibility of controlling heterogeneous Internet of Things devices with simple voice …

Improving mispronunciation detection using speech reconstruction

A Das, R Gutierrez-Osuna - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Training related machine learning tasks simultaneously can lead to improved performance
on both tasks. Text-to-speech (TTS) and mispronunciation detection and diagnosis (MDD) …

Multi-scale accent modeling with disentangling for multi-speaker multi-accent TTS synthesis

X Zhou, M Zhang, Y Zhou, Z Wu, H Li - arXiv preprint arXiv:2406.10844, 2024 - arxiv.org
Synthesizing speech across different accents while preserving the speaker identity is
essential for various real-world customer applications. However, the individual and accurate …

Towards zero-shot multi-speaker multi-accent text-to-speech synthesis

M Zhang, X Zhou, Z Wu, H Li - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org
This letter presents a framework towards multi-accent neural text-to-speech synthesis for
zero-shot multi-speaker, which employs an encoder-decoder architecture and an accent …

Zero-shot emotion transfer for cross-lingual speech synthesis

Y Li, X Zhu, Y Lei, H Li, J Liu, D Xie… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from
an arbitrary speech reference in the source language to the synthetic speech in the target …

Non-autoregressive real-time accent conversion model with voice cloning

V Nechaev, S Kosyakov - arXiv preprint arXiv:2405.13162, 2024 - arxiv.org
Currently, the development of Foreign Accent Conversion (FAC) models utilizes deep neural
network architectures, as well as ensembles of neural networks for speech recognition and …

Neural speech synthesis for Austrian dialects with Standard German grapheme-to-phoneme conversion and dialect embeddings

L Gutscher, M Pucher, V Garcia - Proc. 2nd Annual Meeting of the …, 2023 - researchgate.net
For languages where extensive audio data and text transcriptions are available, text-to-
speech (TTS) systems have showcased the ability to generate speech that closely …

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

J Zhong, K Richmond, Z Su, S Sun - arXiv preprint arXiv:2409.09098, 2024 - arxiv.org
While recent Zero-Shot Text-to-Speech (ZS-TTS) models have achieved high naturalness
and speaker similarity, they fall short in accent fidelity and control. To address this issue, we …

A Survey of Automatic Evaluation on the Quality of Generated Text

L Tian, M Ziao, Z Yanghao, X Chen… - Proceedings of the 23rd …, 2024 - aclanthology.org
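Human evaluation, the gold standard for assessing the quality of generated text, is too
costly; the core idea of automatic evaluation is to make its results correlate highly with human evaluation, thereby enabling automated analysis and …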