Human-computer interaction system: A survey of talking-head generation

R Zhen, W Song, Q He, J Cao, L Shi, J Luo - Electronics, 2023 - mdpi.com
Virtual human is widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …

FunASR: A fundamental end-to-end speech recognition toolkit

Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li… - ar…
… text-to-speech (TTS) systems for a variety of real-world …

Alchemy: Data-Free Adversarial Training

Y Bai, Z Ma, Y Chen, J Deng, S Pang, Y Liu… - Proceedings of the 2024 …, 2024 - dl.acm.org
Machine learning models have become integral to various aspects of daily life, prompting
increased vulnerability to adversarial attacks. Adversarial training is one of the most …

Speech generation for indigenous language education

RK Kazantseva, R Kuhn, S Larkin… - Computer Speech & …, 2024 - docs.everyvoice.ca
The vast majority of the world's languages are unable to follow in the footsteps of existing
resource-intensive pathways to building text-to-speech (TTS) systems. But, as the quality of …

Grammar-supervised end-to-end speech recognition with part-of-speech tagging and dependency parsing

G Wan, T Mao, J Zhang, H Chen, J Gao, Z Ye - Applied Sciences, 2023 - mdpi.com
For most automatic speech recognition systems, many unacceptable hypothesis errors still
make the recognition results absurd and difficult to understand. In this paper, we introduce …

Looking and listening: Audio guided text recognition

W Yu, M Liu, B Yang, E Zhang, D Jiang, X Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Text recognition in the wild is a long-standing problem in computer vision. Driven by end-to-
end deep learning, recent studies suggest vision and language processing are effective for …