- Academic Search

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Cited by 719

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 224

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arxiv preprint arxiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 212

Viola: Conditional language models for speech recognition, synthesis, and translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent research shows a big convergence in model architecture, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

Cited by 102

Uniaudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arxiv preprint arxiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Cited by 97

Voicecraft: Zero-shot speech editing and text-to-speech in the wild

P Peng, PY Huang, SW Li, A Mohamed… - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-
of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on …

Cited by 47

Speechx: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 66

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Cited by 105

Character-llm: A trainable agent for role-playing

Y Shao, L Li, J Dai, X Qiu - arxiv preprint arxiv:2310.10158, 2023 - arxiv.org

Large language models (LLMs) can be used to serve as agents to simulate human
behaviors, given the powerful ability to understand human instructions and provide high …

Cited by 152

On decoder-only architecture for speech-to-text and large language model integration

J Wu, Y Gaur, Z Chen, L Zhou, Y Zhu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Large language models (LLMs) have achieved remarkable success in the field of natural
language processing, enabling better human-computer interaction using natural language …

Cited by 103

Speak foreign languages with your own voice: Cross-lingual neural codec language modeling

The rise and potential of large language model based agents: A survey
