Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 235 - All 6 versions

[PDF] sciencedirect.com

Spoken language interaction with robots: Recommendations for future research

M Marge, C Espy-Wilson, NG Ward, A Alwan… - Computer Speech & …, 2022 - Elsevier

With robotics rapidly advancing, more effective human–robot interaction is increasingly
needed to realize the full potential of robots for society. While spoken language must be part …

Cited by 132 - All 8 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 253 - All 8 versions

[PDF] arxiv.org

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 216 - All 3 versions

[PDF] arxiv.org

Noise2Music: Text-conditioned music generation with diffusion models

Q Huang, DS Park, T Wang, TI Denk, A Ly… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality
30-second music clips from text prompts. Two types of diffusion models, a generator …

Cited by 193 - All 5 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 467 - All 2 versions

[PDF] arxiv.org

NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …

Cited by 225 - All 9 versions

[PDF] pubpub.org

[PDF] Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org

We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

Cited by 905 - All 8 versions

[PDF] neurips.cc

Glow-TTS: A generative flow for text-to-speech via monotonic alignment search

J Kim, S Kim, J Kong, S Yoon - Advances in Neural …, 2020 - proceedings.neurips.cc

Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been
proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the …

Cited by 572 - All 5 versions

[PDF] arxiv.org

ADD 2022: the first audio deep synthesis detection challenge

J Yi, R Fu, J Tao, S Nie, H Ma, C Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021.
However, the recent shared tasks have not covered many real-life and challenging …

Cited by 207 - All 9 versions


Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis

A review of deep learning techniques for speech processing
