Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242 · All 7 versions

[PDF] arxiv.org

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 442 · All 11 versions

[PDF] thecvf.com

Symphonize 3D semantic scene completion with contextual instance queries

H Jiang, T Cheng, N Gao, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal
undertaking in autonomous driving aiming to predict the voxel occupancy within volumetric …

Cited by 214 · All 10 versions · View as HTML

[PDF] arxiv.org

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 234 · All 4 versions · View as HTML

[PDF] arxiv.org

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Cited by 146 · All 8 versions · View as HTML

[PDF] arxiv.org

Diffsound: Discrete diffusion model for text-to-sound generation

D Yang, J Yu, H Wang, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Generating sound effects that people want is an important topic. However, there are limited
studies in this area for sound generation. In this study, we investigate generating sound …

Cited by 318 · All 5 versions

[PDF] arxiv.org

NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise, such as whether a TTS system can achieve human-level …

Cited by 232 · All 11 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 471 · All 2 versions · View as HTML

[PDF] thecvf.com

MERLOT Reserve: Neural script knowledge through vision and language and sound

R Zellers, J Lu, X Lu, Y Yu, Y Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com

As humans, we navigate a multimodal world, building a holistic understanding from all our
senses. We introduce MERLOT Reserve, a model that represents videos jointly over time …

Cited by 272 · All 8 versions · View as HTML

[PDF] arxiv.org

ProDiff: Progressive fast diffusion model for high-quality text-to-speech

R Huang, Z Zhao, H Liu, J Liu, C Cui… - Proceedings of the 30th …, 2022 - dl.acm.org

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …

Cited by 187 · All 3 versions
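The ProDiff snippet points at the cost of the iterative sampling process inherited from DDPMs. As a rough illustration (generic DDPM ancestral sampling, not ProDiff's own method, which the snippet does not describe), the sketch below shows that every one of the T reverse steps needs one sequential call to the noise predictor, so T = 1000 steps means 1000 network evaluations per sample. The linear beta schedule from 1e-4 to 0.02, the variance choice σ_t² = β_t, and the helper names `make_schedule`, `ddpm_sample` and `make_point_mass_eps` are assumptions for illustration; the "model" is a hypothetical toy whose noise prediction is exact only when every training sample equals 0.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns (betas, cumulative alpha products)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def ddpm_sample(eps_model, betas, alpha_bars, dim, seed=0):
    """Generic DDPM ancestral sampling: one eps_model call per step, in sequence."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):  # t = T, ..., 1 (1-based as in DDPM)
        beta, alpha_bar = betas[t - 1], alpha_bars[t - 1]
        # Each step depends on the previous x, so these calls run strictly in sequence.
        eps = eps_model(x, t)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps)]
        if t > 1:
            sigma = math.sqrt(beta)  # assumed variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the last step
    return x


def make_point_mass_eps(alpha_bars, counter):
    """Hypothetical toy predictor: exact noise when every training sample is 0."""
    def eps_model(x, t):
        counter["calls"] += 1
        # With x_0 = 0, x_t = sqrt(1 - alpha_bar_t) * eps, so eps = x_t / sqrt(1 - alpha_bar_t).
        scale = math.sqrt(1.0 - alpha_bars[t - 1])
        return [xi / scale for xi in x]
    return eps_model


betas, alpha_bars = make_schedule(1000)
counter = {"calls": 0}
sample = ddpm_sample(make_point_mass_eps(alpha_bars, counter), betas, alpha_bars, dim=4)
print(counter["calls"])  # 1000: one sequential denoiser call per step
```

Because the toy predictor is exact for a point mass at 0, the final sample collapses to (numerically) zero; with a real network, each of those 1000 calls would be a full forward pass, which is the cost that fast-sampling approaches aim to reduce.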

Saved to My library

Tacotron: Towards end-to-end speech synthesis

A review of deep learning techniques for speech processing

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

Symphonize 3D semantic scene completion with contextual instance queries

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Diffsound: Discrete diffusion model for text-to-sound generation

NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality

A survey on neural speech synthesis

MERLOT Reserve: Neural script knowledge through vision and language and sound

ProDiff: Progressive fast diffusion model for high-quality text-to-speech