Google Scholar

A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Cited by 210 · All 4 versions

[PDF] arxiv.org

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 310 · All 6 versions

[PDF] arxiv.org

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Cited by 110 · All 2 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 471 · All 2 versions

[PDF] aaai.org

DiffSinger: Singing voice synthesis via shallow diffusion mechanism

J Liu, C Li, Y Ren, F Chen, Z Zhao - … of the AAAI conference on artificial …, 2022 - ojs.aaai.org

Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive
singing voice, in which the acoustic model generates the acoustic features (e.g., mel …

Cited by 284 · All 8 versions

[PDF] neurips.cc

MelGAN: Generative adversarial networks for conditional waveform synthesis

K Kumar, R Kumar, T De Boissiere… - Advances in neural …, 2019 - proceedings.neurips.cc

Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating
coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is …

Cited by 1186 · All 10 versions

[PDF] neurips.cc

FastSpeech: Fast, robust and controllable text to speech

Y Ren, Y Ruan, X Tan, T Qin, S Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc

Neural network based end-to-end text to speech (TTS) has significantly improved the quality
of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate mel …

Cited by 1289 · All 10 versions

[PDF] thecvf.com

Joint audio-visual deepfake detection

Y Zhou, SN Lim - Proceedings of the IEEE/CVF international …, 2021 - openaccess.thecvf.com

Deepfakes ("deep learning" + "fake") are synthetically-generated videos from AI
algorithms. While they could be entertaining, they could also be misused for falsifying …

Cited by 189 · All 5 versions

[PDF] arxiv.org

ASVspoof 2019: Future horizons in spoofed and fake audio detection

M Todisco, X Wang, V Vestman, M Sahidullah… - arXiv preprint arXiv …, 2019 - arxiv.org

ASVspoof, now in its third edition, is a series of community-led challenges which promote
the development of countermeasures to protect automatic speaker verification (ASV) from …

Cited by 736 · All 18 versions

[PDF] arxiv.org

DDSP: Differentiable digital signal processing

J Engel, L Hantrakul, C Gu, A Roberts - arXiv preprint arXiv:2001.04643, 2020 - arxiv.org

Most generative models of audio directly generate samples in one of two domains: time or
frequency. While sufficient to express any signal, these representations are inefficient, as …

Cited by 527 · All 5 versions
