Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 240 · All 7 versions

[PDF] wiley.com

A comprehensive review of data‐driven co‐speech gesture generation

S Nyatsanga, T Kucherenko, C Ahuja… - Computer Graphics …, 2023 - Wiley Online Library

Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …

Cited by 82 · All 14 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2023 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 257 · All 9 versions

[PDF] thecvf.com

Symphonize 3d semantic scene completion with contextual instance queries

H Jiang, T Cheng, N Gao, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal
undertaking in autonomous driving aiming to predict the voxel occupancy within volumetric …

[PDF] arxiv.org

Noise2music: Text-conditioned music generation with diffusion models

Q Huang, DS Park, T Wang, TI Denk, A Ly… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Noise2Music, where a series of diffusion models is trained to generate high-
quality 30-second music clips from text prompts. Two types of diffusion models, a generator …

Cited by 191 · All 4 versions

[PDF] arxiv.org

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 223 · All 4 versions

[PDF] arxiv.org

NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …

Cited by 226 · All 11 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 467 · All 2 versions

[PDF] arxiv.org

Nv-embed: Improved techniques for training llms as generalist embedding models

C Lee, R Roy, M Xu, J Raiman, M Shoeybi… - arXiv preprint arXiv …, 2024 - arxiv.org

Decoder-only large language model (LLM)-based embedding models are beginning to
outperform BERT or T5-based embedding models in general-purpose text embedding tasks …

Cited by 88 · All 3 versions

[PDF] arxiv.org

Add 2022: the first audio deep synthesis detection challenge

J Yi, R Fu, J Tao, S Nie, H Ma, C Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021.
However, the recent shared tasks have not covered many real-life and challenging …

Cited by 205 · All 9 versions

Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis

A review of deep learning techniques for speech processing
