Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Diffsound: Discrete diffusion model for text-to-sound generation

D Yang, J Yu, H Wang, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Generating sound effects that people want is an important topic. However, there are limited
studies in this area for sound generation. In this study, we investigate generating sound …

Seeing and hearing: Open-domain visual-audio generation with diffusion latent aligners

Y Xing, Y He, Z Tian, X Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video and audio content creation serves as the core technique for the movie industry and
professional users. Recently, existing diffusion-based methods tackle video and audio …

Diff-Foley: Synchronized video-to-audio synthesis with latent diffusion models

S Luo, C Yan, C Hu, H Zhao - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract The Video-to-Audio (V2A) model has recently gained attention for its practical
application in generating audio directly from silent videos, particularly in video/film …

HiFi-Codec: Group-residual vector quantization for high fidelity audio codec

D Yang, S Liu, R Huang, J Tian, C Weng… - arXiv preprint arXiv …, 2023 - arxiv.org
Audio codec models are widely used in audio communication as a crucial technique for
compressing audio into discrete representations. Nowadays, audio codec models are …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt

D Yang, S Liu, R Huang, C Weng… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Expressive text-to-speech (TTS) aims to synthesize speech with varying speaking styles to
better reflect human speech patterns. In this study, we attempt to use natural language as a …

Conditional generation of audio from video via foley analogies

Y Du, Z Chen, J Salamon, B Russell… - Proceedings of the …, 2023 - openaccess.thecvf.com
The sound effects that designers add to videos are designed to convey a particular artistic
effect and, thus, may be quite different from a scene's true sound. Inspired by the challenges …