Masked generative video-to-audio transformers with enhanced synchronicity

S Pascual, C Yeh, I Tsiamas, J Serrà - European Conference on Computer …, 2024 - Springer
Video-to-audio (V2A) generation leverages visual-only video features to render
plausible sounds that match the scene. Importantly, the generated sound onsets should …

Temporally aligned audio for video with autoregression

I Viertola, V Iashin, E Rahtu - arXiv preprint arXiv:2409.13689, 2024 - arxiv.org
We introduce V-AURA, the first autoregressive model to achieve high temporal alignment
and relevance in video-to-audio generation. V-AURA uses a high-framerate visual feature …

Video-guided foley sound generation with multimodal controls

Z Chen, P Seetharaman, B Russell, O Nieto… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating sound effects for videos often requires creating artistic sound effects that diverge
significantly from real-life sources, as well as flexible control over the sound design. To address this …

Taming multimodal joint training for high-quality video-to-audio synthesis

HK Cheng, M Ishii, A Hayakawa, T Shibuya… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose to synthesize high-quality and synchronized audio, given video and optional
text conditions, using a novel multimodal joint training framework MMAudio. In contrast to …

From vision to audio and beyond: A unified model for audio-visual representation and generation

K Su, X Liu, E Shlizerman - arXiv preprint arXiv:2409.19132, 2024 - arxiv.org
Video encompasses both visual and auditory data, creating a perceptually rich experience
where these two modalities complement each other. As such, videos are a valuable type of …

LoVA: Long-form Video-to-Audio Generation

X Cheng, X Wang, Y Wu, Y Wang, R Song - arXiv preprint arXiv …, 2024 - arxiv.org
Video-to-audio (V2A) generation is important for video editing and post-processing,
enabling the creation of semantics-aligned audio for silent video. However, most existing …

Generative AI for Cel-Animation: A Survey

Y Tang, J Guo, P Liu, Z Wang, H Hua, JX Zhong… - arXiv preprint arXiv …, 2025 - arxiv.org
The traditional Celluloid (Cel) Animation production pipeline encompasses multiple essential
steps, including storyboarding, layout design, keyframe animation, inbetweening, and …

Images that Sound: Composing Images and Sounds on a Single Canvas

Z Chen, D Geng, A Owens - arXiv preprint arXiv:2405.12221, 2024 - arxiv.org
Spectrograms are 2D representations of sound that look very different from the images found
in our visual world. And natural images, when played as spectrograms, make unnatural …

Towards Integrated Audio-Visual Learning: From Vision-to-Audio Generation to a Unified Audio-Visual Framework

K Su - 2024 - digital.lib.washington.edu
The interplay between audio and visual signals, rich in correlations across various scales,
significantly impacts human perception and drives a consistent demand for audio-visual …

Generative and parametric models for interactive neural synthesis in speech and audio

MJC Largo - 2024 - oa.upm.es
Speech synthesis is a multifaceted process that encompasses both acoustic signals and
articulatory dynamics. Traditional neural audio synthesis methods often rely exclusively on …