EMO: Emote Portrait Alive - Generating expressive portrait videos with Audio2Video diffusion model under weak conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2024 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

DreamVideo: Composing your dream videos with customized subject and motion

Y Wei, S Zhang, Z Qing, H Yuan, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customized generation using diffusion models has made impressive progress in image
generation but remains unsatisfactory in the challenging video generation task as it requires …

A survey on generative AI and LLM for video generation, understanding, and streaming

P Zhou, L Wang, Z Liu, Y Hao, P Hui, S Tarkoma… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …

Deepfake generation and detection: A benchmark and survey

G Pei, J Zhang, M Hu, Z Zhang, C Wang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake is a technology dedicated to creating highly realistic facial images and videos
under specific conditions, which has significant application potential in fields such as …

A recipe for scaling up text-to-video generation with text-free videos

X Wang, S Zhang, H Yuan, Z Qing… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …

Hallo: Hierarchical audio-driven visual synthesis for portrait image animation

M Xu, H Li, Q Su, H Shang, L Zhang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of portrait image animation, driven by speech audio input, has experienced
significant advancements in the generation of realistic and dynamic portraits. This research …

EmoTalk3D: High-fidelity free-view synthesis of emotional 3D talking head

Q He, X Ji, Y Gong, Y Lu, Z Diao, L Huang… - … on Computer Vision, 2024 - Springer
We present a novel approach for synthesizing 3D talking heads with controllable emotion,
featuring enhanced lip synchronization and rendering quality. Despite significant progress in …

Hallo2: Long-duration and high-resolution audio-driven portrait image animation

J Cui, H Li, Y Yao, H Zhu, H Shang, K Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in latent diffusion-based generative models for portrait image animation,
such as Hallo, have achieved impressive results in short-duration video synthesis. In this …

Survey: Transformer-based Models in Data Modality Conversion

E Rashno, A Eskandari, A Anand… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have made significant strides across various artificial intelligence domains,
including natural language processing, computer vision, and audio processing. This …

Loopy: Taming audio-driven portrait avatar with long-term motion dependency

J Jiang, C Liang, J Yang, G Lin, T Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
With the introduction of diffusion-based video generation techniques, audio-conditioned
human video generation has recently achieved significant breakthroughs in both the …