EMO: Emote Portrait Alive - Generating expressive portrait videos with Audio2Video diffusion model under weak conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2024 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

DreamVideo: Composing your dream videos with customized subject and motion

Y Wei, S Zhang, Z Qing, H Yuan, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customized generation using diffusion models has made impressive progress in image
generation but remains unsatisfactory in the challenging video generation task as it requires …

A survey on generative AI and LLM for video generation, understanding, and streaming

P Zhou, L Wang, Z Liu, Y Hao, P Hui, S Tarkoma… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …

Deepfake generation and detection: A benchmark and survey

G Pei, J Zhang, M Hu, Z Zhang, C Wang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake is a technology dedicated to creating highly realistic facial images and videos
under specific conditions, which has significant application potential in fields such as …

A recipe for scaling up text-to-video generation with text-free videos

X Wang, S Zhang, H Yuan, Z Qing… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …

Hallo: Hierarchical audio-driven visual synthesis for portrait image animation

M Xu, H Li, Q Su, H Shang, L Zhang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of portrait image animation, driven by speech audio input, has experienced
significant advancements in the generation of realistic and dynamic portraits. This research …

EmoTalk3D: High-fidelity free-view synthesis of emotional 3D talking head

Q He, X Ji, Y Gong, Y Lu, Z Diao, L Huang… - … on Computer Vision, 2024 - Springer
We present a novel approach for synthesizing 3D talking heads with controllable emotion,
featuring enhanced lip synchronization and rendering quality. Despite significant progress in …

Hallo2: Long-duration and high-resolution audio-driven portrait image animation

J Cui, H Li, Y Yao, H Zhu, H Shang, K Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in latent diffusion-based generative models for portrait image animation,
such as Hallo, have achieved impressive results in short-duration video synthesis. In this …

Survey: Transformer-based Models in Data Modality Conversion

E Rashno, A Eskandari, A Anand… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have made significant strides across various artificial intelligence domains,
including natural language processing, computer vision, and audio processing. This …

Loopy: Taming audio-driven portrait avatar with long-term motion dependency

J Jiang, C Liang, J Yang, G Lin, T Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
With the introduction of diffusion-based video generation techniques, audio-conditioned
human video generation has recently achieved significant breakthroughs in both the …