BeLFusion: Latent diffusion for behavior-driven human motion prediction

G Barquero, S Escalera… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Stochastic human motion prediction (HMP) has generally been tackled with generative
adversarial networks and variational autoencoders. Most prior works aim at predicting highly …

Breaking the limits of text-conditioned 3D motion synthesis with elaborative descriptions

Y Qian, J Urbanek… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given its wide applications, there is increasing focus on generating 3D human motions from
textual descriptions. Differing from the majority of previous works, which regard actions as …

Diverse human motion prediction via Gumbel-Softmax sampling from an auxiliary space

L Dang, Y Nie, C Long, Q Zhang, G Li - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Diverse human motion prediction aims at predicting multiple possible future pose
sequences from a sequence of observed poses. Previous approaches usually employ deep …

Text Motion Translator: A Bi-directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions

Y Qian, J Urbanek, A Hauptmann, J Won - European Conference on …, 2024 - Springer
The field of 3D human motion generation from natural language descriptions, known as
Text2Motion, has gained significant attention for its potential application in industries such …

Autoregressive Models in Vision: A Survey

J Xiong, G Liu, L Huang, C Wu, T Wu, Y Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …

MSTP-net: Multiscale spatio-temporal parallel networks for human motion prediction

L Chen, R Liu, W Zhang, Y Hou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
As a new rising technology, human motion prediction has broad application prospects in the
field of consumer electronics. Since different scale features have different receptive fields in …

CS-IntroVAE: Cauchy-Schwarz Divergence-Based Introspective Variational Autoencoder

Z Yu, Y Yang, Y Zhu, B Guo, C Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Although generative models are still being developed, image reconstruction and generation
tasks have evolved dramatically. Since the most popular generative models still have some …

A Survey on Vision Autoregressive Model

K Jiang, J Huang - arXiv preprint arXiv:2411.08666, 2024 - arxiv.org
Autoregressive models have demonstrated great performance in natural language
processing (NLP) with impressive scalability, adaptability and generalizability. Inspired by …

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

L Tian, S Hu, Q Wang, B Zhang, L Bo - arXiv preprint arXiv:2501.10687, 2025 - arxiv.org
In this paper, we propose a novel audio-driven talking head method capable of
simultaneously generating highly expressive facial expressions and hand gestures. Unlike …

Speech modeling with a hierarchical transformer dynamical VAE

X Lin, X Bie, S Leglaive, L Girin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep
generative models that extends the VAE to model a sequence of observed data and a …