Diffusion with forward models: Solving stochastic inverse problems without direct supervision

A Tewari, T Yin, G Cazenavette… - Advances in …, 2023 - proceedings.neurips.cc
Denoising diffusion models are a powerful type of generative models used to capture
complex distributions of real-world signals. However, their applicability is limited to …

Generative image dynamics

Z Li, R Tucker, N Snavely… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …

Learning 3D human dynamics from video

A Kanazawa, JY Zhang, P Felsen… - Proceedings of the …, 2019 - openaccess.thecvf.com
From an image of a person in action, we can easily guess the 3D motion of the person in the
immediate past and future. This is because we have a mental model of 3D human dynamics …

Continuous human action recognition for human-machine interaction: a review

H Gammulle, D Ahmedt-Aristizabal, S Denman… - ACM Computing …, 2023 - dl.acm.org
With advances in data-driven machine learning research, a wide variety of prediction
models have been proposed to capture spatio-temporal features for the analysis of video …

2.5D visual sound

R Gao, K Grauman - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual
experience of the scene. However, binaural recordings are scarcely available and require …

Improved road connectivity by joint learning of orientation and segmentation

A Batra, S Singh, G Pang, S Basu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Road network extraction from satellite images often produces fragmented road segments,
leading to road maps unfit for real applications. Pixel-wise classification fails to predict …

D3D: Distilled 3D networks for video action recognition

J Stroud, D Ross, C Sun, J Deng… - Proceedings of the …, 2020 - openaccess.thecvf.com
State-of-the-art methods for action recognition commonly use two networks: the spatial
stream, which takes RGB frames as input, and the temporal stream, which takes optical flow …

Representation flow for action recognition

AJ Piergiovanni, MS Ryoo - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn
motion representations. Our representation flow layer is a fully-differentiable layer designed …

Adaptive fusion and category-level dictionary learning model for multiview human action recognition

Z Gao, HZ Xuan, H Zhang, S Wan… - IEEE Internet of Things …, 2019 - ieeexplore.ieee.org
Human actions are often captured by multiple cameras (or sensors) to overcome the
significant variations in viewpoints, background clutter, object speed, and motion patterns in …

Amplifying key cues for human-object-interaction detection

Y Liu, Q Chen, A Zisserman - European Conference on Computer Vision, 2020 - Springer
Human-object interaction (HOI) detection aims to detect and recognise how people interact
with the objects that surround them. This is challenging as different interaction categories …