Diffusion with forward models: Solving stochastic inverse problems without direct supervision
Denoising diffusion models are a powerful type of generative model used to capture
complex distributions of real-world signals. However, their applicability is limited to …
Generative image dynamics
We present an approach to modeling an image-space prior on scene motion. Our prior is
learned from a collection of motion trajectories extracted from real video sequences …
Learning 3D human dynamics from video
From an image of a person in action, we can easily guess the 3D motion of the person in the
immediate past and future. This is because we have a mental model of 3D human dynamics …
Continuous human action recognition for human-machine interaction: a review
With advances in data-driven machine learning research, a wide variety of prediction
models have been proposed to capture spatio-temporal features for the analysis of video …
2.5D visual sound
Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual
experience of the scene. However, binaural recordings are scarcely available and require …
Improved road connectivity by joint learning of orientation and segmentation
Road network extraction from satellite images often produces fragmented road segments
leading to road maps unfit for real applications. Pixel-wise classification fails to predict …
D3D: Distilled 3D networks for video action recognition
State-of-the-art methods for action recognition commonly use two networks: the spatial
stream, which takes RGB frames as input, and the temporal stream, which takes optical flow …
Representation flow for action recognition
In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn
motion representations. Our representation flow layer is a fully-differentiable layer designed …
Adaptive fusion and category-level dictionary learning model for multiview human action recognition
Human actions are often captured by multiple cameras (or sensors) to overcome the
significant variations in viewpoints, background clutter, object speed, and motion patterns in …
Amplifying key cues for human-object-interaction detection
Human-object interaction (HOI) detection aims to detect and recognise how people interact
with the objects that surround them. This is challenging as different interaction categories …