Fashion meets computer vision: A survey
Fashion is the way we present ourselves to the world and has become one of the world's
largest industries. Fashion, mainly conveyed by vision, has thus attracted much attention …
TryOnDiffusion: A tale of two UNets
Given two images depicting a person and a garment worn by another person, our goal is to
generate a visualization of how the garment might look on the input person. A key challenge …
3D human pose estimation via intuitive physics
Estimating 3D humans from images often produces implausible bodies that lean, float, or
penetrate the floor. Such methods ignore the fact that bodies are typically supported by the …
Deep hierarchical semantic segmentation
Humans are able to recognize structured relations in observation, allowing us to decompose
complex scenes into simpler parts and abstract the visual world in multiple levels. However …
Expressive talking head generation with granular audio-visual control
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …
AGORA: Avatars in geography optimized for regression analysis
While the accuracy of 3D human pose estimation from images has steadily improved on
benchmark datasets, the best methods still fail in many real-world scenarios. This suggests …
Deep learning technique for human parsing: A survey and outlook
Human parsing aims to partition humans in image or video into multiple pixel-level semantic
parts. In the last decade, it has gained significantly increased interest in the computer vision …
Self-correction for human parsing
Labeling pixel-level masks for fine-grained semantic segmentation tasks, e.g., human
parsing, remains a challenging task. The ambiguous boundary between different semantic …
Neural point-based graphics
We present a new point-based approach for modeling the appearance of real scenes. The
approach uses a raw point cloud as the geometric representation of a scene, and augments …
DECO: Dense estimation of 3D human-scene contact in the wild
Understanding how humans use physical contact to interact with the world is key to enabling
human-centric artificial intelligence. While inferring 3D contact is crucial for modeling …