Chop & learn: Recognizing and generating object-state compositions
Recognizing and generating object-state compositions has been a challenging task,
especially when generalizing to unseen compositions. In this paper, we study the task of …
especially when generalizing to unseen compositions. In this paper, we study the task of …
Learning object state changes in videos: An open-world perspective
Abstract Object State Changes (OSCs) are pivotal for video understanding. While humans
can effortlessly generalize OSC understanding from familiar to unknown objects current …
can effortlessly generalize OSC understanding from familiar to unknown objects current …
Towards scalable neural representation for diverse videos
Implicit neural representations (INR) have gained increasing attention in representing 3D
scenes and images, and have been recently applied to encode videos (eg, NeRV, E-NeRV) …
scenes and images, and have been recently applied to encode videos (eg, NeRV, E-NeRV) …
Multi-task learning of object states and state-modifying actions from web videos
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …
modifying actions by observing people interacting with objects in long uncurated web …
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Diffusion models have made significant strides in image generation mastering tasks such as
unconditional image synthesis text-image translation and image-to-image conversions …
unconditional image synthesis text-image translation and image-to-image conversions …
Simpson: Simplifying photo cleanup with single-click distracting object segmentation network
In photo editing, it is common practice to remove visual distractions to improve the overall
image quality and highlight the primary subject. However, manually selecting and removing …
image quality and highlight the primary subject. However, manually selecting and removing …
Multi-task learning of object state changes from uncurated videos
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …
modifying actions by observing people interacting with objects in long uncurated web …
Multi-task learning of object states and state-modifying actions from web videos
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …
modifying actions by observing people interacting with objects in long uncurated web …
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Learning from seen attribute-object pairs to generalize to unseen compositions has been
studied extensively in Compositional Zero-Shot Learning (CZSL). However CZSL setup is …
studied extensively in Compositional Zero-Shot Learning (CZSL). However CZSL setup is …
Video decomposition prior: Editing videos layer by layer
In the evolving landscape of video editing methodologies, a majority of deep learning
techniques are often reliant on extensive datasets of observed input and ground truth …
techniques are often reliant on extensive datasets of observed input and ground truth …