Foundation Models Defining a New Era in Vision: a Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
A survey on deep learning in medical image registration: New technologies, uncertainty, evaluation metrics, and beyond
Deep learning technologies have dramatically reshaped the field of medical image
registration over the past decade. The initial developments, such as regression-based and U …
CoTracker: It is better to track together
We introduce CoTracker, a transformer-based model that tracks a large number of 2D points
in long video sequences. Differently from most existing approaches that track points …
Drag your GAN: Interactive point-based manipulation on the generative image manifold
Synthesizing visual content that meets users' needs often requires flexible and precise
controllability of the pose, shape, expression, and layout of the generated objects. Existing …
Tracking everything everywhere all at once
We present a new test-time optimization method for estimating dense and long-range motion
from a video sequence. Prior optical flow or particle video tracking algorithms typically …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
PointOdyssey: A large-scale synthetic dataset for long-term point tracking
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework,
for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to …
Dynamic 3D Gaussians: Tracking by persistent dynamic view synthesis
We present a method that simultaneously addresses the tasks of dynamic scene novel-view
synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We …
TAPIR: Tracking any point with per-frame initialization and temporal refinement
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …
TAP-Vid: A benchmark for tracking any point in a video
Generic motion understanding from video involves not only tracking objects, but also
perceiving how their surfaces deform and move. This information is useful to make …