Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

A survey on deep learning in medical image registration: New technologies, uncertainty, evaluation metrics, and beyond

J Chen, Y Liu, S Wei, Z Bian, S Subramanian… - Medical Image …, 2024 - Elsevier
Deep learning technologies have dramatically reshaped the field of medical image
registration over the past decade. The initial developments, such as regression-based and U …

CoTracker: It is better to track together

N Karaev, I Rocco, B Graham, N Neverova… - … on Computer Vision, 2024 - Springer
We introduce CoTracker, a transformer-based model that tracks a large number of 2D points
in long video sequences. Differently from most existing approaches that track points …

Drag your GAN: Interactive point-based manipulation on the generative image manifold

X Pan, A Tewari, T Leimkühler, L Liu, A Meka… - ACM SIGGRAPH 2023 …, 2023 - dl.acm.org
Synthesizing visual content that meets users' needs often requires flexible and precise
controllability of the pose, shape, expression, and layout of the generated objects. Existing …

Tracking everything everywhere all at once

Q Wang, YY Chang, R Cai, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a new test-time optimization method for estimating dense and long-range motion
from a video sequence. Prior optical flow or particle video tracking algorithms typically …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

PointOdyssey: A large-scale synthetic dataset for long-term point tracking

Y Zheng, AW Harley, B Shen… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework,
for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to …

Dynamic 3D Gaussians: Tracking by persistent dynamic view synthesis

J Luiten, G Kopanas, B Leibe, D Ramanan - arXiv preprint arXiv …, 2023 - arxiv.org
We present a method that simultaneously addresses the tasks of dynamic scene novel-view
synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We …

TAPIR: Tracking any point with per-frame initialization and temporal refinement

C Doersch, Y Yang, M Vecerik… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …

TAP-Vid: A benchmark for tracking any point in a video

C Doersch, A Gupta, L Markeeva… - Advances in …, 2022 - proceedings.neurips.cc
Generic motion understanding from video involves not only tracking objects, but also
perceiving how their surfaces deform and move. This information is useful to make …