Computer vision for autonomous vehicles: Problems, datasets and state of the art

J Janai, F Güney, A Behl, A Geiger - Foundations and trends® …, 2020 - nowpublishers.com
Recent years have witnessed enormous progress in AI-related fields such as computer
vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it …

Visual SLAM algorithms: A survey from 2010 to 2016

T Taketomi, H Uchiyama, S Ikeda - IPSJ transactions on computer vision …, 2017 - Springer
SLAM is an abbreviation for simultaneous localization and mapping, which is a technique for
estimating sensor motion and reconstructing structure in an unknown environment …

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

CoTracker: It is better to track together

N Karaev, I Rocco, B Graham, N Neverova… - … on Computer Vision, 2024 - Springer
We introduce CoTracker, a transformer-based model that tracks a large number of 2D points
in long video sequences. Differently from most existing approaches that track points …

Generative novel view synthesis with 3D-aware diffusion models

ER Chan, K Nagano, MA Chan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a diffusion-based model for 3D-aware generative novel view synthesis from as
few as a single input image. Our model samples from the distribution of possible renderings …

VastGaussian: Vast 3D Gaussians for large scene reconstruction

J Lin, Z Li, X Tang, J Liu, S Liu, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing NeRF-based methods for large scene reconstruction often have limitations in visual
quality and rendering speed. While the recent 3D Gaussian Splatting works well on small …

Magic123: One image to high-quality 3D object generation using both 2D and 3D diffusion priors

G Qian, J Mai, A Hamdi, J Ren, A Siarohin, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D
meshes generation from a single unposed image in the wild using both 2D and 3D priors. In …

Tracking everything everywhere all at once

Q Wang, YY Chang, R Cai, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a new test-time optimization method for estimating dense and long-range motion
from a video sequence. Prior optical flow or particle video tracking algorithms typically …

GS-LRM: Large reconstruction model for 3D Gaussian splatting

K Zhang, S Bi, H Tan, Y Xiangli, N Zhao… - … on Computer Vision, 2024 - Springer
We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D
Gaussian primitives from 2–4 posed sparse images in ∼0.23 s on single A100 GPU. Our …