DrivingGaussian: Composite Gaussian splatting for surrounding dynamic autonomous driving scenes

X Zhou, Z Lin, X Shan, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic
autonomous driving scenes. For complex scenes with moving objects, we first sequentially …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion

X Ju, X Liu, X Wang, Y Bian, Y Shan, Q Xu - European Conference on …, 2024 - Springer
Image inpainting, the process of restoring corrupted images, has seen significant
advancements with the advent of diffusion models (DMs). Despite these advancements …

A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities

T Zhao, Y Gu, J Yang, N Usuyama, HH Lee, S Kiblawi… - Nature …, 2024 - nature.com
Biomedical image analysis is fundamental for biomedical discovery. Holistic image analysis
comprises interdependent subtasks such as segmentation, detection and recognition, which …

Diffusion model-based video editing: A survey

W Sun, RC Tu, J Liao, D Tao - arXiv preprint arXiv:2407.07111, 2024 - arxiv.org
The rapid development of diffusion models (DMs) has significantly advanced image and
video applications, making "what you want is what you see" a reality. Among these, video …

Segment anything in medical images and videos: Benchmark and deployment

J Ma, S Kim, F Li, M Baharoon, R Asakereh… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in segmentation foundation models have enabled accurate and efficient
segmentation across a wide range of natural images and videos, but their utility to medical …

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

A survey on occupancy perception for autonomous driving: The information fusion perspective

H Xu, J Chen, S Meng, Y Wang, LP Chau - Information Fusion, 2025 - Elsevier
3D occupancy perception technology aims to observe and understand dense 3D
environments for autonomous vehicles. Owing to its comprehensive perception capability …

FiLo: Zero-shot anomaly detection by fine-grained description and high-quality localization

Z Gu, B Zhu, G Zhu, Y Chen, H Li, M Tang… - Proceedings of the 32nd …, 2024 - dl.acm.org
Zero-shot anomaly detection (ZSAD) methods detect anomalies without prior access to
known normal or abnormal samples within target categories. Existing methods typically rely …

SpatialBot: Precise spatial understanding with vision language models

W Cai, I Ponomarenko, J Yuan, X Li, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) have achieved impressive performance in 2D image
understanding, however they are still struggling with spatial understanding which is the …