Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have recently achieved impressive
general-purpose vision-language capabilities through visual instruction tuning. However, current …

Effectiveness assessment of recent large vision-language models

Y Jiang, X Yan, GP Ji, K Fu, M Sun, H **, H Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present an approach to pose object recognition as next token prediction. The idea is to
apply a language decoder that auto-regressively predicts the text tokens from image …

DINO-X: A unified vision model for open-world object detection and understanding

T Ren, Y Chen, Q Jiang, Z Zeng, Y Xiong, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DINO-X, which is a unified object-centric vision model developed
by IDEA Research with the best open-world object detection performance to date. DINO-X …

A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

NM Foteinopoulou, E Ghorbel… - Advances in Neural …, 2025 - proceedings.neurips.cc
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like
face forgery detection, where viewers often struggle to distinguish between real and …

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

S Jo, S Ryu, S Kim, E Yang, K Kim - European Conference on Computer …, 2024 - Springer
We identify a critical bias in contemporary CLIP-based models, which we denote as single
tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while …

Survey on video anomaly detection in dynamic scenes with moving cameras

R Jiao, Y Wan, F Poiesi, Y Wang - Artificial Intelligence Review, 2023 - Springer
The increasing popularity of compact and inexpensive cameras, e.g., dash cameras, body
cameras, and cameras equipped on robots, has sparked a growing interest in detecting …