Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining
visual recognition and language understanding to generate content that is not only coherent …

Analyzing and mitigating object hallucination in large vision-language models

Y Zhou, C Cui, J Yoon, L Zhang, Z Deng, C Finn… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models (LVLMs) have shown remarkable abilities in understanding
visual information with human languages. However, LVLMs still suffer from object …

Holistic analysis of hallucination in GPT-4V(ision): Bias and interference challenges

C Cui, Y Zhou, X Yang, S Wu, L Zhang, J Zou… - arXiv preprint arXiv …, 2023 - arxiv.org
While GPT-4V(ision) impressively models both visual and textual information
simultaneously, its hallucination behavior has not been systematically assessed. To bridge …

Effectiveness assessment of recent large vision-language models

Y Jiang, X Yan, GP Ji, K Fu, M Sun, H Xiong, DP Fan… - Visual Intelligence, 2024 - Springer
The advent of large vision-language models (LVLMs) represents a remarkable advance in
the quest for artificial general intelligence. However, the models' effectiveness in both …

Mitigating test-time bias for fair image retrieval

F Kong, S Yuan, W Hao… - Advances in Neural …, 2024 - proceedings.neurips.cc
We address the challenge of generating fair and unbiased image retrieval results given
neutral textual queries (with no explicit gender or race connotations), while maintaining the …

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Z Zhao, M Chen, T Dai, J Yao, B Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in
human-annotated or web-crawled datasets. Prior approaches to leverage such data mainly …

Debiasing large visual language models

YF Zhang, W Yu, Q Wen, X Wang, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the realms of computer vision and natural language processing, Large Vision-Language
Models (LVLMs) have become indispensable tools, proficient in generating textual …

Fire: Food image to recipe generation

P Chhikara, D Chaurasia, Y Jiang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Food computing has emerged as a prominent multidisciplinary field of research in recent
years. An ambitious goal of food computing is to develop end-to-end intelligent systems …

Vacode: Visual augmented contrastive decoding

S Kim, B Cho, S Bae, S Ahn, SY Yun - arXiv preprint arXiv:2408.05337, 2024 - arxiv.org
Despite the astonishing performance of recent Large Vision-Language Models (LVLMs),
these models often generate inaccurate responses. To address this issue, previous studies …

Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval

F Zhang, XS Hua, C Chen… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper studies the problem of semi-supervised 2D-3D retrieval which aims to align both
labeled and unlabeled 2D and 3D data into the same embedding space. The problem is …