Mitigating object hallucinations in large vision-language models through visual contrastive decoding
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining
visual recognition and language understanding to generate content that is not only coherent …
Analyzing and mitigating object hallucination in large vision-language models
Large vision-language models (LVLMs) have shown remarkable abilities in understanding
visual information with human languages. However, LVLMs still suffer from object …
Holistic analysis of hallucination in GPT-4V(ision): Bias and interference challenges
While GPT-4V(ision) impressively models both visual and textual information
simultaneously, its hallucination behavior has not been systematically assessed. To bridge …
Effectiveness assessment of recent large vision-language models
The advent of large vision-language models (LVLMs) represents a remarkable advance in
the quest for artificial general intelligence. However, the models' effectiveness in both …
Mitigating test-time bias for fair image retrieval
We address the challenge of generating fair and unbiased image retrieval results given
neutral textual queries (with no explicit gender or race connotations), while maintaining the …
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in
human-annotated or web-crawled datasets. Prior approaches to leverage such data mainly …
Debiasing large visual language models
In the realms of computer vision and natural language processing, Large Vision-Language
Models (LVLMs) have become indispensable tools, proficient in generating textual …
FIRE: Food image to recipe generation
Food computing has emerged as a prominent multidisciplinary field of research in recent
years. An ambitious goal of food computing is to develop end-to-end intelligent systems …
VACoDe: Visual augmented contrastive decoding
Despite the astonishing performance of recent Large Vision-Language Models (LVLMs),
these models often generate inaccurate responses. To address this issue, previous studies …
Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
This paper studies the problem of semi-supervised 2D-3D retrieval which aims to align both
labeled and unlabeled 2D and 3D data into the same embedding space. The problem is …