Parrot Captions Teach CLIP to Spot Text

Y Lin, C He, AJ Wang, B Wang, W Li… - European Conference on …, 2024 - Springer
Despite CLIP being the foundation model in numerous vision-language applications, CLIP
suffers from a severe text spotting bias. Such bias causes CLIP models to 'Parrot' the visual …

Improving Geo-Diversity of Generated Images with Contextualized Vendi Score Guidance

R Askari Hemmat, M Hall, A Sun, C Ross… - … on Computer Vision, 2024 - Springer
With the growing popularity of text-to-image generative models, there has been increasing
focus on understanding their risks and biases. Recent work has found that state-of-the-art …

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP

Z Zhang, Z Liu, M Feng, C Xu - arXiv preprint arXiv:2409.15035, 2024 - arxiv.org
CLIP has demonstrated great versatility in adapting to various downstream tasks, such as
image editing and generation, visual question answering, and video understanding …

Mechanistic understanding and validation of large AI models with SemanticLens

M Dreyer, J Berend, T Labarta, J Vielhaben… - arXiv preprint arXiv …, 2025 - arxiv.org
Unlike human-engineered systems such as aeroplanes, where each component's role and
dependencies are well understood, the inner workings of AI models remain largely opaque …

Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era

TT Nguyen, Z Ren, T Pham, PL Nguyen, H Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) and multimodal learning has
transformed digital content creation and manipulation. Traditional visual editing tools require …

Uncovering Bias in Foundation Models: Impact, Testing, Harm, and Mitigation

S Sun, L Liu, Y Liu, Z Liu, S Zhang, J Heikkilä… - arXiv preprint arXiv …, 2025 - arxiv.org
Bias in Foundation Models (FMs), which are trained on vast datasets spanning societal and historical
knowledge, poses significant challenges for fairness and equity across fields such as …

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

Y Hirota, MH Chen, CY Wang, Y Nakashima… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale vision-language models, such as CLIP, are known to contain harmful societal
bias regarding protected attributes (e.g., gender and age). In this paper, we aim to address …

[BOOK] Computer Vision - ECCV 2024: 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XXIV

A Leonardis - 2024 - books.google.com
The multi-volume set of LNCS books with volume numbers 15059 up to 15147 constitutes
the refereed proceedings of the 18th European Conference on Computer Vision, ECCV …

[PDF] Generative Artificial Intelligence and Digital Ageism: Exploring the Construction of Age and Aging by Image-Generating AI

T Kamelski, D Klinge - 2024 - osf.io
Since 2022, the growing attention to and public accessibility of generative artificial
intelligence (AI) have become essential for knowledge acquisition on digital platforms …