Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in
processing both visual and textual information. However, the critical challenge of alignment …
Exploring annotation-free image captioning with retrieval-augmented pseudo sentence generation
Recently, training an image captioner without annotated image-sentence pairs has gained
traction. Previous methods have faced limitations due to either using mismatched corpora for …