Human-centered artificial intelligence for designing accessible cultural heritage
This paper reviews the literature concerning technology used for creating and delivering
accessible museum and cultural heritage sites experiences. It highlights the importance of …
Deep learning approaches on image captioning: A review
Image captioning is a research area of immense importance, aiming to generate natural
language descriptions for visual content in the form of still images. The advent of deep …
Evaluating object hallucination in large vision-language models
Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …
Aligning large multimodal models with factually augmented RLHF
Large Multimodal Models (LMM) are built across modalities and the misalignment between
two modalities can result in "hallucination", generating textual outputs that are not grounded …
DreamLLM: Synergistic multimodal comprehension and creation
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …
ShapeLLM: Universal 3D object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
VizWiz grand challenge: Answering visual questions from blind people
The study of algorithms to automatically answer visual questions currently is motivated by
visual question answering (VQA) datasets constructed in artificial VQA settings. We propose …
Object hallucination in image captioning
Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …
Accessible visualization via natural language descriptions: A four-level model of semantic content
Natural language descriptions sometimes accompany visualizations to better communicate
and contextualize their insights, and to improve their accessibility for readers with …
Understanding and evaluating racial biases in image captioning
Image captioning is an important task for benchmarking visual reasoning and for enabling
accessibility for people with vision impairments. However, as in many machine learning …