Human-centered artificial intelligence for designing accessible cultural heritage

G Pisoni, N Díaz-Rodríguez, H Gijlers, L Tonolli - Applied Sciences, 2021 - mdpi.com
This paper reviews the literature concerning technology used for creating and delivering
accessible museum and cultural heritage site experiences. It highlights the importance of …

Deep learning approaches on image captioning: A review

T Ghandi, H Pourreza, H Mahyar - ACM Computing Surveys, 2023 - dl.acm.org
Image captioning is a research area of immense importance, aiming to generate natural
language descriptions for visual content in the form of still images. The advent of deep …

Evaluating object hallucination in large vision-language models

Y Li, Y Du, K Zhou, J Wang, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …

Aligning large multimodal models with factually augmented RLHF

Z Sun, S Shen, S Cao, H Liu, C Li, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Multimodal Models (LMM) are built across modalities and the misalignment between
two modalities can result in "hallucination", generating textual outputs that are not grounded …

DreamLLM: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

VizWiz grand challenge: Answering visual questions from blind people

D Gurari, Q Li, AJ Stangl, A Guo, C Lin… - Proceedings of the …, 2018 - openaccess.thecvf.com
The study of algorithms to automatically answer visual questions currently is motivated by
visual question answering (VQA) datasets constructed in artificial VQA settings. We propose …

Object hallucination in image captioning

A Rohrbach, LA Hendricks, K Burns, T Darrell… - arXiv preprint arXiv …, 2018 - arxiv.org
Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …

Accessible visualization via natural language descriptions: A four-level model of semantic content

A Lundgard, A Satyanarayan - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Natural language descriptions sometimes accompany visualizations to better communicate
and contextualize their insights, and to improve their accessibility for readers with …

Understanding and evaluating racial biases in image captioning

D Zhao, A Wang… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Image captioning is an important task for benchmarking visual reasoning and for enabling
accessibility for people with vision impairments. However, as in many machine learning …