From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Mobile health applications for the most prevalent conditions by the World Health Organization: review and analysis

B Martínez-Pérez, I De La Torre-Díez… - Journal of medical …, 2013 - jmir.org
Background New possibilities for mHealth have arisen by means of the latest advances in
mobile communications and technologies. With more than 1 billion smartphones and 100 …

VizWiz Grand Challenge: Answering visual questions from blind people

D Gurari, Q Li, AJ Stangl, A Guo, C Lin… - Proceedings of the …, 2018 - openaccess.thecvf.com
The study of algorithms to automatically answer visual questions currently is motivated by
visual question answering (VQA) datasets constructed in artificial VQA settings. We propose …

Captioning images taken by people who are blind

D Gurari, Y Zhao, M Zhang, N Bhattacharya - Computer Vision–ECCV …, 2020 - Springer
While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …

Revolt: Collaborative crowdsourcing for labeling machine learning datasets

JC Chang, S Amershi, E Kamar - … of the 2017 CHI conference on human …, 2017 - dl.acm.org
Crowdsourcing provides a scalable and efficient way to construct labeled datasets for
training machine learning systems. However, creating comprehensive label guidelines for …

Understanding blind people's experiences with computer-generated captions of social media images

H MacLeod, CL Bennett, MR Morris… - Proceedings of the 2017 …, 2017 - dl.acm.org
Research advancements allow computational systems to automatically caption social media
images. Often, these captions are evaluated with sighted humans using the image as a …

"Person, Shoes, Tree. Is the Person Naked?" What People with Vision Impairments Want in Image Descriptions

A Stangl, MR Morris, D Gurari - Proceedings of the 2020 CHI conference …, 2020 - dl.acm.org
Access to digital images is important to people who are blind or have low vision (BLV). Many
contemporary image description efforts do not take into account this population's nuanced …

Going beyond one-size-fits-all image descriptions to satisfy the information wants of people who are blind or have low vision

A Stangl, N Verma, KR Fleischmann… - Proceedings of the 23rd …, 2021 - dl.acm.org
Image descriptions are how people who are blind or have low vision (BLV) access
information depicted within images. To our knowledge, no prior work has examined how a …

“It's Kind of Context Dependent”: Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios

L Jiang, C Jung, M Phutane, A Stangl… - Proceedings of the 2024 …, 2024 - dl.acm.org
While audio description (AD) is the standard approach for making videos accessible to blind
and low vision (BLV) people, existing AD guidelines do not consider BLV users' varied …

VizWiz-Priv: A dataset for recognizing the presence and purpose of private visual information in images taken by blind people

D Gurari, Q Li, C Lin, Y Zhao, A Guo… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce the first visual privacy dataset originating from people who are blind, in order to
better understand their privacy disclosures and to encourage the development of algorithms …