From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Mobile health applications for the most prevalent conditions by the World Health Organization: review and analysis

B Martínez-Pérez, I De La Torre-Díez… - Journal of medical …, 2013 - jmir.org
Background New possibilities for mHealth have arisen by means of the latest advances in
mobile communications and technologies. With more than 1 billion smartphones and 100 …

VizWiz Grand Challenge: Answering visual questions from blind people

D Gurari, Q Li, AJ Stangl, A Guo, C Lin… - Proceedings of the …, 2018 - openaccess.thecvf.com
The study of algorithms to automatically answer visual questions currently is motivated by
visual question answering (VQA) datasets constructed in artificial VQA settings. We propose …

Captioning images taken by people who are blind

D Gurari, Y Zhao, M Zhang, N Bhattacharya - Computer Vision–ECCV …, 2020 - Springer
While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …

Revolt: Collaborative crowdsourcing for labeling machine learning datasets

JC Chang, S Amershi, E Kamar - … of the 2017 CHI conference on human …, 2017 - dl.acm.org
Crowdsourcing provides a scalable and efficient way to construct labeled datasets for
training machine learning systems. However, creating comprehensive label guidelines for …

Understanding blind people's experiences with computer-generated captions of social media images

H MacLeod, CL Bennett, MR Morris… - Proceedings of the 2017 …, 2017 - dl.acm.org
Research advancements allow computational systems to automatically caption social media
images. Often, these captions are evaluated with sighted humans using the image as a …

"Person, Shoes, Tree. Is the Person Naked?" What People with Vision Impairments Want in Image Descriptions

A Stangl, MR Morris, D Gurari - Proceedings of the 2020 CHI conference …, 2020 - dl.acm.org
Access to digital images is important to people who are blind or have low vision (BLV). Many
contemporary image description efforts do not take into account this population's nuanced …

Going beyond one-size-fits-all image descriptions to satisfy the information wants of people who are blind or have low vision

A Stangl, N Verma, KR Fleischmann… - Proceedings of the 23rd …, 2021 - dl.acm.org
Image descriptions are how people who are blind or have low vision (BLV) access
information depicted within images. To our knowledge, no prior work has examined how a …

“It's Kind of Context Dependent”: Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios

L Jiang, C Jung, M Phutane, A Stangl… - Proceedings of the 2024 …, 2024 - dl.acm.org
While audio description (AD) is the standard approach for making videos accessible to blind
and low vision (BLV) people, existing AD guidelines do not consider BLV users' varied …

VizWiz-Priv: A dataset for recognizing the presence and purpose of private visual information in images taken by blind people

D Gurari, Q Li, C Lin, Y Zhao, A Guo… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce the first visual privacy dataset originating from people who are blind, in order to
better understand their privacy disclosures and to encourage the development of algorithms …