- Academic Search

A Kamath, M Singh, Y LeCun… - Proceedings of the …, 2021 - openaccess.thecvf.com

Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of
interest from the image. However, this crucial module is typically used as a black box …

Cited by 931

Debiased visual question answering from feature and sample perspectives

Z Wen, G Xu, M Tan, Q Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc

Visual question answering (VQA) is designed to examine the visual-textual reasoning ability
of an intelligent agent. However, recent observations show that many VQA models may only …

Cited by 77

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

Cited by 18

Context disentangling and prototype inheriting for robust visual grounding

W Tang, L Li, X Liu, L Jin, J Tang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual grounding (VG) aims to locate a specific target in an image based on a given
language query. The discriminative information from context is important for distinguishing …

Cited by 20

Transformer-based relational inference network for complex visual relational reasoning

M Tan, Z Wen, L Fang, Q Wu - ACM Transactions on Multimedia …, 2023 - dl.acm.org

Visual Relational Reasoning is the basis of many vision-and-language based tasks (e.g.,
visual question answering and referring expression comprehension). In this article, we …

Cited by 5

Deep scene understanding with extended text description for human object interaction detection

HS Hong, JC Lee, A Kumar, S Ahn, DG Lee - Expert Systems with …, 2025 - Elsevier

Human–object interaction (HOI) detection plays a pivotal role in scene understanding,
enabling the identification, localization, and behavioral intention prediction of humans and …

Deep Scene Understanding with Extended Text Description for Human

DG Lee - Available at SSRN 4705624 - papers.ssrn.com

Human-object interaction (HOI) detection plays a pivotal role in scene understanding,
enabling the identification, localization, and behavioral intention prediction of humans and …

Modular graph attention network for complex visual relational reasoning

MDETR - modulated detection for end-to-end multi-modal understanding
