VQA: Visual question answering S Antol, A Agrawal, J Lu, M Mitchell, D Batra, CL Zitnick, D Parikh Proceedings of the IEEE international conference on computer vision, 2425-2433, 2015 | 6739 | 2015 |
Don't just assume; look and answer: Overcoming priors for visual question answering A Agrawal, D Batra, D Parikh, A Kembhavi Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 754 | 2018 |
Visual storytelling TH Huang, F Ferraro, N Mostafazadeh, I Misra, A Agrawal, J Devlin, ... Proceedings of the 2016 conference of the North American chapter of the …, 2016 | 513 | 2016 |
Analyzing the behavior of visual question answering models A Agrawal, D Batra, D Parikh arXiv preprint arXiv:1606.07356, 2016 | 392 | 2016 |
Overcoming language priors in visual question answering with adversarial regularization S Ramakrishnan, A Agrawal, S Lee Advances in Neural Information Processing Systems 31, 2018 | 278 | 2018 |
C-VQA: A compositional split of the visual question answering (VQA) v1.0 dataset A Agrawal, A Kembhavi, D Batra, D Parikh arXiv preprint arXiv:1704.08243, 2017 | 82 | 2017 |
An introduction to vision-language modeling F Bordes, RY Pang, A Ajay, AC Li, A Bardes, S Petryk, O Mañas, Z Lin, ... arXiv preprint arXiv:2405.17247, 2024 | 50 | 2024 |
MAPL: Parameter-efficient adaptation of unimodal pre-trained models for vision-language few-shot prompting O Mañas, P Rodriguez, S Ahmadi, A Nematzadeh, Y Goyal, A Agrawal arXiv preprint arXiv:2210.07179, 2022 | 42 | 2022 |
Measuring machine intelligence through visual question answering CL Zitnick, A Agrawal, S Antol, M Mitchell, D Batra, D Parikh AI Magazine 37 (1), 63-72, 2016 | 40 | 2016 |
Resolving language and vision ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes G Christie, A Laddha, A Agrawal, S Antol, Y Goyal, K Kochersberger, ... arXiv preprint arXiv:1604.02125, 2016 | 35 | 2016 |
Improving automatic VQA evaluation using large language models O Mañas, B Krojer, A Agrawal Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4171-4179, 2024 | 22 | 2024 |
Reassessing evaluation practices in visual question answering: A case study on out-of-distribution generalization A Agrawal, I Kajić, E Bugliarello, E Davoodi, A Gergely, P Blunsom, ... arXiv preprint arXiv:2205.12191, 2022 | 22 | 2022 |
Measuring progress in fine-grained vision-and-language understanding E Bugliarello, L Sartran, A Agrawal, LA Hendricks, A Nematzadeh arXiv preprint arXiv:2305.07558, 2023 | 20 | 2023 |
Improving text-to-image consistency via automatic prompt optimization O Mañas, P Astolfi, M Hall, C Ross, J Urbanek, A Williams, A Agrawal, ... arXiv preprint arXiv:2403.17804, 2024 | 19 | 2024 |
Resolving vision and language ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes G Christie, A Laddha, A Agrawal, S Antol, Y Goyal, K Kochersberger, ... Computer Vision and Image Understanding 163, 101-112, 2017 | 13 | 2017 |
Benchmarking vision language models for cultural understanding S Nayak, K Jain, R Awal, S Reddy, S van Steenkiste, LA Hendricks, ... arXiv preprint arXiv:2407.10920, 2024 | 12 | 2024 |
Visual storytelling F Ferraro, N Mostafazadeh, I Misra, A Agrawal, J Devlin, R Girshick, X He, ... arXiv preprint arXiv:1604.03968, 2016 | 11 | 2016 |
MoqaGPT: Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model L Zhang, Y Wu, F Mo, JY Nie, A Agrawal arXiv preprint arXiv:2310.13265, 2023 | 6 | 2023 |
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding L Zhang, R Awal, A Agrawal arXiv preprint arXiv:2306.08832, 2023 | 6 | 2023 |
An examination of the robustness of reference-free image captioning evaluation metrics S Ahmadi, A Agrawal arXiv preprint arXiv:2305.14998, 2023 | 6 | 2023 |