QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Video generative adversarial networks: a review

N Aldausari, A Sowmya, N Marcus… - ACM Computing Surveys …, 2022 - dl.acm.org
With the increasing interest in the content creation field in multiple sectors such as media,
education, and entertainment, there is an increased trend in the papers that use AI …

Zero-shot video question answering via frozen bidirectional language models

A Yang, A Miech, J Sivic, I Laptev… - Advances in Neural …, 2022 - proceedings.neurips.cc
Video question answering (VideoQA) is a complex task that requires diverse multi-modal
data for training. Manual annotation of question and answers for videos, however, is tedious …

Just ask: Learning to answer questions from millions of narrated videos

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

TVQA: Localized, compositional video question answering

J Lei, L Yu, M Bansal, TL Berg - arXiv preprint arXiv:1809.01696, 2018 - arxiv.org
Recent years have witnessed an increasing interest in image-based question-answering
(QA) tasks. However, due to data limitations, there has been much less work on video-based …

Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

ActivityNet-QA: A dataset for understanding complex web videos via question answering

Z Yu, D Xu, J Yu, T Yu, Z Zhao, Y Zhuang… - Proceedings of the AAAI …, 2019 - aaai.org
Recent developments in modeling language and vision have been successfully applied to
image question answering. It is both crucial and natural to extend this research direction to …

VideoDirectorGPT: Consistent multi-scene video generation via LLM-guided planning

H Lin, A Zala, J Cho, M Bansal - arXiv preprint arXiv:2309.15091, 2023 - arxiv.org
Recent text-to-video (T2V) generation methods have seen significant advancements.
However, the majority of these works focus on producing short video clips of a single event …

TVR: A large-scale dataset for video-subtitle moment retrieval

J Lei, L Yu, TL Berg, M Bansal - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
We introduce TV show Retrieval (TVR), a new multimodal retrieval dataset. TVR requires
systems to understand both videos and their associated subtitle (dialogue) texts, making it …