Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
Affordances from human videos as a versatile representation for robotics
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …
Convolutional image captioning
Image captioning is an important task, applicable to virtual assistants, editing tools, image
indexing, and support of the disabled. In recent years significant progress has been made in …
Out of the box: Reasoning with graph convolution nets for factual visual question answering
Accurately answering a question about a given image requires combining observations with
general knowledge. While this is effortless for humans, reasoning with general knowledge …
Two causal principles for improving visual dialog
This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for
Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial) …
Audio visual scene-aware dialog
We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural
response to a question about a scene, given video and audio of the scene and the history of …
Trends in integration of vision and language research: A survey of tasks, datasets, and methods
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …
Large-scale pretraining for visual dialog: A simple state-of-the-art baseline
Prior work in visual dialog has focused on training deep neural models on VisDial in
isolation. Instead, we present an approach to leverage pretraining on related vision …
Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies
Many recent datasets contain a variety of different data modalities, for instance, image,
question, and answer data in visual question answering (VQA). When training deep net …
Reasoning visual dialogs with structural and partial observations
We propose a novel model to address the task of Visual Dialog which exhibits complex
dialog structures. To obtain a reasonable answer based on the current question and the …