- Academic Search

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 111; 5 versions

[PDF] thecvf.com

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Cited by 138; 9 versions

[PDF] thecvf.com

Convolutional image captioning

J Aneja, A Deshpande… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

Image captioning is an important task, applicable to virtual assistants, editing tools, image
indexing, and support of the disabled. In recent years significant progress has been made in …

Cited by 479; 12 versions

[PDF] neurips.cc

Out of the box: Reasoning with graph convolution nets for factual visual question answering

M Narasimhan, S Lazebnik… - Advances in neural …, 2018 - proceedings.neurips.cc

Accurately answering a question about a given image requires combining observations with
general knowledge. While this is effortless for humans, reasoning with general knowledge …

Cited by 288; 8 versions

[PDF] thecvf.com

Two causal principles for improving visual dialog

J Qi, Y Niu, J Huang, H Zhang - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for
Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial) …

Cited by 171; 8 versions

[PDF] thecvf.com

Audio visual scene-aware dialog

H Alamri, V Cartillier, A Das, J Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com

We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural
response to a question about a scene, given video and audio of the scene and the history of …

Cited by 213; 10 versions

[PDF] jair.org Full View

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Cited by 163; 8 versions

[PDF] arxiv.org

Large-scale pretraining for visual dialog: A simple state-of-the-art baseline

V Murahari, D Batra, D Parikh, A Das - European Conference on Computer …, 2020 - Springer

Prior work in visual dialog has focused on training deep neural models on VisDial in
isolation. Instead, we present an approach to leverage pretraining on related vision …

Cited by 136; 8 versions

[PDF] neurips.cc

Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies

I Gat, I Schwartz, A Schwing… - Advances in Neural …, 2020 - proceedings.neurips.cc

Many recent datasets contain a variety of different data modalities, for instance, image,
question, and answer data in visual question answering (VQA). When training deep net …

Cited by 96; 6 versions

[PDF] thecvf.com

Reasoning visual dialogs with structural and partial observations

Z Zheng, W Wang, S Qi, SC Zhu - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

We propose a novel model to address the task of Visual Dialog which exhibits complex
dialog structures. To obtain a reasonable answer based on the current question and the …

Cited by 142; 9 versions

Two can play this game: Visual dialog with discriminative question generation and answering

Multimodal research in vision and language: A review of current and emerging trends
