Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Evolving graphical planner: Contextual global planning for vision-and-language navigation

Z Deng, K Narasimhan… - Advances in Neural …, 2020 - proceedings.neurips.cc
The ability to perform effective planning is crucial for building an instruction-following agent.
When navigating through a new environment, an agent is challenged with (1) connecting the …

Vision-language navigation: a survey and taxonomy

W Wu, T Chang, X Li, Q Yin, Y Hu - Neural Computing and Applications, 2024 - Springer
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …

Visual perception generalization for vision-and-language navigation via meta-learning

T Wang, Z Wu, D Wang - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org
Vision-and-language navigation (VLN) is a challenging task that requires an agent to
navigate in real-world environments by understanding natural language instructions and …

Efficient policy adaptation with contrastive prompt ensemble for embodied agents

W Choi, WK Kim, SH Kim, H Woo - arXiv preprint arXiv:2412.11484, 2024 - arxiv.org
For embodied reinforcement learning (RL) agents interacting with the environment, it is
desirable to have rapid policy adaptation to unseen visual observations, but achieving zero …

Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

WK Kim, SH Kim, H Woo - Advances in Neural …, 2024 - proceedings.neurips.cc
For embodied reinforcement learning (RL) agents interacting with the environment, it is
desirable to have rapid policy adaptation to unseen visual observations, but achieving zero …

Model Adaptation for Time Constrained Embodied Control

J Song, M Yoo, H Woo - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
When adopting a deep learning model for embodied agents, it is required that the model
structure be optimized for specific tasks and operational conditions. Such optimization can …

CrossMap transformer: A crossmodal masked path transformer using double back-translation for vision-and-language navigation

A Magassouba, K Sugiura… - IEEE Robotics and …, 2021 - ieeexplore.ieee.org
Navigation guided by natural language instructions is particularly suitable for Domestic
Service Robots that interact naturally with users. This task involves the prediction of a …

Mobile app tasks with iterative feedback (MoTIF): Addressing task feasibility in interactive visual environments

A Burns, D Arsan, S Agrawal, R Kumar… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, vision-language research has shifted to study tasks which require more
complex reasoning, such as interactive question answering, visual common sense …

Vision-Language Navigation with Embodied Intelligence: A Survey

P Gao, P Wang, F Gao, F Wang, R Yuan - arXiv preprint arXiv:2402.14304, 2024 - arxiv.org
As a long-term vision in the field of artificial intelligence, the core goal of embodied
intelligence is to improve the perception, understanding, and interaction capabilities of …