Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
Evolving graphical planner: Contextual global planning for vision-and-language navigation
The ability to perform effective planning is crucial for building an instruction-following agent.
When navigating through a new environment, an agent is challenged with (1) connecting the …
Vision-language navigation: a survey and taxonomy
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …
Visual perception generalization for vision-and-language navigation via meta-learning
Vision-and-language navigation (VLN) is a challenging task that requires an agent to
navigate in real-world environments by understanding natural language instructions and …
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
WK Kim, SH Kim, H Woo - Advances in Neural …, 2024 - proceedings.neurips.cc
For embodied reinforcement learning (RL) agents interacting with the environment, it is
desirable to have rapid policy adaptation to unseen visual observations, but achieving zero …
Model Adaptation for Time Constrained Embodied Control
When adopting a deep learning model for embodied agents it is required that the model
structure be optimized for specific tasks and operational conditions. Such optimization can …
CrossMap transformer: A crossmodal masked path transformer using double back-translation for vision-and-language navigation
Navigation guided by natural language instructions is particularly suitable for Domestic
Service Robots that interact naturally with users. This task involves the prediction of a …
Mobile app tasks with iterative feedback (motif): Addressing task feasibility in interactive visual environments
In recent years, vision-language research has shifted to study tasks which require more
complex reasoning, such as interactive question answering, visual common sense …
Vision-Language Navigation with Embodied Intelligence: A Survey
P Gao, P Wang, F Gao, F Wang, R Yuan - arXiv preprint arXiv:2402.14304, 2024 - arxiv.org
As a long-term vision in the field of artificial intelligence, the core goal of embodied
intelligence is to improve the perception, understanding, and interaction capabilities of …