Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

GlitchBench: Can large multimodal models detect video game glitches?

MR Taesiri, T Feng, CP Bezemer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have evolved from large language models (LLMs) to
integrate multiple input modalities such as visual inputs. This integration augments the …

Less is more: Generating grounded navigation instructions from landmarks

S Wang, C Montgomery, J Orbay… - Proceedings of the …, 2022 - openaccess.thecvf.com
We study the automatic generation of navigation instructions from 360-degree images
captured on indoor routes. Existing generators suffer from poor visual grounding, causing …