Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

GlitchBench: Can large multimodal models detect video game glitches?

MR Taesiri, T Feng, CP Bezemer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have evolved from large language models (LLMs) to
integrate multiple input modalities such as visual inputs. This integration augments the …

Less is more: Generating grounded navigation instructions from landmarks

S Wang, C Montgomery, J Orbay… - Proceedings of the …, 2022 - openaccess.thecvf.com
We study the automatic generation of navigation instructions from 360-degree images
captured on indoor routes. Existing generators suffer from poor visual grounding, causing …