The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

A comprehensive review of recent research trends on unmanned aerial vehicles (UAVs)

K Telli, O Kraa, Y Himeur, A Ouamane, M Boumehraz… - Systems, 2023 - mdpi.com
The growing interest in unmanned aerial vehicles (UAVs) from both the scientific and
industrial sectors has attracted a wave of new researchers and substantial investments in …

ChatGPT for robotics: Design principles and model abilities

SH Vemprala, R Bonatti, A Bucker, A Kapoor - IEEE Access, 2024 - ieeexplore.ieee.org
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for
robotics applications. We outline a strategy that combines design principles for prompt …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and a short textual description of how to modify the image. Existing methods have only been …

PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …

LanguageRefer: Spatial-language model for 3D visual grounding

J Roh, K Desingh, A Farhadi… - Conference on Robot …, 2022 - proceedings.mlr.press
For robots to understand human instructions and perform meaningful tasks in the near
future, it is important to develop learned models that comprehend referential language to …

Vision-language navigation with random environmental mixup

C Liu, F Zhu, X Chang, X Liang… - Proceedings of the …, 2021 - openaccess.thecvf.com
The Vision-language Navigation (VLN) task requires an agent to perceive both the visual scene
and natural language and navigate step-by-step. Large data bias makes the VLN task …

Neighbor-view enhanced model for vision and language navigation

D An, Y Qi, Y Huang, Q Wu, L Wang, T Tan - Proceedings of the 29th …, 2021 - dl.acm.org
Vision and Language Navigation (VLN) requires an agent to navigate to a target location by
following natural language instructions. Most existing works represent a navigation …

Learning to dub movies via hierarchical prosody models

G Cong, L Li, Y Qi, ZJ Zha, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given a piece of text, a video clip, and a reference audio, the movie dubbing (also known as
visual voice clone, V2C) task aims to generate speech that matches the speaker's emotion …