The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

A comprehensive review of recent research trends on unmanned aerial vehicles (UAVs)

K Telli, O Kraa, Y Himeur, A Ouamane, M Boumehraz… - Systems, 2023 - mdpi.com
The growing interest in unmanned aerial vehicles (UAVs) from both the scientific and
industrial sectors has attracted a wave of new researchers and substantial investments in …

ChatGPT for robotics: Design principles and model abilities

SH Vemprala, R Bonatti, A Bucker, A Kapoor - IEEE Access, 2024 - ieeexplore.ieee.org
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for
robotics applications. We outline a strategy that combines design principles for prompt …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and a short textual description of how to modify the image. Existing methods have only been …

PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …

LanguageRefer: Spatial-language model for 3D visual grounding

J Roh, K Desingh, A Farhadi… - Conference on Robot …, 2022 - proceedings.mlr.press
For robots to understand human instructions and perform meaningful tasks in the near
future, it is important to develop learned models that comprehend referential language to …

Vision-language navigation with random environmental mixup

C Liu, F Zhu, X Chang, X Liang… - Proceedings of the …, 2021 - openaccess.thecvf.com
The Vision-language Navigation (VLN) task requires an agent to perceive both the visual scene
and natural language and navigate step-by-step. Large data bias makes the VLN task …

Neighbor-view enhanced model for vision and language navigation

D An, Y Qi, Y Huang, Q Wu, L Wang, T Tan - Proceedings of the 29th …, 2021 - dl.acm.org
Vision and Language Navigation (VLN) requires an agent to navigate to a target location by
following natural language instructions. Most existing works represent a navigation …

Learning to dub movies via hierarchical prosody models

G Cong, L Li, Y Qi, ZJ Zha, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given a piece of text, a video clip, and a reference audio, the movie dubbing (also known as
visual voice clone, V2C) task aims to generate speech that matches the speaker's emotion …