Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-based
AI agents, endowing them with decision-making and reasoning abilities akin to …

Hierarchical auto-organizing system for open-ended multi-agent navigation

Z Zhao, K Chen, D Guo, W Chai, T Ye, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Navigating complex environments in Minecraft poses significant challenges for multi-agent
systems due to the game's dynamic and unpredictable open-world setting. Agents need to …

Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint

S Chen, T Ye, K Zhang, Z Xing, Y Lin, L Zhu - European Conference on …, 2024 - Springer
Recent advancements in adverse weather restoration have shown potential, yet the
unpredictable and varied combinations of weather degradations in the real world pose …

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

E Song, W Chai, T Ye, JN Hwang, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Do we really need a complex agent system? distill embodied agent into a single model

Z Zhao, K Ma, W Chai, X Wang, K Chen, D Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the power of large language models (LLMs), open-ended embodied agents can flexibly
understand human instructions, generate interpretable guidance strategies, and output …

A survey of neural code intelligence: Paradigms, advances and beyond

Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …

LLaVA-ultra: Large Chinese language and vision assistant for ultrasound

X Guo, W Chai, SY Li, G Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Multimodal Large Language Model (MLLM) has recently garnered attention as a prominent
research focus. By harnessing powerful LLM, it facilitates a transition of conversational …

A Survey on Human-Centric LLMs

JY Wang, N Sukiennik, T Li, W Su, Q Hao, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of large language models (LLMs) and their capacity to simulate human
cognition and behavior has given rise to LLM-based frameworks and tools that are …

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

W Chai, E Song, Y Du, C Meng, V Madhavan… - arXiv preprint arXiv …, 2024 - arxiv.org
Video detailed captioning is a key task which aims to generate comprehensive and coherent
textual descriptions of video content, benefiting both video understanding and generation. In …

Odyssey: Empowering Minecraft Agents with Open-World Skills

S Liu, Y Li, K Zhang, Z Cui, W Fang, Y Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have delved into constructing generalist agents for open-world environments
like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic …