Large multimodal agents: A survey
Large language models (LLMs) have achieved superior performance in powering text-based
AI agents, endowing them with decision-making and reasoning abilities akin to …
Hierarchical auto-organizing system for open-ended multi-agent navigation
Navigating complex environments in Minecraft poses significant challenges for multi-agent
systems due to the game's dynamic and unpredictable open-world setting. Agents need to …
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
Recent advancements in adverse weather restoration have shown potential, yet the
unpredictable and varied combinations of weather degradations in the real world pose …
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …
Do we really need a complex agent system? Distill embodied agent into a single model
With the power of large language models (LLMs), open-ended embodied agents can flexibly
understand human instructions, generate interpretable guidance strategies, and output …
A survey of neural code intelligence: Paradigms, advances and beyond
Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …
LLaVA-ultra: Large Chinese language and vision assistant for ultrasound
Multimodal Large Language Model (MLLM) has recently garnered attention as a prominent
research focus. By harnessing powerful LLM, it facilitates a transition of conversational …
A Survey on Human-Centric LLMs
The rapid evolution of large language models (LLMs) and their capacity to simulate human
cognition and behavior has given rise to LLM-based frameworks and tools that are …
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Video detailed captioning is a key task which aims to generate comprehensive and coherent
textual descriptions of video content, benefiting both video understanding and generation. In …
Odyssey: Empowering Minecraft Agents with Open-World Skills
S Liu, Y Li, K Zhang, Z Cui, W Fang, Y Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have delved into constructing generalist agents for open-world environments
like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic …