Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle …
Computational experiments meet large language model based agents: A survey and perspective
Q Ma, X Xue, D Zhou, X Yu, D Liu, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Computational experiments have emerged as a valuable method for studying complex systems, involving the algorithmization of counterfactuals. However, accurately representing …
Rocket-1: Mastering open-world interaction with visual-temporal context prompting
Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to embodied decision-making in open-world environments presents challenges. One critical …
Creative agents: Empowering agents with imagination for creative tasks
We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them …
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in Minecraft. Compared to prior works that either emit …
Odyssey: Empowering Minecraft Agents with Open-World Skills
S Liu, Y Li, K Zhang, Z Cui, W Fang, Y Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic …
Groot-1.5: Learning to follow multi-modal instructions from weak supervision
This paper studies the problem of learning an agent policy that can follow various forms of instructions. Specifically, we focus on multi-modal instructions: the policy is expected to …
MageBench: Bridging Large Multimodal Models to Agents
LMMs have shown impressive visual understanding capabilities, with the potential to be applied in agents, which demand strong reasoning and planning abilities. Nevertheless …