Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models

Z Wang, S Cai, A Liu, Y Jin, J Hou… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 - ieeexplore.ieee.org
Achieving human-like planning and control with multimodal observations in an open world is
a key milestone for more functional generalist agents. Existing approaches can handle …

Computational experiments meet large language model based agents: A survey and perspective

Q Ma, X Xue, D Zhou, X Yu, D Liu, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Computational experiments have emerged as a valuable method for studying complex
systems, involving the algorithmization of counterfactuals. However, accurately representing …

Rocket-1: Mastering open-world interaction with visual-temporal context prompting

S Cai, Z Wang, K Lian, Z Mu, X Ma, A Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to
embodied decision-making in open-world environments presents challenges. One critical …

Creative agents: Empowering agents with imagination for creative tasks

C Zhang, P Cai, Y Fu, H Yuan, Z Lu - arXiv preprint arXiv:2312.02519, 2023 - arxiv.org
We study building embodied agents for open-ended creative tasks. While existing methods
build instruction-following agents that can perform diverse open-ended tasks, none of them …

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Z Wang, S Cai, Z Mu, H Lin, C Zhang, X Liu, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-
world instruction-following agents in Minecraft. Compared to prior works that either emit …

Odyssey: Empowering Minecraft Agents with Open-World Skills

S Liu, Y Li, K Zhang, Z Cui, W Fang, Y Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have delved into constructing generalist agents for open-world environments
like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic …

Groot-1.5: Learning to follow multi-modal instructions from weak supervision

S Cai, B Zhang, Z Wang, X Ma, A Liu… - Multi-modal Foundation …, 2024 - openreview.net
This paper studies the problem of learning an agent policy that can follow various forms of
instructions. Specifically, we focus on multi-modal instructions: the policy is expected to …

MageBench: Bridging Large Multimodal Models to Agents

M Zhang, Q Dai, Y Yang, J Bao, D Chen, K Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
LMMs have shown impressive visual understanding capabilities, with the potential to be
applied in agents, which demand strong reasoning and planning abilities. Nevertheless …