MageBench: Bridging Large Multimodal Models to Agents

M Zhang, Q Dai, Y Yang, J Bao, D Chen, K Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
LMMs have shown impressive visual understanding capabilities, with the potential to be
applied in agents, which demand strong reasoning and planning abilities. Nevertheless …

SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World

J Zhang, C Gao, L Zhang, Y Li, H Yin - arXiv preprint arXiv:2412.07472, 2024 - arxiv.org
Recent advances in embodied agents with multimodal perception and reasoning
capabilities based on large vision-language models (LVLMs) excel in autonomously …

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

X Hu, T Xiong, B Yi, Z Wei, R Xiao, Y Chen, J Ye, M Tao… - 2024 - preprints.org
The dream to create AI assistants as capable and versatile as the fictional JARVIS from Iron
Man has long captivated imaginations. With the evolution of (multimodal) large language …