- Academic Search

X Wang, D Song, S Chen, C Zhang, B Wang - ar** user interfaces, introducing new possibilities for personalized …

MageBench: Bridging Large Multimodal Models to Agents

M Zhang, Q Dai, Y Yang, J Bao, D Chen, K Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org

LMMs have shown impressive visual understanding capabilities, with the potential to be
applied in agents, which demand strong reasoning and planning abilities. Nevertheless …

SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World

J Zhang, C Gao, L Zhang, Y Li, H Yin - arXiv preprint arXiv:2412.07472, 2024 - arxiv.org

Recent advances in embodied agents with multimodal perception and reasoning
capabilities based on large vision-language models (LVLMs), excel in autonomously …

[PDF] OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

X Hu, T Xiong, B Yi, Z Wei, R Xiao, Y Chen, J Ye, M Tao… - 2024 - preprints.org

The dream to create AI assistants as capable and versatile as the fictional JARVIS from Iron
Man has long captivated imaginations. With the evolution of (multimodal) large language …
