LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
MageBench: Bridging Large Multimodal Models to Agents
LMMs have shown impressive visual understanding capabilities, with the potential to be
applied in agents, which demand strong reasoning and planning abilities. Nevertheless …
SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World
Recent advances in embodied agents with multimodal perception and reasoning
capabilities based on large vision-language models (LVLMs) excel in autonomously …
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
The dream to create AI assistants as capable and versatile as the fictional JARVIS from Iron
Man has long captivated imaginations. With the evolution of (multimodal) large language …