Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …
Android in the zoo: Chain-of-action-thought for GUI agents
Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone,
which completes a task triggered by natural language through predicting a sequence of …
GUI agents with foundation models: A comprehensive survey
Recent advances in foundation models, particularly Large Language Models (LLMs) and
Multimodal Large Language Models (MLLMs), have facilitated the development of intelligent …
Foundations and recent trends in multimodal mobile agents: A survey
Mobile agents are essential for automating tasks in complex and dynamic mobile
environments. As foundation models evolve, the demands for agents that can adapt in real …
MMIU: Multimodal multi-image understanding for evaluating large vision-language models
The capability to process multiple images is crucial for Large Vision-Language Models
(LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi …
Ferret-UI 2: Mastering universal user interface understanding across platforms
Building a generalist model for user interface (UI) understanding is challenging due to
various foundational issues, such as platform diversity, resolution variation, and data …
OS-ATLAS: A foundation action model for generalist GUI agents
Existing efforts in building GUI agents heavily rely on the availability of robust commercial
Vision-Language Models (VLMs) such as GPT-4o and GeminiProVision. Practitioners are …
ShowUI: One vision-language-action model for generalist GUI agent
Graphical User Interface (GUI) automation holds significant promise for enhancing human
productivity by assisting with digital tasks. While recent Large Language Models (LLMs) and …
Generalist virtual agents: A survey on autonomous agents across digital platforms
In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …
ShowUI: One vision-language-action model for GUI visual agent
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing
human workflow productivity. While most agents are language-based, relying on closed …