Monkey: Image resolution and text label are important things for large multi-modal models
Z Li, B Yang, Q Liu, Z Ma, S Zhang… - proceedings of the …, 2024 - openaccess.thecvf.com
Large Multimodal Models (LMMs) have shown promise in vision-language tasks but
struggle with high-resolution input and detailed scene understanding. Addressing these …
Minicpm-v: A gpt-4v level mllm on your phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
Ureader: Universal ocr-free visually-situated language understanding with multimodal large language model
Text is ubiquitous in our visual world, conveying crucial information, such as in documents,
websites, and everyday photographs. In this work, we propose UReader, a first exploration …
Sphinx-x: Scaling data and parameters for a family of multi-modal large language models
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series
developed upon SPHINX. To improve the architecture and training efficiency, we modify the …
mplug-docowl 1.5: Unified structure learning for ocr-free document understanding
Structure information is critical for understanding the semantics of text-rich images, such as
documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for …
mplug-docowl: Modularized multimodal large language model for document understanding
Document understanding refers to automatically extracting, analyzing, and comprehending
information from various types of digital documents, such as web pages. Existing Multi …
Docformer: End-to-end transformer for document understanding
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …
Unifying vision, text, and layout for universal document processing
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …
Layoutlmv2: Multi-modal pre-training for visually-rich document understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document
understanding tasks due to its effective model architecture and the advantage of large-scale …
Mmlongbench-doc: Benchmarking long-context document understanding with visualizations
Understanding documents with rich layouts and multi-modal components is a long-standing
and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable …