Obelics: An open web-scale filtered dataset of interleaved image-text documents
Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks …
Datacomp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Deepseek-vl: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around …
Internlm-xcomposer: A vision-language large model for advanced text-image comprehension and composition
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition. The innovative nature of our model is …
Improving multimodal datasets with image captioning
Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to …
BrainCLIP: Bridging brain and visual-linguistic representation via CLIP for generic natural visual stimulus decoding
Due to the lack of paired samples and the low signal-to-noise ratio of functional MRI (fMRI) signals, reconstructing perceived natural images or decoding their semantic contents from …
Survey of different large language model architectures: Trends, benchmarks, and challenges
Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or …
Omchat: A recipe to train multimodal language models with strong long context and video understanding
We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are …
Large Remote Sensing Model: Progress and Prospects
L Zhang, L Zhang, Q Yuan - Geomatics and Information Science …, 2023 - ch.whu.edu.cn
In recent years, significant advancements in large language models and visual foundation models in the field of artificial intelligence have attracted scholars' attention to the potential of …
Cvlue: A new benchmark dataset for Chinese vision-language understanding evaluation
Y Wang, Y Liu, F Yu, C Huang, K Li, Z Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from …