MM-LLMs: Recent Advances in MultiModal Large Language Models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A Survey of Large Language Models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

MMBench: Is Your Multi-modal Model an All-around Player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European Conference on …, 2024 - Springer
Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

ShareGPT4V: Improving Large Multi-modal Models with Better Captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

B Lin, Y Ye, B Zhu, J Cui, M Ning, P Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
The Large Vision-Language Model (LVLM) has enhanced the performance of various
downstream tasks in visual-language understanding. Most existing approaches encode …

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

M Abdin, J Aneja, H Awadalla, A Awadallah… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Yi: Open Foundation Models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …