MM1: Methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …

Are We on the Right Way for Evaluating Large Vision-Language Models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

MM1.5: Methods, analysis & insights from multimodal LLM fine-tuning

H Zhang, M Gao, Z Gan, P Dufter, N Wenzel… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MM1.5, a new family of multimodal large language models (MLLMs) designed
to enhance capabilities in text-rich image understanding, visual referring and grounding …

LLaVA-MoD: Making LLaVA tiny via MoE knowledge distillation

F Shu, Y Liao, L Zhuo, C Xu, L Zhang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of
small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale …

Rethinking visual prompting for multimodal large language models with external knowledge

Y Lin, Y Li, D Chen, W Xu, R Clark, P Torr… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, multimodal large language models (MLLMs) have made significant strides
by training on vast high-quality image-text datasets, enabling them to generally understand …

DOPRA: Decoding over-accumulation penalization and re-allocation in specific weighting layer

J Wei, X Zhang - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
In this work, we introduce DOPRA, a novel approach designed to mitigate hallucinations in
multi-modal large language models (MLLMs). Unlike existing solutions that typically involve …

ShareGPT4Video: Improving video understanding and generation with better captions

L Chen, X Wei, J Li, X Dong, P Zhang, Y Zang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large
video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) …

TinyChart: Efficient chart understanding with visual token merging and program-of-thoughts learning

L Zhang, A Hu, H Xu, M Yan, Y Xu, Q Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Charts are important for presenting and explaining complex data relationships. Recently,
multimodal large language models (MLLMs) have shown remarkable capabilities in various …

Automated multi-level preference for MLLMs

M Zhang, W Wu, Y Lu, Y Song, K Rong, H Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Current multimodal Large Language Models (MLLMs) suffer from "hallucination",
occasionally generating responses that are not grounded in the input images. To tackle this …