Google Scholar

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Cited by 732

[PDF] arxiv.org

MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 205

[PDF] arxiv.org

MiniGPT-4: Enhancing vision-language understanding with advanced large language models

D Zhu, J Chen, X Shen, X Li, M Elhoseiny - arXiv preprint arXiv …, 2023 - arxiv.org

The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …

Cited by 2441

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

Cited by 1758

[PDF] arxiv.org

Video-LLaVA: Learning united visual representation by alignment before projection

B Lin, Y Ye, B Zhu, J Cui, M Ning, P Jin… - arXiv preprint arXiv …, 2023 - arxiv.org

The Large Vision-Language Model (LVLM) has enhanced the performance of various
downstream tasks in visual-language understanding. Most existing approaches encode …

Cited by 433

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Cited by 180

[PDF] arxiv.org

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 1067

[PDF] arxiv.org

Evaluating object hallucination in large vision-language models

Y Li, Y Du, K Zhou, J Wang, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Inspired by the superior language abilities of large language models (LLM), large
vision-language models (LVLM) have been recently explored by integrating powerful LLMs for …

Cited by 739

[PDF] neurips.cc

3D-LLM: Injecting the 3D world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc

Large language models (LLMs) and Vision-Language Models (VLMs) have been proved to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

Cited by 252

[PDF] thecvf.com

mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However previous methods have …

Cited by 332

Multimodal-gpt: A vision and language model for dialogue with humans

The rise and potential of large language model based agents: A survey
