MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

LLaVA-Phi: Efficient multi-modal assistant with small language model

Y Zhu, M Zhu, N Liu, Z Xu, Y Peng - … of the 1st International Workshop on …, 2024 - dl.acm.org
In this paper, we introduce LLaVA-φ (LLaVA-Phi), an efficient multi-modal assistant that
harnesses the power of the recently advanced small language model, Phi-2, to facilitate …

Large language models: A survey

S Minaee, T Mikolov, N Nikzad, M Chenaghlu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …

GPT-4V in Wonderland: Large multimodal models for zero-shot smartphone GUI navigation

A Yan, Z Yang, W Zhu, K Lin, L Li, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user
interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as …

MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices

X Chu, L Qiao, X Lin, S Xu, Y Yang, Y Hu, F Wei… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted
to run on mobile devices. It is an amalgamation of a myriad of architectural designs and …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …