MM-LLMs: Recent advances in multimodal large language models
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
A survey on hallucination in large vision-language models
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …
MM1: methods, analysis and insights from multimodal LLM pre-training
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …
TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
LLaVA-Phi: Efficient multi-modal assistant with small language model
In this paper, we introduce LLaVA-φ (LLaVA-Phi), an efficient multi-modal assistant that
harnesses the power of the recently advanced small language model, Phi-2, to facilitate …
Large language models: A survey
Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …
GPT-4V in wonderland: Large multimodal models for zero-shot smartphone GUI navigation
We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user
interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as …
MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted
to run on mobile devices. It is an amalgamation of a myriad of architectural designs and …
OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …