Mm-llms: Recent advances in multimodal large language models
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
The (r) evolution of multimodal large language models: A survey
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …
this reason, inspired by the success of large language models, significant research efforts …
Uniir: Training and benchmarking universal multimodal information retrievers
Existing information retrieval (IR) models often assume a homogeneous format, limiting their
applicability to diverse user needs, such as searching for images with text descriptions …
applicability to diverse user needs, such as searching for images with text descriptions …
Kosmos-2.5: A multimodal literate model
The automatic reading of text-intensive images represents a significant advancement toward
achieving Artificial General Intelligence (AGI). In this paper we present KOSMOS-2.5, a …
achieving Artificial General Intelligence (AGI). In this paper we present KOSMOS-2.5, a …
Anygpt: Unified multimodal llm with discrete sequence modeling
We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete
representations for the unified processing of various modalities, including speech, text …
representations for the unified processing of various modalities, including speech, text …
Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain
Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …
vision and visual-language tasks within the natural image domain. Owing to the significant …
Recommendation with generative models
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …
learning and sampling from their statistical distributions. In recent years, these models have …
Unifiedmllm: Enabling unified representation for multi-modal multi-tasks with large language model
Significant advancements has recently been achieved in the field of multi-modal large
language models (MLLMs), demonstrating their remarkable capabilities in understanding …
language models (MLLMs), demonstrating their remarkable capabilities in understanding …
Multi-modal generative ai: Multi-modal llm, diffusion and beyond
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …
Particularly, two dominant families of techniques are: i) The multi-modal large language …
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …