Mm-llms: Recent advances in multimodal large language models
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
Large language models for generative information extraction: A survey
Abstract Information Extraction (IE) aims to extract structural knowledge from plain natural
language texts. Recently, generative Large Language Models (LLMs) have demonstrated …
language texts. Recently, generative Large Language Models (LLMs) have demonstrated …
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …
natural language processing (NLP), fueling a paradigm shift in information acquisition …
Adashield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …
Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
model is trained to generate videos of realistic or imaginative scenes from text instructions …
BRAVE: Broadening the visual encoding of vision-language models
Vision-language models (VLMs) are typically composed of a vision encoder, eg CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …
language model (LM) that interprets the encoded features to solve downstream tasks …
Vision-language models for medical report generation and visual question answering: A review
Medical vision-language models (VLMs) combine computer vision (CV) and natural
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …
Unified hallucination detection for multimodal large language models
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …
are plagued by the critical issue of hallucination. The reliable detection of such …
FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction
Recent progress in large-scale pre-training has led to the development of advanced vision-
language models (VLMs) with remarkable proficiency in comprehending and generating …
language models (VLMs) with remarkable proficiency in comprehending and generating …
Hal-eval: A universal and fine-grained hallucination evaluation framework for large vision language models
Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle
with''hallucinations''-inconsistencies between images and their descriptions. Previous …
with''hallucinations''-inconsistencies between images and their descriptions. Previous …