DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-
Language Models that significantly improves upon its predecessor, DeepSeek-VL, through …
WISDOM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge
Multimodal Sentiment Analysis (MSA) focuses on leveraging multimodal signals for
understanding human sentiment. Most of the existing works rely on superficial information …
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Vision language models (VLMs) have achieved impressive progress in diverse applications,
becoming a prevalent research direction. In this paper, we build FIRE, a feedback …
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Large Vision-Language Models (LVLMs) have achieved remarkable performance in many
vision-language tasks, yet their capabilities in fine-grained visual understanding remain …
VisionArena: 230K Real World User-VLM Conversations with Preference Labels
With the growing adoption and capabilities of vision-language models (VLMs) comes the
need for benchmarks that capture authentic user-VLM interactions. In response, we create …
Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion
Multimodal LLMs (MLLMs) equip language models with visual capabilities by aligning vision
encoders with language models. Existing methods to enhance the visual perception of …
HumanVLM: Foundation for Human-Scene Vision-Language Model
Human-scene vision-language tasks are increasingly prevalent in diverse social
applications, yet recent advancements predominantly rely on models specifically tailored to …
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
Vision-Language adaptation (VL adaptation) transforms Large Language Models (LLMs)
into Large Vision-Language Models (LVLMs) for multimodal tasks, but this process often …
UHDF: Hallucination Detection Using Open Source Models Beyond Closed Source Models Methods
D Liu, B Xu, Z Zhao, B Xu, M Yang - CCF International Conference on …, 2024 - Springer
With the emergence of multimodal large models, the problem of hallucination has been
plaguing their development and deployment. How to reliably detect the presence of …
Large Language Models: Testing Their Capabilities to Understand and Explain Spatial Concepts (Short Paper)
Interest in applying Large Language Models (LLMs), which use natural language
processing (NLP) to provide human-like responses to text-based questions, to geospatial …