Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
segments or groups. This technique has numerous real-world applications, such as …
The (r) evolution of multimodal large language models: A survey
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …
this reason, inspired by the success of large language models, significant research efforts …
Omg-llava: Bridging image-level, object-level, pixel-level reasoning and understanding
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …
image and video understanding. However, they lack reasoning abilities and cannot be …
Mg-llava: Towards multi-granularity visual instruction tuning
Multi-modal large language models (MLLMs) have made significant strides in various visual
understanding tasks. However, the majority of these models are constrained to process low …
understanding tasks. However, the majority of these models are constrained to process low …
Auto cherry-picker: Learning from high-quality generative data driven by language
Diffusion-based models have shown great potential in generating high-quality images with
various layouts, which can benefit downstream perception tasks. However, a fully automatic …
various layouts, which can benefit downstream perception tasks. However, a fully automatic …
TSCnet: A text-driven semantic-level controllable framework for customized low-light image enhancement
Deep learning-based image enhancement methods show significant advantages in
reducing noise and improving visibility in low-light conditions. These methods are typically …
reducing noise and improving visibility in low-light conditions. These methods are typically …
LLAVADI: What Matters For Multimodal Large Language Models Distillation
The recent surge in Multimodal Large Language Models (MLLMs) has showcased their
remarkable potential for achieving generalized intelligence by integrating visual …
remarkable potential for achieving generalized intelligence by integrating visual …
Visual Large Language Models for Generalized and Specialized Applications
Visual-language models (VLM) have emerged as a powerful tool for learning a unified
embedding space for vision and language. Inspired by large language models, which have …
embedding space for vision and language. Inspired by large language models, which have …