Google Scholar

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Cited by 121 · All 3 versions

[PDF] arxiv.org

The (R)Evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by 43 · All 4 versions · HTML version

[PDF] arxiv.org

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org

Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

Cited by 29 · All 4 versions · HTML version

[PDF] arxiv.org

MG-LLaVA: Towards multi-granularity visual instruction tuning

X Zhao, X Li, H Duan, H Huang, Y Li, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-modal large language models (MLLMs) have made significant strides in various visual
understanding tasks. However, the majority of these models are constrained to process low …

Cited by 8 · All 3 versions · HTML version

[PDF] arxiv.org

Auto Cherry-Picker: Learning from high-quality generative data driven by language

Y Chen, X Li, Y Li, Y Zeng, J Wu, X Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion-based models have shown great potential in generating high-quality images with
various layouts, which can benefit downstream perception tasks. However, a fully automatic …

Cited by 2 · All 3 versions · HTML version

TSCnet: A text-driven semantic-level controllable framework for customized low-light image enhancement

M Zhang, J Yin, P Zeng, Y Shen, S Lu, X Wang - Neurocomputing, 2025 - Elsevier

Deep learning-based image enhancement methods show significant advantages in
reducing noise and improving visibility in low-light conditions. These methods are typically …

[PDF] arxiv.org

LLAVADI: What Matters For Multimodal Large Language Models Distillation

S Xu, X Li, H Yuan, L Qi, Y Tong, MH Yang - arXiv preprint arXiv …, 2024 - arxiv.org

The recent surge in Multimodal Large Language Models (MLLMs) has showcased their
remarkable potential for achieving generalized intelligence by integrating visual …

All 2 versions · HTML version

[PDF] arxiv.org

Visual Large Language Models for Generalized and Specialized Applications

Y Li, Z Lai, W Bao, Z Tan, A Dao, K Sui, J Shen… - arXiv preprint arXiv …, 2025 - arxiv.org

Visual-language models (VLM) have emerged as a powerful tool for learning a unified
embedding space for vision and language. Inspired by large language models, which have …

All 2 versions · HTML version

Generalizable Entity Grounding via Assistance of Large Language Model
