Google Scholar

The revolution of multimodal large language models: a survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arxiv preprint arxiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by: 46

[PDF] neurips.cc

Omg-llava: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu… - Advances in …, 2025 - proceedings.neurips.cc

Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

Cited by: 32

[PDF] thecvf.com

Gsva: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Cited by: 40

[PDF] arxiv.org

Ferret-v2: An improved baseline for referring and grounding with large language models

H Zhang, H You, P Dufter, B Zhang, C Chen… - arxiv preprint arxiv …, 2024 - arxiv.org

While Ferret seamlessly integrates regional understanding into the Large Language Model
(LLM) to facilitate its referring and grounding capability, it poses certain limitations …

Cited by: 7

[PDF] arxiv.org

Spin: Hierarchical segmentation with subpart granularity in natural images

J Myers-Dean, J Reynolds, B Price, Y Fan… - European Conference on …, 2024 - Springer

Hierarchical segmentation entails creating segmentations at varying levels of granularity.
We introduce the first hierarchical semantic segmentation dataset with subpart annotations …

Cited by: 1

[PDF] arxiv.org

Lasagna: Language-based segmentation assistant for complex queries

C Wei, H Tan, Y Zhong, Y Yang, L Ma - arxiv preprint arxiv:2404.08506, 2024 - arxiv.org

Recent advancements have empowered Large Language Models for Vision (vLLMs) to
generate detailed perceptual outcomes, including bounding boxes and masks. Nonetheless …

Cited by: 4

[PDF] arxiv.org

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

T Srinivasan, J Hessel, T Gupta, BY Lin, Y Choi… - arxiv preprint arxiv …, 2024 - arxiv.org

Selective prediction minimizes incorrect predictions from vision-language models (VLMs) by
allowing them to abstain from answering when uncertain. However, when deploying a vision …

Cited by: 2

[PDF] arxiv.org

Reasoning to Attend: Try to Understand How <SEG> Token Works

R Qian, X Yin, D Dou - arxiv preprint arxiv:2412.17741, 2024 - arxiv.org

Current Large Multimodal Models (LMMs) empowered visual grounding typically rely on
$\texttt{<SEG>}$ token as a text prompt to jointly optimize the vision-language model (e.g. …

Cited by: 1

[PDF] arxiv.org

EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM

Q Nguyen, T Vu, TT Nguyen, Y Wen… - arxiv preprint arxiv …, 2024 - arxiv.org

Image editing technologies are tools used to transform, adjust, remove, or otherwise alter
images. Recent research has significantly improved the capabilities of image editing tools …

Cited by: 1

[PDF] arxiv.org

SegLLM: Multi-round Reasoning Segmentation

XD Wang, S Zhang, S Li, K Kallidromitis, K Li… - arxiv preprint arxiv …, 2024 - arxiv.org

We present SegLLM, a novel multi-round interactive reasoning segmentation model that
enhances LLM-based segmentation by exploiting conversational memory of both visual and …

Cited by: 1

Saved to your library

See say and segment: Teaching lmms to overcome false premises

The revolution of multimodal large language models: a survey

Omg-llava: Bridging image-level, object-level, pixel-level reasoning and understanding

Gsva: Generalized segmentation via multimodal large language models

Ferret-v2: An improved baseline for referring and grounding with large language models

Spin: Hierarchical segmentation with subpart granularity in natural images

Lasagna: Language-based segmentation assistant for complex queries

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

Reasoning to Attend: Try to Understand How <SEG> Token Works

EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM

SegLLM: Multi-round Reasoning Segmentation