- Academic Search

Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org

Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

Cited by: 8

PPMN: Pixel-phrase matching network for one-stage panoptic narrative grounding

Z Ding, Z Ding, T Hui, J Huang, X Wei, X Wei… - Proceedings of the 30th …, 2022 - dl.acm.org

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual
objects of things and stuff categories described by dense narrative captions of a still image …

Cited by: 13

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Cited by: 3

Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering

D Xue, S Qian, C Xu - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering
(EVQA) has been introduced, which combines answering visual questions with multimodal …

Cited by: 6

HumanFormer: Human-centric Prompting Multi-modal Perception Transformer for Referring Crowd Detection

H Qiu, L Wang, T Zhao, F Meng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

As an important step towards crowd understanding, referring crowd detection (RCD) aims to
locate the person in human crowded environments described by a natural language …

Cited by: 1

Graph-based referring expression comprehension with expression-guided selective filtering and noun-oriented reasoning

J Ke, Q Zhang, J Wang, H Ding, P Zhang, J Wen - Pattern Recognition, 2025 - Elsevier

The objective of referring expression comprehension (REC) is to find the common feature
domain between language expressions and visual objects. Due to the complex nature of …

A survey on interpretable cross-modal reasoning

D Xue, S Qian, Z Zhou, C Xu - arXiv preprint arXiv:2309.01955, 2023 - researchgate.net

Authors' addresses: Dizhan Xue, xuedizhan17@mails.ucas.ac.cn; Shengsheng Qian,
shengsheng.qian@nlpr.ia.ac.cn; Zuyi Zhou, zhouzuyi2023@ia.ac.cn, MAIS, Institute of …

Cited by: 5

Universal Relocalizer for Weakly Supervised Referring Expression Grounding

P Zhang, M Liu, X Song, D Cao, Z Gao… - ACM Transactions on …, 2024 - dl.acm.org

This article introduces the Universal Relocalizer, a novel approach designed for weakly
supervised referring expression grounding. Our method strives to pinpoint a target proposal …

Cited by: 1

RefCrowd: Grounding the target in crowd with referring expressions

H Qiu, H Li, T Zhao, L Wang, Q Wu… - Proceedings of the 30th …, 2022 - dl.acm.org

Crowd understanding has aroused widespread interest in the vision domain due to its
important practical significance. Unfortunately, there is no effort to explore crowd …

Cited by: 6

Linking people across text and images based on social relation reasoning

Y Lei, P Zhao, P Li, Y Cai, Q Huang - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

As a sub-task of visual grounding, linking people across text and images aims to localize
target people in images with corresponding sentences. Existing approaches tend to capture …

Exploring logical reasoning for referring expression comprehension

Semi-supervised panoptic narrative grounding
