Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

PPMN: Pixel-phrase matching network for one-stage panoptic narrative grounding

Z Ding, Z Ding, T Hui, J Huang, X Wei, X Wei… - Proceedings of the 30th …, 2022 - dl.acm.org
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual
objects of things and stuff categories described by dense narrative captions of a still image …

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering

D Xue, S Qian, C Xu - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering
(EVQA) has been introduced, which combines answering visual questions with multimodal …

HumanFormer: Human-centric Prompting Multi-modal Perception Transformer for Referring Crowd Detection

H Qiu, L Wang, T Zhao, F Meng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
As an important step towards crowd understanding, referring crowd detection (RCD) aims to
locate the person in human crowded environments described by a natural language …

Graph-based referring expression comprehension with expression-guided selective filtering and noun-oriented reasoning

J Ke, Q Zhang, J Wang, H Ding, P Zhang, J Wen - Pattern Recognition, 2025 - Elsevier
The objective of referring expression comprehension (REC) is to find the common feature
domain between language expressions and visual objects. Due to the complex nature of …

A survey on interpretable cross-modal reasoning

D Xue, S Qian, Z Zhou, C Xu - arXiv preprint arXiv:2309.01955, 2023 - researchgate.net
Authors' addresses: Dizhan Xue, xuedizhan17@mails.ucas.ac.cn; Shengsheng Qian,
shengsheng.qian@nlpr.ia.ac.cn; Zuyi Zhou, zhouzuyi2023@ia.ac.cn, MAIS, Institute of …

Universal Relocalizer for Weakly Supervised Referring Expression Grounding

P Zhang, M Liu, X Song, D Cao, Z Gao… - ACM Transactions on …, 2024 - dl.acm.org
This article introduces the Universal Relocalizer, a novel approach designed for weakly
supervised referring expression grounding. Our method strives to pinpoint a target proposal …

RefCrowd: Grounding the target in crowd with referring expressions

H Qiu, H Li, T Zhao, L Wang, Q Wu… - Proceedings of the 30th …, 2022 - dl.acm.org
Crowd understanding has aroused widespread interest in the vision domain due to its
important practical significance. Unfortunately, there is no effort to explore crowd …

Linking people across text and images based on social relation reasoning

Y Lei, P Zhao, P Li, Y Cai, Q Huang - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
As a sub-task of visual grounding, linking people across text and images aims to localize
target people in images with corresponding sentences. Existing approaches tend to capture …