RLIPv2: Fast scaling of relational language-image pre-training

H Yuan, S Zhang, X Wang, S Albanie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Relational Language-Image Pre-training (RLIP) aims to align vision representations
with relational texts, thereby advancing the capability of relational reasoning in computer …

Open-world human-object interaction detection via multi-modal prompts

J Yang, B Li, A Zeng, L Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we develop MP-HOI, a powerful Multi-modal Prompt-based HOI detector
designed to leverage both textual descriptions for open-set generalization and visual …

Scene-Graph ViT: End-to-end open-vocabulary visual relationship detection

T Salzmann, M Ryll, A Bewley, M Minderer - European Conference on …, 2024 - Springer
Visual relationship detection aims to identify objects and their relationships in images. Prior
methods approach this task by adding separate relationship modules or decoders to existing …

Towards Flexible Visual Relationship Segmentation

F Zhu, J Yang, H Jiang - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Visual relationship understanding has been studied separately in human-object interaction
(HOI) detection, scene graph generation (SGG), and referring relationships (RR) tasks …

From easy to hard: Learning curricular shape-aware features for robust panoptic scene graph generation

H Shi, L Li, J Xiao, Y Zhuang, L Chen - International Journal of Computer …, 2024 - Springer
Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-
structure representation based on panoptic segmentation masks. Despite remarkable …

Toward open-set human object interaction detection

M Wu, Y Liu, J Ji, X Sun, R Ji - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
This work is oriented toward the task of open-set Human Object Interaction (HOI) detection.
The challenge lies in identifying completely new, out-of-domain relationships, as opposed to …

RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist

C Xie, S Liang, J Li, Z Zhang, F Zhu… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Visual relationships are crucial for visual perception and reasoning, and cover tasks like
Scene Graph Generation, Human-Object Interaction, and object affordance. Despite …

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

Y Zhong, ZY Hu, MR Lyu, L Wang - arXiv preprint arXiv:2403.18252, 2024 - arxiv.org
Visual representation learning has been a cornerstone in computer vision, involving typical
forms such as visual embeddings, structural symbols, and text-based representations …

Hydra-SGG: Hybrid relation assignment for one-stage scene graph generation

M Chen, G Chen, W Wang, Y Yang - arXiv preprint arXiv:2409.10262, 2024 - arxiv.org
DETR introduces a simplified one-stage framework for scene graph generation (SGG).
However, DETR-based SGG models face two challenges: i) Sparse supervision, as each …

Adaptive multimodal prompt for human-object interaction with local feature enhanced transformer

K Xue, Y Gao, Z Fang, X Jiang, W Yu, M Chen, C Wu - Applied Intelligence, 2024 - Springer
Human-object interaction (HOI) detection is an important computer vision task for
recognizing the interaction between humans and surrounding objects in an image or video …