VLT: Vision-language transformer and query generation for referring segmentation

H Ding, C Liu, S Wang, X Jiang - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to
facilitate deep interactions among multi-modal information and enhance the holistic …

Panoptic scene graph generation

J Yang, YZ Ang, Z Guo, K Zhou, W Zhang… - European Conference on …, 2022 - Springer
Existing research addresses scene graph generation (SGG)—a critical technology for scene
understanding in images—from a detection perspective, i.e., objects are detected using …

Re-mine, learn and reason: Exploring the cross-modal semantic correlations for language-guided HOI detection

Y Cao, Q Tang, F Yang, X Su, S You… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human-Object Interaction (HOI) detection is a challenging computer vision task that
requires visual models to address the complex interactive relationship between humans and …

Detecting any human-object interaction relationship: Universal HOI detector with spatial prompt learning on foundation models

Y Cao, Q Tang, X Su, S Chen, S You… - Advances in Neural …, 2023 - proceedings.neurips.cc
Human-object interaction (HOI) detection aims to comprehend the intricate relationships
between humans and objects, predicting triplets, and serving as the foundation for …

TACO: Benchmarking generalizable bimanual tool-action-object understanding

Y Liu, H Yang, X Si, L Liu, Z Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Humans commonly work with multiple objects in daily life and can intuitively transfer
manipulation skills to novel objects by understanding object functional regularities. However …

Self-regularized prototypical network for few-shot semantic segmentation

H Ding, H Zhang, X Jiang - Pattern Recognition, 2023 - Elsevier
The deep CNNs in image semantic segmentation typically require a large number of
densely-annotated images for training and have difficulties in generalizing to unseen object …

Efficient adaptive human-object interaction detection with concept-guided memory

T Lei, F Caba, Q Chen, H Jin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human-Object Interaction (HOI) detection aims to localize and infer the
relationships between a human and an object. Arguably, training supervised models for this …

KD-DLGAN: Data limited image generation via knowledge distillation

K Cui, Y Yu, F Zhan, S Liao, S Lu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generative Adversarial Networks (GANs) rely heavily on large-scale training data
for training high-quality image generation models. With limited training data, the GAN …

Neural-logic human-object interaction detection

L Li, J Wei, W Wang, Y Yang - Advances in Neural …, 2024 - proceedings.neurips.cc
The interaction decoder utilized in prevalent Transformer-based HOI detectors typically
accepts pre-composed human-object pairs as inputs. Though achieving remarkable …

Exploring conditional multi-modal prompts for zero-shot HOI detection

T Lei, S Yin, Y Peng, Y Liu - European Conference on Computer Vision, 2024 - Springer
Zero-shot Human-Object Interaction (HOI) detection has emerged as a frontier topic
due to its capability to detect HOIs beyond a predefined set of categories. This task entails …