VLT: Vision-language transformer and query generation for referring segmentation
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to
facilitate deep interactions among multi-modal information and enhance the holistic …
Panoptic scene graph generation
Existing research addresses scene graph generation (SGG), a critical technology for scene
understanding in images, from a detection perspective, i.e., objects are detected using …
Re-mine, learn and reason: Exploring the cross-modal semantic correlations for language-guided HOI detection
Human-Object Interaction (HOI) detection is a challenging computer vision task that
requires visual models to address the complex interactive relationship between humans and …
Detecting any human-object interaction relationship: Universal HOI detector with spatial prompt learning on foundation models
Y Cao, Q Tang, X Su, S Chen, S You… - Advances in Neural …, 2023 - proceedings.neurips.cc
Human-object interaction (HOI) detection aims to comprehend the intricate relationships
between humans and objects, predicting triplets, and serving as the foundation for …
TACO: Benchmarking generalizable bimanual tool-action-object understanding
Humans commonly work with multiple objects in daily life and can intuitively transfer
manipulation skills to novel objects by understanding object functional regularities. However …
Self-regularized prototypical network for few-shot semantic segmentation
The deep CNNs in image semantic segmentation typically require a large number of
densely-annotated images for training and have difficulties in generalizing to unseen object …
Efficient adaptive human-object interaction detection with concept-guided memory
Human-Object Interaction (HOI) detection aims to localize and infer the
relationships between a human and an object. Arguably, training supervised models for this …
KD-DLGAN: Data limited image generation via knowledge distillation
Generative Adversarial Networks (GANs) rely heavily on large-scale training data
for training high-quality image generation models. With limited training data, the GAN …
Neural-logic human-object interaction detection
The interaction decoder utilized in prevalent Transformer-based HOI detectors typically
accepts pre-composed human-object pairs as inputs. Though achieving remarkable …
Exploring conditional multi-modal prompts for zero-shot HOI detection
Zero-shot Human-Object Interaction (HOI) detection has emerged as a frontier topic
due to its capability to detect HOIs beyond a predefined set of categories. This task entails …