Exploring predicate visual context in detecting of human-object interactions
Recently, the DETR framework has emerged as the dominant approach for human-object
interaction (HOI) research. In particular, two-stage transformer-based HOI detectors are …
Re-mine, learn and reason: Exploring the cross-modal semantic correlations for language-guided HOI detection
Human-Object Interaction (HOI) detection is a challenging computer vision task that
requires visual models to address the complex interactive relationship between humans and …
Detecting any human-object interaction relationship: Universal HOI detector with spatial prompt learning on foundation models
Y Cao, Q Tang, X Su, S Chen, S You… - Advances in Neural …, 2023 - proceedings.neurips.cc
Human-object interaction (HOI) detection aims to comprehend the intricate relationships
between humans and objects, predicting triplets, and serving as the foundation for …
ViPLO: Vision transformer based pose-conditioned self-loop graph for human-object interaction detection
Human-Object Interaction (HOI) detection, which localizes and infers relationships
between human and objects, plays an important role in scene understanding. Although two …
Agglomerative transformer for human-object interaction detection
We propose an agglomerative Transformer (AGER) that enables Transformer-based human-
object interaction (HOI) detectors to flexibly exploit extra instance-level cues in a single …
Efficient adaptive human-object interaction detection with concept-guided memory
Human Object Interaction (HOI) detection aims to localize and infer the
relationships between a human and an object. Arguably, training supervised models for this …
STMixer: A one-stage sparse action detector
Traditional video action detectors typically adopt the two-stage pipeline, where a person
detector is first employed to yield actor boxes and then 3D RoIAlign is used to extract actor …
Category query learning for human-object interaction classification
Unlike most previous HOI methods that focus on learning better human-object features, we
propose a novel and complementary approach called category query learning. Such queries …
CRT-6D: Fast 6D object pose estimation with cascaded refinement transformers
Learning based 6D object pose estimation methods rely on computing large intermediate
pose representations and/or iteratively refining an initial estimation with a slow render …
Neural-logic human-object interaction detection
The interaction decoder utilized in prevalent Transformer-based HOI detectors typically
accepts pre-composed human-object pairs as inputs. Though achieving remarkable …