A review of generalized zero-shot learning methods

F Pourpanah, M Abdar, Y Luo, X Zhou… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Generalized zero-shot learning (GZSL) aims to train a model for classifying data samples
under the condition that some output classes are unknown during supervised learning. To …

Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Panoptic scene graph generation

J Yang, YZ Ang, Z Guo, K Zhou, W Zhang… - European Conference on …, 2022 - Springer
Existing research addresses scene graph generation (SGG)—a critical technology for scene
understanding in images—from a detection perspective, i.e., objects are detected using …

Teaching structured vision & language concepts to vision & language models

S Doveh, A Arbelle, S Harary… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …

CLIP-Event: Connecting text and images with event structures

M Li, R Xu, S Wang, L Zhou, X Lin… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision-language (V+L) pretraining models have achieved great success in
supporting multimedia applications by understanding the alignments between images and …

H2O: Two hands manipulating objects for first person interaction recognition

T Kwon, B Tekin, J Stühmer, F Bogo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a comprehensive framework for egocentric interaction recognition using
markerless 3D annotations of two hands manipulating objects. To this end, we propose a …

Compositional feature augmentation for unbiased scene graph generation

L Li, G Chen, J Xiao, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub,
pred, obj> in a given image. With the emergence of various advanced techniques for better …

DRG: Dual relation graph for human-object interaction detection

C Gao, J Xu, Y Zou, JB Huang - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
We tackle the challenging problem of human-object interaction (HOI) detection. Existing
methods either recognize the interaction of each human-object pair in isolation or perform …

Dense and aligned captions (DAC) promote compositional reasoning in VL models

S Doveh, A Arbelle, S Harary… - Advances in …, 2023 - proceedings.neurips.cc
Vision and Language (VL) models offer an effective method for aligning representation
spaces of images and text allowing for numerous applications such as cross-modal retrieval …

Composing text and image for image retrieval - an empirical odyssey

N Vo, L Jiang, C Sun, K Murphy, LJ Li… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we study the task of image retrieval, where the input query is specified in the
form of an image plus some text that describes desired modifications to the input image. For …