From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
A comprehensive survey of scene graphs: Generation and application
A scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
Training-free structured diffusion guidance for compositional text-to-image synthesis
Large-scale diffusion models have achieved state-of-the-art results on text-to-image
synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we …
Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation
Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and then parse them …
A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
Devil's on the edges: Selective quad attention for scene graph generation
Scene graph generation aims to construct a semantic graph structure from an image such
that its nodes and edges respectively represent objects and their relationships. One of the …
PPDL: Predicate probability distribution based loss for unbiased scene graph generation
Scene Graph Generation (SGG) has attracted more and more attention from visual
researchers in recent years, since Scene Graph (SG) is valuable in many downstream tasks …
Learning to generate scene graph from natural language supervision
Learning from image-text data has demonstrated recent success for many recognition tasks,
yet is currently limited to visual features or individual visual concepts such as objects. In this …
Instance relation graph guided source-free domain adaptive object detection
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue
of domain shift. Specifically, UDA methods try to align the source and target representations …
Learning to generate language-supervised and open-vocabulary scene graph using pre-trained visual-semantic space
Scene graph generation (SGG) aims to abstract an image into a graph structure, by
representing objects as graph nodes and their relations as labeled edges. However, two …