Multimodal learning with graphs

Y Ektefaie, G Dasoulas, A Noori, M Farhat… - Nature Machine …, 2023 - nature.com
Artificial intelligence for graphs has achieved remarkable success in modelling complex
systems, ranging from dynamic networks in biology to interacting particle systems in physics …

CRIS: CLIP-driven referring image segmentation

Z Wang, Y Lu, Q Li, X Tao, Y Guo… - Proceedings of the …, 2022 - openaccess.thecvf.com
Referring image segmentation aims to segment a referent via a natural linguistic expression.
Due to the distinct data properties between text and image, it is challenging for a network to …

SegViT: Semantic segmentation with plain vision transformers

B Zhang, Z Tian, Q Tang, X Chu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and
propose the SegViT. Previous ViT-based segmentation networks usually learn a pixel-level …

CGNet: A light-weight context guided network for semantic segmentation

T Wu, S Tang, R Zhang, J Cao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The demand for applying semantic segmentation models on mobile devices has been
increasing rapidly. Current state-of-the-art networks have an enormous number of parameters …

CIGAR: Cross-modality graph reasoning for domain adaptive object detection

Y Liu, J Wang, C Huang, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Unsupervised domain adaptive object detection (UDA-OD) aims to learn a detector by
generalizing knowledge from a labeled source domain to an unlabeled target domain …

SegViT v2: Exploring efficient and continual semantic segmentation with plain vision transformers

B Zhang, L Liu, MH Phan, Z Tian, C Shen… - International Journal of …, 2024 - Springer
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic
segmentation using the encoder–decoder framework and introduces SegViTv2. In this study …

Exploiting edge-oriented reasoning for 3D point-based scene graph analysis

C Zhang, J Yu, Y Song, W Cai - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Scene understanding is a critical problem in computer vision. In this paper, we propose a 3D
point-based scene graph generation (SGGpoint) framework to effectively bridge perception …

StructToken: Rethinking semantic segmentation with structural prior

F Lin, Z Liang, S Wu, J He, K Chen… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In previous deep-learning-based methods, semantic segmentation has been regarded as a
static or dynamic per-pixel classification task, i.e., classify each pixel representation to a …

An Overview of Text-based Person Search: Recent Advances and Future Directions

K Niu, Y Liu, Y Long, Y Huang, L Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Due to the practical significance in smart video surveillance systems, Text-Based Person
Search (TBPS) has been one of the research hotspots recently, which refers to searching for …

Fully transformer networks for semantic image segmentation

S Wu, T Wu, F Lin, S Tian, G Guo - arXiv preprint arXiv:2106.04108, 2021 - arxiv.org
Transformers have shown impressive performance in various natural language processing
and computer vision tasks, due to the capability of modeling long-range dependencies …