Real-time 3D single object tracking with transformer

J Shan, S Zhou, Y Cui, Z Fang - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
LiDAR-based 3D single object tracking is a challenging issue in robotics and autonomous
driving. Currently, existing approaches usually suffer from the problem that objects at long …

Local self-attention in transformer for visual question answering

X Shen, D Han, Z Guo, C Chen, J Hua, G Luo - Applied Intelligence, 2023 - Springer
Visual Question Answering (VQA) is a multimodal task that requires models to
understand both textual and visual information. Various VQA models have applied the …

Global visual feature and linguistic state guided attention for remote sensing image captioning

Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

Visual question answering model based on graph neural network and contextual attention

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier
Visual Question Answering (VQA) has recently appeared as a hot research area in
the field of computer vision and natural language processing. A VQA model uses both image …