Transformers in reinforcement learning: a survey
P Agarwal, AA Rahman, PL St-Charles… - ar** machines better understand …
Real-time 3D single object tracking with transformer
LiDAR-based 3D single object tracking is a challenging issue in robotics and autonomous
driving. Currently, existing approaches usually suffer from the problem that objects at long …
Local self-attention in transformer for visual question answering
X Shen, D Han, Z Guo, C Chen, J Hua, G Luo - Applied Intelligence, 2023 - Springer
Abstract Visual Question Answering (VQA) is a multimodal task that requires models to
understand both textual and visual information. Various VQA models have applied the …
Global visual feature and linguistic state guided attention for remote sensing image captioning
Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …
Test-time model adaptation for visual question answering with debiased self-supervisions
Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …
Visual question answering model based on graph neural network and contextual attention
Abstract Visual Question Answering (VQA) has recently appeared as a hot research area in
the field of computer vision and natural language processing. A VQA model uses both image …