- Academic Search

P Agarwal, AA Rahman, PL St-Charles… - ar** machines better understand …

Cited by 91 · All 3 versions

Real-time 3D single object tracking with transformer

J Shan, S Zhou, Y Cui, Z Fang - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

LiDAR-based 3D single object tracking is a challenging issue in robotics and autonomous
driving. Currently, existing approaches usually suffer from the problem that objects at long …

Cited by 50 · All 4 versions

Local self-attention in transformer for visual question answering

X Shen, D Han, Z Guo, C Chen, J Hua, G Luo - Applied Intelligence, 2023 - Springer

Visual Question Answering (VQA) is a multimodal task that requires models to
understand both textual and visual information. Various VQA models have applied the …

Cited by 39 · All 4 versions

Global visual feature and linguistic state guided attention for remote sensing image captioning

Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …

Cited by 66 · All 2 versions

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

Cited by 18 · All 2 versions

Visual question answering model based on graph neural network and contextual attention

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Visual Question Answering (VQA) has recently appeared as a hot research area in
the field of computer vision and natural language processing. A VQA model uses both image …

Cited by 58 · All 2 versions

Saved to My library

Self-adaptive neural module transformer for visual question answering

Transformers in reinforcement learning: a survey

Real-time 3D single object tracking with transformer

Local self-attention in transformer for visual question answering

Global visual feature and linguistic state guided attention for remote sensing image captioning

Test-time model adaptation for visual question answering with debiased self-supervisions

Visual question answering model based on graph neural network and contextual attention