Visual attention methods in deep learning: An in-depth survey

M Hassanin, S Anwar, I Radwan, FS Khan, A Mian - Information Fusion, 2024 - Elsevier
Inspired by the human cognitive system, attention is a mechanism that imitates human
cognitive awareness of specific information, amplifying critical details to focus more on …

Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2402.19442, 2024 - arxiv.org
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
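
For reference, a minimal sketch of the multi-head softmax attention computation analyzed in such work, assuming NumPy; the shapes, head count, and the name `multi_head_attention` are illustrative, not the paper's construction:

```python
# Minimal sketch of multi-head softmax attention, assuming NumPy;
# shapes, head count, and function names here are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, n_heads):
    """X: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Split into heads: (n_heads, seq_len, d_head)
    Q = Q.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    K = K.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    V = V.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product softmax attention, computed per head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ V  # (n_heads, seq_len, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
W = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
print(multi_head_attention(X, *W, n_heads=4).shape)  # (8, 16)
```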

Few-shot named entity recognition: An empirical baseline study

J Huang, C Li, K Subudhi, D Jose… - Proceedings of the …, 2021 - aclanthology.org
This paper presents an empirical study to efficiently build named entity recognition (NER)
systems when a small amount of in-domain labeled data is available. Based upon recent …

Code structure–guided transformer for source code summarization

S Gao, C Gao, Y He, J Zeng, L Nie, X Xia… - ACM Transactions on …, 2023 - dl.acm.org
Code summaries help developers comprehend programs and reduce the time needed to
infer program functionality during software maintenance. Recent efforts resort to deep learning …

Few-shot named entity recognition: A comprehensive study

J Huang, C Li, K Subudhi, D Jose… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents a comprehensive study to efficiently build named entity recognition
(NER) systems when a small amount of in-domain labeled data is available. Based upon …

Unraveling attention via convex duality: Analysis and interpretations of vision transformers

A Sahiner, T Ergen, B Ozturkler… - International …, 2022 - proceedings.mlr.press
Vision transformers using self-attention or its proposed alternatives have demonstrated
promising results in many image-related tasks. However, the underpinning inductive bias of …

Combining external-latent attention for medical image segmentation

E Song, B Zhan, H Liu - Neural Networks, 2024 - Elsevier
The attention mechanism offers a new entry point for improving the performance of
medical image segmentation. How to reasonably assign weights is a key element of the …
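
For context, a hedged sketch of the external-attention building block (two small learnable memories with double normalization, in the spirit of Guo et al.'s external attention); how this paper combines it with latent attention may differ:

```python
# Minimal sketch of an external-attention block: two small learnable
# memories M_k, M_v shared across samples, with double normalization.
# Assumes NumPy; the paper's exact external-latent combination may differ.
import numpy as np

def external_attention(X, M_k, M_v):
    """X: (n_pixels, d); M_k, M_v: (memory_size, d) learnable memories."""
    attn = X @ M_k.T  # (n_pixels, memory_size)
    attn = np.exp(attn - attn.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)           # softmax over pixels
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)  # L1 norm over memory
    return attn @ M_v  # (n_pixels, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 32))  # flattened feature map
M_k = rng.standard_normal((8, 32))
M_v = rng.standard_normal((8, 32))
print(external_attention(X, M_k, M_v).shape)  # (64, 32)
```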

Balancing speciality and versatility: a coarse to fine framework for supervised fine-tuning large language model

H Zhang, Y Wu, D Li, S Yang, R Zhao, Y Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of
handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit …

Superiority of multi-head attention in in-context linear regression

Y Cui, J Ren, P He, J Tang, Y Xing - arXiv preprint arXiv:2401.17426, 2024 - arxiv.org
We present a theoretical analysis of the performance of transformers with softmax attention in
in-context learning with linear regression tasks. While the existing literature predominantly …
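
A minimal sketch of how in-context linear regression prompts are commonly set up in this literature, assuming the usual stacked (x_i, y_i) token convention; the function `make_prompt` and its defaults are hypothetical:

```python
# Sketch of an in-context linear regression task: a prompt of (x_i, y_i)
# pairs plus a query x whose label the model must predict. The stacked
# embedding below is one common convention, not the paper's exact setup.
import numpy as np

def make_prompt(n_examples=10, d=5, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)  # task-specific weight vector
    X = rng.standard_normal((n_examples + 1, d))
    y = X @ w
    # Stack each token as [x_i; y_i], masking the query's label with 0.
    tokens = np.concatenate([X, y[:, None]], axis=1)
    tokens[-1, -1] = 0.0  # the model must predict this entry
    return tokens, y[-1]  # prompt and the held-out target

tokens, target = make_prompt()
print(tokens.shape, target)  # (11, 6) and the query's true label
```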

Exploring predictive uncertainty and calibration in NLP: A study on the impact of method & data scarcity

D Ulmer, J Frellsen, C Hardmeier - arXiv preprint arXiv:2210.15452, 2022 - arxiv.org
We investigate the problem of determining the predictive confidence (or, conversely,
uncertainty) of a neural classifier through the lens of low-resource languages. By training …
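
One standard metric in such calibration studies is the expected calibration error (ECE); below is a hedged sketch with equal-width confidence bins, where the 10-bin scheme is illustrative rather than the paper's exact setup:

```python
# Minimal sketch of expected calibration error (ECE) with equal-width
# confidence bins, a standard metric in calibration studies; the 10-bin
# scheme below is illustrative, not this paper's exact configuration.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted max-class probabilities; correct: 0/1 hits."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between bin accuracy and mean confidence, weighted by
            # the fraction of samples falling in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1]))
```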