Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
Attention mechanisms have revolutionized several domains of artificial intelligence, such as
natural language processing and computer vision, by enabling models to selectively focus …
Training Dynamics of In-Context Learning in Linear Attention
While attention-based models have demonstrated the remarkable ability of in-context
learning, the theoretical understanding of how these models acquire this ability through …
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
An intriguing property of the Transformer is its ability to perform in-context learning (ICL),
where the Transformer can solve different inference tasks without parameter updating based …