Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Vector symbolic architectures as a computing framework for emerging hardware

D Kleyko, M Davies, EP Frady, P Kanerva… - Proceedings of the …, 2022 - ieeexplore.ieee.org
This article reviews recent progress in the development of the computing framework vector
symbolic architectures (VSA) (also known as hyperdimensional computing). This framework …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …

What can transformers learn in-context? a case study of simple function classes

S Garg, D Tsipras, PS Liang… - Advances in Neural …, 2022 - proceedings.neurips.cc
In-context learning is the ability of a model to condition on a prompt sequence consisting of
in-context examples (input-output pairs corresponding to some task) along with a new query …

Representational strengths and limitations of transformers

C Sanford, DJ Hsu, M Telgarsky - Advances in Neural …, 2023 - proceedings.neurips.cc
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …

Attention is not all you need: Pure attention loses rank doubly exponentially with depth

Y Dong, JB Cordonnier… - … conference on machine …, 2021 - proceedings.mlr.press
Attention-based architectures have become ubiquitous in machine learning. Yet, our
understanding of the reasons for their effectiveness remains limited. This work proposes a …

Big bird: Transformers for longer sequences

M Zaheer, G Guruganesh, KA Dubey… - Advances in Neural …, 2020 - proceedings.neurips.cc
Transformers-based models, such as BERT, have been one of the most successful deep
learning models for NLP. Unfortunately, one of their core limitations is the quadratic …

Chain of thought empowers transformers to solve inherently serial problems

Z Li, H Liu, D Zhou, T Ma - arXiv preprint arXiv:2402.12875, 2024 - academia.edu
Instructing the model to generate a sequence of intermediate steps, aka, a chain of thought
(CoT), is a highly effective method to improve the accuracy of large language models (LLMs) …