A review on the attention mechanism of deep learning

Z Niu, G Zhong, H Yu - Neurocomputing, 2021 - Elsevier
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …
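
For orientation, the central object this review surveys, scaled dot-product attention as popularized by transformers, fits in a few lines. The NumPy sketch below is my own illustration of that standard formulation, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity logits
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ V                                     # weighted sum of values

# Toy usage: 3 queries attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 5))
out = scaled_dot_product_attention(Q, K, V)          # shape (3, 5)
```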

Social interactions for autonomous driving: A review and perspectives

W Wang, L Wang, C Zhang, C Liu… - Foundations and Trends …, 2022 - nowpublishers.com
No human drives a car in a vacuum; they must negotiate with other road users to achieve
their goals in social traffic scenes. A rational human driver can interact with other road users …

Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …
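
For reference, the algorithm the title says transformers learn to emulate is ordinary preconditioned gradient descent on a least-squares objective. The sketch below is my own illustration of that baseline algorithm, not the authors' weight construction; the choice of the inverse data covariance as preconditioner P is a natural but assumed one:

```python
import numpy as np

# Preconditioned gradient descent on least squares: minimize ||X w - y||^2 / (2n).
# Update rule: w <- w - eta * P @ grad.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

P = np.linalg.inv(X.T @ X / n)        # inverse covariance as preconditioner (illustrative choice)
w, eta = np.zeros(d), 1.0
for _ in range(10):
    grad = X.T @ (X @ w - y) / n      # gradient of the least-squares loss
    w = w - eta * P @ grad            # preconditioned step

print(np.linalg.norm(w - w_true))     # near zero; with this P the quadratic is solved in one step
```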

WizardMath: Empowering mathematical reasoning for large language models via reinforced Evol-Instruct

H Luo, Q Sun, C Xu, P Zhao, J Lou, C Tao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs), such as GPT-4, have shown remarkable performance in
natural language processing (NLP) tasks, including challenging mathematical reasoning …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

MemoryBank: Enhancing large language models with long-term memory

W Zhong, L Guo, Q Gao, H Ye, Y Wang - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Large Language Models (LLMs) have drastically reshaped our interactions with
artificial intelligence (AI) systems, showcasing impressive performance across an extensive …

Show your work: Scratchpads for intermediate computation with language models

M Nye, AJ Andreassen, G Gur-Ari… - arXiv preprint arXiv …, 2021 - arxiv.org
Large pre-trained language models perform remarkably well on tasks that can be done "in
one pass", such as generating realistic text or synthesizing computer programs. However …

RULER: What's the Real Context Size of Your Long-Context Language Models?

CP Hsieh, S Sun, S Kriman, S Acharya… - arXiv preprint arXiv …, 2024 - arxiv.org
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …

Perceiver: General perception with iterative attention

A Jaegle, F Gimeno, A Brock… - International …, 2021 - proceedings.mlr.press
Biological systems understand the world by simultaneously processing high-dimensional
inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The …
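
The mechanism behind the title's "iterative attention" is a small latent array repeatedly cross-attending to a large input array, so compute scales with the latent size rather than quadratically in the input size. The NumPy sketch below is a schematic of that cross-attention read, not the full architecture; the sizes are arbitrary and the latents and projections are random placeholders rather than learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_latent, d = 10_000, 64, 32     # large input array, small latent array

inputs  = rng.normal(size=(n_input, d))   # e.g. flattened pixels or audio samples
latents = rng.normal(size=(n_latent, d))  # latent array (learned in practice, random here)

def cross_attend(latents, inputs):
    """Latents query the inputs: cost is O(n_latent * n_input), not O(n_input^2)."""
    scores = latents @ inputs.T / np.sqrt(latents.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)    # row-wise softmax over the inputs
    return w @ inputs                     # each latent becomes a summary of the input

# "Iterative attention": repeat the read so the latents refine their summary.
for _ in range(4):
    latents = cross_attend(latents, inputs)
```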