A review on the attention mechanism of deep learning
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …
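As a quick reference for the mechanism this review surveys, here is a minimal NumPy sketch of scaled dot-product attention, the standard formulation; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # weighted sum of values, one row per query

# Toy usage: 3 query positions attending over 5 key/value positions.
Q = np.random.randn(3, 8)
K = np.random.randn(5, 8)
V = np.random.randn(5, 8)
out = attention(Q, K, V)  # shape (3, 8)
```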
Social interactions for autonomous driving: A review and perspectives
No human drives a car in a vacuum; they must negotiate with other road users to achieve
their goals in social traffic scenes. A rational human driver can interact with other road users …
Transformers learn to implement preconditioned gradient descent for in-context learning
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …
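For context, the update such constructions emulate is ordinary preconditioned gradient descent on an in-context least-squares objective; the notation below is generic, not the paper's:

```latex
% Preconditioned gradient descent on an in-context least-squares loss.
% P is a fixed preconditioner (e.g., an approximation to the inverse
% data covariance); eta is a step size. Notation is generic.
\[
  L(w) = \frac{1}{2n} \sum_{i=1}^{n} \bigl( \langle w, x_i \rangle - y_i \bigr)^2,
  \qquad
  w_{k+1} = w_k - \eta \, P \, \nabla L(w_k).
\]
```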
WizardMath: Empowering mathematical reasoning for large language models via Reinforced Evol-Instruct
Large language models (LLMs), such as GPT-4, have shown remarkable performance in
natural language processing (NLP) tasks, including challenging mathematical reasoning …
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Program synthesis with large language models
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …
Memorybank: Enhancing large language models with long-term memory
Large Language Models (LLMs) have drastically reshaped our interactions with
artificial intelligence (AI) systems, showcasing impressive performance across an extensive …
Show your work: Scratchpads for intermediate computation with language models
Large pre-trained language models perform remarkably well on tasks that can be done "in
one pass", such as generating realistic text or synthesizing computer programs. However …
one pass", such as generating realistic text or synthesizing computer programs. However …
RULER: What's the Real Context Size of Your Long-Context Language Models?
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …
information (the" needle") from long distractor texts (the" haystack"), has been widely …
Perceiver: General perception with iterative attention
Biological systems understand the world by simultaneously processing high-dimensional
inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The …