Trained transformers learn linear models in-context
Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
TinyStories: How small can language models be and still speak coherent English?
Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
The Transformer architecture has shown impressive performance in multiple research domains and has become the backbone of many neural network models. However, there is limited …
Birth of a transformer: A memory viewpoint
Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand …
How transformers learn causal structure with gradient descent
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred …
Exposing attention glitches with flip-flop language modeling
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long …
In-context learning with transformers: Softmax attention adapts to function Lipschitzness
A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during …
Towards best practices of activation patching in language models: Metrics and methods
Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization (identifying the important model components) is a key …
JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures. This is achieved …