Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
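
The excerpt cuts off mid-sentence; as a rough, unofficial sketch of the in-context linear-regression setting such papers study (the dimensions, noiseless labels, and least-squares baseline below are my own illustrative choices, not the authors' construction), a prompt supplies labeled example pairs and a query input, and the natural reference predictor is ordinary least squares fit on the prompt alone:

```python
import numpy as np

# Hypothetical illustration of the in-context linear-regression setup:
# a prompt is a set of (x_i, y_i) example pairs plus a query input, and
# the reference solution is the least-squares fit on those examples alone.
rng = np.random.default_rng(0)
d, n = 5, 20                        # input dimension, number of in-context examples
w_star = rng.normal(size=d)         # task-specific weight vector
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_star                      # in-context labels (noiseless for simplicity)
x_query = rng.normal(size=d)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # the "in-context" estimate
print(x_query @ w_hat, x_query @ w_star)        # prediction vs. ground truth
```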

TinyStories: How small can language models be and still speak coherent English?

R Eldan, Y Li - arXiv preprint arXiv:2305.07759, 2023 - arxiv.org
Language models (LMs) are powerful tools for natural language processing, but they often
struggle to produce coherent and fluent text when they are small. Models with around 125M …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in neural …, 2023 - proceedings.neurips.cc
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

Birth of a transformer: A memory viewpoint

A Bietti, V Cabannes, D Bouchacourt… - Advances in …, 2023 - proceedings.neurips.cc
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …

How transformers learn causal structure with gradient descent

E Nichani, A Damian, JD Lee - arXiv preprint arXiv:2402.14735, 2024 - arxiv.org
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …
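
Since the snippet truncates right where it describes self-attention, a bare-bones single-head, unbatched attention forward pass (made-up dimensions, causal masking omitted for brevity; not the authors' model) may help fix ideas about how information is transferred across positions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T) pairwise similarities
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)              # row-wise softmax
    return A @ V                                       # each position mixes others' values

rng = np.random.default_rng(0)
T, d = 8, 16
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                    # shape (T, d)
print(out.shape)
```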

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2023 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …
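
The excerpt stops before describing the flip-flop task itself; as I understand it (my own paraphrase and token names), a flip-flop string interleaves write, ignore, and read instructions with bits, and a correct model must echo the most recently written bit at every read. A small generator sketch:

```python
import random

def flip_flop_sequence(length=16, seed=0):
    """Generate one flip-flop string as (instruction, bit) pairs.

    'w b' writes bit b to memory, 'i b' must be ignored,
    'r b' reads: b is the most recently written bit (the target).
    """
    rng = random.Random(seed)
    memory = rng.choice("01")
    tokens = ["w", memory]                    # start with a definite write
    for _ in range(length):
        op = rng.choice("wir")
        if op == "w":
            memory = rng.choice("01")
            tokens += ["w", memory]
        elif op == "i":
            tokens += ["i", rng.choice("01")]
        else:
            tokens += ["r", memory]           # correct answer: the last written bit
    return tokens

print(" ".join(flip_flop_sequence()))
```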

In-context learning with transformers: Softmax attention adapts to function Lipschitzness

L Collins, A Parulekar, A Mokhtari… - Advances in …, 2025 - proceedings.neurips.cc
A striking property of transformers is their ability to perform in-context learning (ICL), a
machine learning framework in which the learner is presented with a novel context during …
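
The snippet breaks off before the setup; one common way to see why softmax attention relates to function smoothness, in a simplified form of my own (not the authors' construction), is as a Nadaraya-Watson-style estimator whose temperature acts like a bandwidth over the in-context examples:

```python
import numpy as np

def softmax_attention_predict(x_query, X, y, scale=1.0):
    """Predict y at x_query by softmax-weighting the in-context labels;
    `scale` plays the role of an attention temperature / effective bandwidth."""
    logits = X @ x_query / scale
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ y

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                  # in-context inputs
y = np.array([np.sin(x).sum() for x in X])    # labels from some smooth target function
x_q = rng.normal(size=4)
print(softmax_attention_predict(x_q, X, y, scale=0.5))
```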

Towards best practices of activation patching in language models: Metrics and methods

F Zhang, N Nanda - arXiv preprint arXiv:2309.16042, 2023 - arxiv.org
Mechanistic interpretability seeks to understand the internal mechanisms of machine
learning models, where localization--identifying the important model components--is a key …
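
The abstract is cut short before defining activation patching; as a toy illustration on a tiny hand-rolled network (not any real language model or the paper's tooling), patching splices an activation cached from a clean run into a corrupted run and measures how much of the clean behavior is restored:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Two-layer toy network; optionally overwrite one hidden activation."""
    h = np.maximum(x @ W1, 0.0)               # hidden activations
    if patch is not None:
        idx, value = patch
        h[idx] = value                        # the patch: splice in a cached activation
    return h @ W2

x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)
h_clean = np.maximum(x_clean @ W1, 0.0)       # cache activations from the clean run
out_clean, out_corrupt = forward(x_clean), forward(x_corrupt)

for i in range(8):                            # patch each hidden unit in turn
    patched = forward(x_corrupt, patch=(i, h_clean[i]))
    # how much does patching unit i move the output back toward the clean run?
    recovered = np.linalg.norm(out_corrupt - out_clean) - np.linalg.norm(patched - out_clean)
    print(f"unit {i}: recovery {recovered:+.3f}")
```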

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …