Google Scholar

Do Llamas work in English? On the latent language of multilingual transformers

C Wendler, V Veselovsky, G Monea… - Proceedings of the 62nd …, 2024 - aclanthology.org

We ask whether multilingual language models trained on unbalanced, English-dominated
corpora use English as an internal pivot language, a question of key importance for …

Cited by 74

[PDF] arxiv.org

Universal neurons in GPT2 language models

W Gurnee, T Horsley, ZC Guo, TR Kheirkhah… - arXiv preprint arXiv …, 2024 - arxiv.org

A basic question within the emerging field of mechanistic interpretability is the degree to
which neural networks learn the same underlying mechanisms. In other words, are neural …

Cited by 26

[PDF] arxiv.org

How truncating weights improves reasoning in language models

L Chen, J Bruna, A Bietti - arXiv preprint arXiv:2406.03068, 2024 - arxiv.org

In addition to the ability to generate fluent text in various languages, large language models
have been successful at tasks that involve basic forms of logical "reasoning" over their …

Cited by 1

[PDF] arxiv.org

Progressive distillation induces an implicit curriculum

A Panigrahi, B Liu, S Malladi, A Risteski… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge distillation leverages a teacher model to improve the training of a student model.
A persistent challenge is that a better teacher does not always yield a better student, to …

Saved to Library

Training dynamics of contextual n-grams in language models

Do Llamas work in English? On the latent language of multilingual transformers

Universal neurons in GPT2 language models

How truncating weights improves reasoning in language models

Progressive distillation induces an implicit curriculum