Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Emergent and predictable memorization in large language models

S Biderman, U Prashanth, L Sutawika… - Advances in …, 2024 - proceedings.neurips.cc
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

Representation engineering: A top-down approach to AI transparency

A Zou, L Phan, S Chen, J Campbell, P Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …

Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla

T Lieberum, M Rahtz, J Kramár, N Nanda… - arXiv preprint arXiv …, 2023 - arxiv.org
Circuit analysis is a promising technique for understanding the internal mechanisms
of language models. However, existing analyses are done in small models far from the state …

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …

Linearity of relation decoding in transformer language models

E Hernandez, AS Sharma, T Haklay, K Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
Much of the knowledge encoded in transformer language models (LMs) may be expressed
in terms of relations: relations between words and their synonyms, entities and their …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …