AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

M Hanna, O Liu, A Variengien - Advances in Neural …, 2023 - proceedings.neurips.cc
Pre-trained language models can be surprisingly adept at tasks they were not explicitly
trained on, but how they implement these capabilities is poorly understood. In this paper, we …

Language models represent space and time

W Gurnee, M Tegmark - arXiv preprint arXiv:2310.02207, 2023 - arxiv.org
The capabilities of large language models (LLMs) have sparked debate over whether such
systems just learn an enormous collection of superficial statistics or a set of more coherent …

Black-box access is insufficient for rigorous AI audits

S Casper, C Ezell, C Siegmann, N Kolt… - Proceedings of the …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

Finding neurons in a haystack: Case studies with sparse probing

W Gurnee, N Nanda, M Pauly, K Harvey… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …

Towards faithful model explanation in NLP: A survey

Q Lyu, M Apidianaki, C Callison-Burch - Computational Linguistics, 2024 - direct.mit.edu
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to
understand. This has given rise to numerous efforts towards model explainability in recent …

Amnesic probing: Behavioral explanation with amnesic counterfactuals

Y Elazar, S Ravfogel, A Jacovi… - Transactions of the …, 2021 - direct.mit.edu
A growing body of work makes use of probing in order to investigate the working of neural
models, often considered black boxes. Recently, an ongoing debate emerged surrounding …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …