AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Mechanistic Interpretability for AI Safety--A Review

L Bereska, E Gavves - arXiv preprint arXiv:2404.14082, 2024 - arxiv.org
Understanding AI systems' inner workings is critical for ensuring value alignment and safety.
This review explores mechanistic interpretability: reverse engineering the computational …

Finding neurons in a haystack: Case studies with sparse probing

W Gurnee, N Nanda, M Pauly, K Harvey… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …

Multimodal neurons in artificial neural networks

G Goh, N Cammarata, C Voss, S Carter, M Petrov… - Distill, 2021 - distill.pub
Gabriel Goh: Research lead. Gabriel Goh first discovered multimodal neurons, sketched out
the project direction and paper outline, and did much of the conceptual and engineering …

Modality competition: What makes joint training of multi-modal network fail in deep learning? (Provably)

Y Huang, J Lin, C Zhou, H Yang… - … conference on machine …, 2022 - proceedings.mlr.press
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …

Toward understanding the feature learning process of self-supervised contrastive learning

Z Wen, Y Li - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We formally study how contrastive learning learns the feature representations for neural
networks by investigating its feature learning process. We consider the case where our data …

Distributional semantics and linguistic theory

G Boleda - Annual Review of Linguistics, 2020 - annualreviews.org
Distributional semantics provides multidimensional, graded, empirically induced word
representations that successfully capture many aspects of meaning in natural languages, as …

Learning gender-neutral word embeddings

J Zhao, Y Zhou, Z Li, W Wang, KW Chang - arXiv preprint arXiv …, 2018 - arxiv.org
Word embedding models have become a fundamental component in a wide range of
Natural Language Processing (NLP) applications. However, embeddings trained on human …

Reverse engineering self-supervised learning

I Ben-Shaul, R Shwartz-Ziv, T Galanti… - Advances in …, 2023 - proceedings.neurips.cc
Understanding the learned representation and underlying mechanisms of Self-Supervised
Learning (SSL) often poses a challenge. In this paper, we 'reverse engineer' SSL, conducting …

Feature purification: How adversarial training performs robust deep learning

Z Allen-Zhu, Y Li - 2021 IEEE 62nd Annual Symposium on …, 2022 - ieeexplore.ieee.org
Despite the empirical success of using adversarial training to defend deep learning models
against adversarial perturbations, so far, it still remains rather unclear what the principles are …