AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
Mechanistic Interpretability for AI Safety--A Review
Understanding AI systems' inner workings is critical for ensuring value alignment and safety.
This review explores mechanistic interpretability: reverse engineering the computational …
Finding neurons in a haystack: Case studies with sparse probing
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …
Multimodal neurons in artificial neural networks
Gabriel Goh: Research lead. Gabriel Goh first discovered multimodal neurons, sketched out
the project direction and paper outline, and did much of the conceptual and engineering …
Modality competition: What makes joint training of multi-modal network fail in deep learning? (provably)
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …
Toward understanding the feature learning process of self-supervised contrastive learning
We formally study how contrastive learning learns the feature representations for neural
networks by investigating its feature learning process. We consider the case where our data …
Distributional semantics and linguistic theory
G Boleda - Annual Review of Linguistics, 2020 - annualreviews.org
Distributional semantics provides multidimensional, graded, empirically induced word
representations that successfully capture many aspects of meaning in natural languages, as …
Learning gender-neutral word embeddings
Word embedding models have become a fundamental component in a wide range of
Natural Language Processing (NLP) applications. However, embeddings trained on human …
Reverse engineering self-supervised learning
Understanding the learned representation and underlying mechanisms of Self-Supervised
Learning (SSL) often poses a challenge. In this paper, we 'reverse engineer' SSL, conducting …
Feature purification: How adversarial training performs robust deep learning
Despite the empirical success of using adversarial training to defend deep learning models
against adversarial perturbations, so far, it still remains rather unclear what the principles are …