LEACE: Perfect linear concept erasure in closed form

N Belrose, D Schneider-Joseph… - Advances in …, 2023 - proceedings.neurips.cc
Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g. preventing a classifier from using gender or race) and interpretability …

Spectral editing of activations for large language model alignment

Y Qiu, Z Zhao, Y Ziser, A Korhonen, EM Ponti… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit undesirable behaviours, such as generating
untruthful or biased content. Editing their internal representations has been shown to be …

Log-linear guardedness and its implications

S Ravfogel, Y Goldberg, R Cotterell - arXiv preprint arXiv:2210.10012, 2022 - arxiv.org
Methods for erasing human-interpretable concepts from neural representations that assume
linearity have been found to be tractable and useful. However, the impact of this removal on …

Gold doesn't always glitter: Spectral removal of linear and nonlinear guarded attribute information

S Shao, Y Ziser, SB Cohen - arXiv preprint arXiv:2203.07893, 2022 - arxiv.org
We describe a simple and effective method (Spectral Attribute removaL; SAL) to remove
private or guarded information from neural representations. Our method uses matrix …

Cross-attention is not enough: Incongruity-aware dynamic hierarchical fusion for multimodal affect recognition

Y Wang, Y Li, PP Liang, LP Morency, P Bell… - arXiv preprint arXiv …, 2023 - arxiv.org
Fusing multiple modalities has proven effective for multimodal information processing.
However, the incongruity between modalities poses a challenge for multimodal fusion …

BERT is not the count: Learning to match mathematical statements with proofs

WW Li, Y Ziser, M Coavoux, SB Cohen - arXiv preprint arXiv:2302.09350, 2023 - arxiv.org
We introduce a task consisting in matching a proof to a given mathematical statement. The
task fits well within current research on Mathematical Information Retrieval and, more …

Surgical Feature-Space Decomposition of LLMs: Why, When and How?

A Chavan, N Lele, D Gupta - arXiv preprint arXiv:2405.13039, 2024 - arxiv.org
Low-rank approximations of the weight and feature space can enhance the performance of
deep learning models, whether in terms of improving generalization or reducing the latency …

TaCo: Targeted concept erasure prevents non-linear classifiers from detecting protected attributes

F Jourdan, L Béthune, A Picard, L Risser… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring fairness in NLP models is crucial, as they often encode sensitive attributes like
gender and ethnicity, leading to biased outcomes. Current concept erasure methods attempt …

Advancing Fairness in Natural Language Processing: From Traditional Methods to Explainability

F Jourdan - arXiv preprint arXiv:2410.12511, 2024 - arxiv.org
The burgeoning field of Natural Language Processing (NLP) stands at a critical juncture
where the integration of fairness within its frameworks has become an imperative. This PhD …

How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

M Gaido, D Fucci, M Negri, L Bentivogli - 2023 - books.google.com
When translating from notional gender languages (e.g., English) into grammatical gender
languages (e.g., Italian), the generated translation requires explicit gender assignments for …