Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Transfer learning for radio frequency machine learning: a taxonomy and survey

LJ Wong, AJ Michaels - Sensors, 2022 - mdpi.com
Transfer learning is a pervasive technology in computer vision and natural language
processing fields, yielding exponential performance improvements by leveraging prior …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

Similarity of neural network representations revisited

S Kornblith, M Norouzi, H Lee… - … conference on machine …, 2019 - proceedings.mlr.press
Recent work has sought to understand the behavior of neural networks by comparing
representations between layers and between different trained models. We examine methods …

Emerging cross-lingual structure in pretrained language models

S Wu, A Conneau, H Li, L Zettlemoyer… - arXiv preprint arXiv …, 2019 - arxiv.org
We study the problem of multilingual masked language modeling, i.e., the training of a single
model on concatenated text from multiple languages, and present a detailed study of several …

Individual differences among deep neural network models

J Mehrer, CJ Spoerer, N Kriegeskorte… - Nature …, 2020 - nature.com
Deep neural networks (DNNs) excel at visual recognition tasks and are increasingly used as
a modeling framework for neural computations in the primate brain. Just like individual …

Grounding representation similarity through statistical testing

F Ding, JS Denain, J Steinhardt - Advances in Neural …, 2021 - proceedings.neurips.cc
To understand neural network behavior, recent works quantitatively compare different
networks' learned representations using canonical correlation analysis (CCA), centered …

Joint A-SNN: Joint training of artificial and spiking neural networks via self-distillation and weight factorization

Y Guo, W Peng, Y Chen, L Zhang, X Liu, X Huang… - Pattern Recognition, 2023 - Elsevier
Emerged as a biology-inspired method, Spiking Neural Networks (SNNs) mimic the spiking
nature of brain neurons and have received lots of research attention. SNNs deal with binary …

Similarity and matching of neural network representations

A Csiszárik, P Kőrösi-Szabó… - Advances in …, 2021 - proceedings.neurips.cc
We employ a toolset, dubbed Dr. Frankenstein, to analyse the similarity of representations
in deep neural networks. With this toolset we aim to match the activations on given layers of …

Similarity of neural network models: A survey of functional and representational measures

M Klabunde, T Schumacher, M Strohmaier… - arXiv preprint arXiv …, 2023 - arxiv.org
Measuring similarity of neural networks to understand and improve their behavior has
become an issue of great importance and research interest. In this survey, we provide a …