Taxonomy of machine learning safety: A survey and primer

S Mohseni, H Wang, C **ao, Z Yu, Z Wang… - ACM Computing …, 2022 - dl.acm.org
The open-world deployment of Machine Learning (ML) algorithms in safety-critical
applications such as autonomous vehicles needs to address a variety of ML vulnerabilities …

[PDF][PDF] DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

B Wang, W Chen, H Pei, C **e, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Smoothllm: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arxiv preprint arxiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human intentions, widely-used
LLMs such as GPT, Llama, and Claude are susceptible to jailbreaking attacks, wherein an …

Surgical fine-tuning improves adaptation to distribution shifts

Y Lee, AS Chen, F Tajwar, A Kumar, H Yao… - arxiv preprint arxiv …, 2022 - arxiv.org
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …

From admission to discharge: a systematic review of clinical natural language processing along the patient journey

K Klug, K Beckh, D Antweiler, N Chakraborty… - BMC Medical Informatics …, 2024 - Springer
Background Medical text, as part of an electronic health record, is an essential information
source in healthcare. Although natural language processing (NLP) techniques for medical …

Trak: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arxiv preprint arxiv …, 2023 - arxiv.org
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …

Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization

JP Miller, R Taori, A Raghunathan… - International …, 2021 - proceedings.mlr.press
For machine learning systems to be reliable, we must understand their performance in
unseen, out-of-distribution environments. In this paper, we empirically show that out-of …

[HTML][HTML] Multimodal neurons in artificial neural networks

G Goh, N Cammarata, C Voss, S Carter, M Petrov… - Distill, 2021 - distill.pub
Gabriel Goh: Research lead. Gabriel Goh first discovered multimodal neurons, sketched out
the project direction and paper outline, and did much of the conceptual and engineering …

Discover and cure: Concept-aware mitigation of spurious correlation

S Wu, M Yuksekgonul, L Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Deep neural networks often rely on spurious correlations to make predictions, which hinders
generalization beyond training environments. For instance, models that associate cats with …

Change is hard: A closer look at subpopulation shift

Y Yang, H Zhang, D Katabi, M Ghassemi - arxiv preprint arxiv:2302.12254, 2023 - arxiv.org
Machine learning models often perform poorly on subgroups that are underrepresented in
the training data. Yet, little is understood on the variation in mechanisms that cause …