Trustworthy artificial intelligence: a review

D Kaur, S Uslu, KJ Rittichier, A Durresi - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
Artificial intelligence (AI) and algorithmic decision-making are having a profound impact on
our daily lives. These systems are widely used in different high-stakes applications like …

GAN-based anomaly detection: A review

X Xia, X Pan, N Li, X He, L Ma, X Zhang, N Ding - Neurocomputing, 2022 - Elsevier
Supervised learning algorithms have seen limited use in the field of anomaly detection due
to the unpredictability of abnormal samples and the difficulty of acquiring them. In recent years …

Representation engineering: A top-down approach to AI transparency

A Zou, L Phan, S Chen, J Campbell, P Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …

Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models

P Hase, M Bansal, B Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Language models learn a great quantity of factual information during pretraining,
and recent work localizes this information to specific model weights like mid-layer MLP …

LayerCAM: Exploring hierarchical class activation maps for localization

PT Jiang, CB Zhang, Q Hou… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Class activation maps are generated from the final convolutional layer of a CNN. They can
highlight discriminative object regions for the class of interest. These discovered object …
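
The snippet above describes the core mechanism, so a brief illustration may help. Below is a minimal PyTorch sketch of LayerCAM-style location-wise gradient weighting, assuming a torchvision ResNet-18 with random weights and its layer3 block as the target layer; it is an illustration of the idea, not the authors' released implementation.

    # Minimal LayerCAM-style sketch (assumed model: torchvision ResNet-18 with
    # random weights; assumed target layer: model.layer3).
    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet18(weights=None).eval()
    target_layer = model.layer3
    activations, gradients = {}, {}

    # Hooks capture the target layer's activations and its output gradients.
    target_layer.register_forward_hook(
        lambda m, i, o: activations.update(value=o.detach()))
    target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.update(value=go[0].detach()))

    x = torch.randn(1, 3, 224, 224)          # stand-in image tensor
    logits = model(x)
    class_idx = logits.argmax(dim=1).item()  # class of interest
    model.zero_grad()
    logits[0, class_idx].backward()

    # LayerCAM weighting: each spatial location of each channel is scaled by
    # its own positive gradient, summed over channels, then passed through ReLU.
    weights = F.relu(gradients["value"])                       # (1, C, H, W)
    cam = F.relu((weights * activations["value"]).sum(1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalise to [0, 1]

Because the weighting is per spatial location rather than per channel, the same recipe can be applied to earlier layers to obtain finer, hierarchical maps, which is the behaviour the title refers to.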

Interpretable machine learning: Fundamental principles and 10 grand challenges

C Rudin, C Chen, Z Chen, H Huang… - Statistic …, 2022 - projecteuclid.org
Interpretability in machine learning (ML) is crucial for high-stakes decisions and
troubleshooting. In this work, we provide fundamental principles for interpretable ML, and …

A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations …

IU Ekanayake, DPP Meddage, U Rathnayake - Case Studies in …, 2022 - Elsevier
Machine learning (ML) techniques are often employed for accurate prediction of
the compressive strength of concrete. Despite their higher accuracy, previous ML models failed to …
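
The title points to Shapley additive explanations, so a hedged sketch of the general SHAP workflow may be useful here. The model, feature names, and data below are synthetic stand-ins, not the paper's concrete-mixture dataset or its trained models.

    # Hedged sketch of a SHAP workflow on a tabular regressor. The feature
    # names ("cement", "water", "aggregate", "age") and the data are
    # hypothetical stand-ins, not the paper's dataset.
    import numpy as np
    import pandas as pd
    import shap
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.uniform(size=(500, 4)),
                     columns=["cement", "water", "aggregate", "age"])
    y = 3 * X["cement"] - 2 * X["water"] + rng.normal(scale=0.1, size=500)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # TreeExplainer computes Shapley values efficiently for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Global view: mean |SHAP value| per feature serves as a feature ranking.
    print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns))
    # shap.summary_plot(shap_values, X)   # beeswarm plot, if plotting is wanted

Per-sample SHAP values also decompose each individual prediction into additive feature contributions, which is what makes a black-box strength predictor locally explainable.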

Concept bottleneck models

PW Koh, T Nguyen, YS Tang… - International …, 2020 - proceedings.mlr.press
We seek to learn models that we can interact with using high-level concepts: if the model did
not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the …
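
The question in the snippet above is exactly what a concept bottleneck enables: the label is predicted only from an intermediate vector of human-interpretable concepts, so a concept can be overridden at test time and the downstream prediction re-read. The sketch below is a minimal illustration under assumed dimensions and layers, not the paper's architecture or training procedure.

    # Minimal concept-bottleneck sketch: input -> predicted concepts -> label.
    # Dimensions, layers, and the NaN-masked override are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, in_dim=64, n_concepts=8, n_classes=3):
            super().__init__()
            self.concept_net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                             nn.Linear(32, n_concepts))
            self.label_net = nn.Linear(n_concepts, n_classes)  # sees only concepts

        def forward(self, x, concept_override=None):
            concepts = torch.sigmoid(self.concept_net(x))  # concept scores in [0, 1]
            if concept_override is not None:
                # Test-time intervention: NaN entries keep the model's own
                # prediction; finite entries replace it.
                keep = torch.isnan(concept_override)
                concepts = torch.where(keep, concepts, concept_override)
            return self.label_net(concepts), concepts

    model = ConceptBottleneck()
    x = torch.randn(2, 64)
    override = torch.full((2, 8), float("nan"))
    override[:, 0] = 0.0  # e.g. force a "bone spur"-like concept off
    logits, concepts = model(x, concept_override=override)

Comparing the logits with and without the override answers the snippet's question of whether the prediction still holds once a concept is switched off.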

Opportunities and challenges in explainable artificial intelligence (XAI): A survey

A Das, P Rad - arXiv preprint arXiv:2006.11371, 2020 - arxiv.org
Nowadays, deep neural networks are widely used in mission-critical systems such as
healthcare, self-driving vehicles, and the military, which have a direct impact on human lives …

Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset

C Meng, L Trinh, N Xu, J Enouen, Y Liu - Scientific Reports, 2022 - nature.com
The recent release of large-scale healthcare datasets has greatly propelled research on
data-driven deep learning models for healthcare applications. However, due to the nature of …