Explainable ai: A review of machine learning interpretability methods
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …
with machine learning systems demonstrating superhuman performance in a significant …
Ai alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Red teaming language models with language models
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …
Radar: Robust ai-text detection via adversarial learning
Recent advances in large language models (LLMs) and the intensifying popularity of
ChatGPT-like applications have blurred the boundary of high-quality text generation …
ChatGPT-like applications have blurred the boundary of high-quality text generation …
A survey of safety and trustworthiness of large language models through the lens of verification and validation
Large language models (LLMs) have exploded a new heatwave of AI for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …
engage end-users in human-level conversations with detailed and articulate answers across …
Trustworthy ai: A computational perspective
In the past few decades, artificial intelligence (AI) technology has experienced swift
developments, changing everyone's daily life and profoundly altering the course of human …
developments, changing everyone's daily life and profoundly altering the course of human …
Adversarial attacks and defenses in images, graphs and text: A review
Deep neural networks (DNN) have achieved unprecedented success in numerous machine
learning tasks in various domains. However, the existence of adversarial examples raises …
learning tasks in various domains. However, the existence of adversarial examples raises …
Universal adversarial triggers for attacking and analyzing NLP
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
Adversarial sensor attack on lidar-based perception in autonomous driving
In Autonomous Vehicles (AVs), one fundamental pillar is perception, which leverages
sensors like cameras and LiDARs (Light Detection and Ranging) to understand the driving …
sensors like cameras and LiDARs (Light Detection and Ranging) to understand the driving …
Adversarial attacks on deep-learning models in natural language processing: A survey
With the development of high computational devices, deep neural networks (DNNs), in
recent years, have gained significant popularity in many Artificial Intelligence (AI) …
recent years, have gained significant popularity in many Artificial Intelligence (AI) …