On the duality between sharpness-aware minimization and adversarial training

Y Zhang, H He, J Zhu, H Chen, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Adversarial Training (AT), which adversarially perturbs the input samples during training, has
been acknowledged as one of the most effective defenses against adversarial attacks, yet …

Boosting jailbreak attack with momentum

Y Zhang, Z Wei - arXiv preprint arXiv:2405.01229, 2024 - arxiv.org
Large Language Models (LLMs) have achieved remarkable success across diverse tasks,
yet they remain vulnerable to adversarial attacks, notably the well-documented …

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

D Song, X Xie, J Song, D Zhu, Y Huang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Over the past decade, Artificial Intelligence (AI) has had great success and is being
used in a wide range of academic and industrial fields. More recently, Large Language …

Automata Extraction from Transformers

Y Zhang, Z Wei, M Sun - arXiv preprint arXiv:2406.05564, 2024 - arxiv.org
In modern machine learning (ML) systems, Transformer-based architectures have achieved
milestone success across a broad spectrum of tasks, yet understanding their operational …

Particle Swarm Optimization-Based Model Abstraction and Explanation Generation for a Recurrent Neural Network

Y Liu, H Wang, Y Ma - Algorithms, 2024 - mdpi.com
In text classifier models, the complexity of recurrent neural networks (RNNs) is very high
because of the vast state space and uncertainty of transitions, which makes the RNN …

Enhancing Adversarial Attacks: The Similar Target Method

S Zhang, Z Wang, Z Zhou, J Liu… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Adversarial examples are notably characterized by their strong transferability, allowing
attackers to craft these examples on their models and subsequently deploy them against …

Adaptive Resilience via Probabilistic Automaton: Safeguarding Multi-Agent Systems from Leader Missing Attacks

K Wang, X Gong - Applied Mathematics and Statistics, 2024 - sciltp.com
The resilience of leader-following structures has been a hotspot in both academic and
industrial research. Existing studies mainly focus on maintaining follower coherence, usually …

Causal Abstraction in Model Interpretability: A Compact Survey

Y Zhang - arXiv preprint arXiv:2410.20161, 2024 - arxiv.org
The pursuit of interpretable artificial intelligence has led to significant advancements in the
development of methods that aim to explain the decision-making processes of complex …

Artificial instinct

Y Li, J Wang - International Conference on Algorithms, High …, 2024 - spiedigitallibrary.org
Artificial Intelligence (AI) has made remarkable advancements, surpassing human
capabilities in various domains. This paper delves into cutting-edge AI technologies …

On the Robustness of In-Context Learning with Noisy Labels: Train, Inference, and Beyond

C Cheng, H Wen, X Yu, Z Wei - chencheng.me
Recently, the mysterious In-Context Learning (ICL) ability of Transformer
architecture, particularly in large language models, has garnered considerable research …