Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
Defending large language models against jailbreaks so that they never engage in a broadly-
defined set of forbidden behaviors is an open problem. In this paper, we investigate the …
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models
How should one judge whether a given large language model (LLM) can reliably perform
economic reasoning? Most existing LLM benchmarks focus on specific applications and fail …
On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models
A Yang, J Tab, P Shah, P Kotchavong - arXiv preprint arXiv:2412.10535, 2024 - arxiv.org
The increasing reliance on large language models (LLMs) for diverse applications
necessitates a thorough understanding of their robustness to adversarial perturbations and …
Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers
Defending large language models against jailbreaks so that they never engage in a broad
set of forbidden behaviors is an open problem. In this paper, we study if jailbreak-defense is …
Large Language Models for Explainability in Machine Learning
D Beamish, G Exarchakis - openreview.net
We investigate the potential of large language models (LLMs) in explainable artificial
intelligence (XAI) by examining their ability to generate understandable explanations for …