Constitutional classifiers: Defending against universal jailbreaks across thousands of hours of red teaming
Large language models (LLMs) are vulnerable to universal jailbreaks: prompting strategies
that systematically bypass model safeguards and enable users to carry out harmful …
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
In the past decade, considerable research effort has been devoted to securing machine
learning (ML) models that operate in adversarial settings. Yet, progress has been slow even …
Multi-Modal One-Shot Federated Ensemble Learning for Medical Data with Vision Large Language Model
Federated learning (FL) has attracted considerable interest in the medical domain due to its
capacity to facilitate collaborative model training while maintaining data privacy. However …
MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Vision-language models (VLMs), which process image and text inputs, are increasingly
integrated into chat assistants and other consumer AI applications. Without proper …
Peering Behind the Shield: Guardrail Identification in Large Language Models
Human-AI conversations have gained increasing attention since the era of large language
models. Consequently, more techniques, such as input/output guardrails and safety …
Towards Efficient Large Multimodal Model Serving
Recent advances in generative AI have led to large multi-modal models (LMMs) capable of
simultaneously processing inputs of various modalities such as text, images, video, and …
ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
Current Vision Language Models (VLMs) remain vulnerable to malicious prompts that
induce harmful outputs. Existing safety benchmarks for VLMs primarily rely on automated …
Gradient Co-occurrence Analysis for Detecting Unsafe Prompts in Large Language Models
Unsafe prompts pose significant safety risks to large language models (LLMs). Existing
methods for detecting unsafe prompts rely on data-driven fine-tuning to train guardrail …
Universal Adversarial Attack on Aligned Multimodal LLMs
T Rahmatullaev, P Druzhinina, M Mikhalchuk… - arXiv preprint arXiv …, 2025 - arxiv.org
We propose a universal adversarial attack on multimodal Large Language Models (LLMs)
that leverages a single optimized image to override alignment safeguards across diverse …
FLAME: Flexible LLM-Assisted Moderation Engine
The rapid advancement of Large Language Models (LLMs) has introduced significant
challenges in moderating user-model interactions. While LLMs demonstrate remarkable …