Survey of vulnerabilities in large language models revealed by adversarial attacks
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …
A survey of adversarial defenses and robustness in NLP
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …
Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …
Easily accessible text-to-image generation amplifies demographic stereotypes at large scale
Machine learning models that convert user-written text descriptions into images are now
widely available online and used by millions of users to generate millions of images a day …
Red teaming language models with language models
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …
A survey of safety and trustworthiness of large language models through the lens of verification and validation
Large language models (LLMs) have sparked a new wave of AI enthusiasm for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …
Algorithmic content moderation: Technical and political challenges in the automation of platform governance
As government pressure on major technology companies builds, both firms and legislators
are searching for technical solutions to difficult platform governance puzzles such as hate …
Quark: Controllable text generation with reinforced unlearning
Large-scale language models often learn behaviors that are misaligned with user
expectations. Generated text may contain offensive or toxic language, contain significant …
Weight poisoning attacks on pre-trained models
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …
Mind the style of text! Adversarial and backdoor attacks based on text style transfer
Adversarial attacks and backdoor attacks are two common security threats that hang over
deep learning. Both of them harness task-irrelevant features of data in their implementation …