Towards generalisable hate speech detection: a review on obstacles and solutions
Hate speech is one type of harmful online content which directly attacks or promotes hate
towards a group or an individual member based on their actual or perceived aspects of …
Handling bias in toxic speech detection: A survey
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …
Five sources of bias in natural language processing
Recently, there has been an increased interest in demographically grounded bias in natural
language processing (NLP) applications. Much of the recent work has focused on describing …
Nationality bias in text generation
Little attention is placed on analyzing nationality bias in language models, especially when
nationality is highly used as a factor in increasing the performance of social NLP models …
HateCheck: Functional tests for hate speech detection models
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …
Learning from the worst: Dynamically generated datasets to improve online hate detection
We present a human-and-model-in-the-loop process for dynamically generating datasets
and training better performing and more robust hate detection models. We provide a new …
HONEST: Measuring hurtful sentence completion in language models
Language models have revolutionized the field of NLP. However, language models
capture and proliferate hurtful stereotypes, especially in text generation. Our results show …
Hate speech classifiers learn normative social stereotypes
Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …
A survey on gender bias in natural language processing
Language can be used as a means of reproducing and enforcing harmful stereotypes and
biases and has been analysed as such in numerous research. In this paper, we present a …