- Academic Search

Towards generalisable hate speech detection: a review on obstacles and solutions

W Yin, A Zubiaga - PeerJ Computer Science, 2021 - peerj.com

Hate speech is one type of harmful online content which directly attacks or promotes hate
towards a group or an individual member based on their actual or perceived aspects of …

Cited by 226

Handling bias in toxic speech detection: A survey

T Garg, S Masud, T Suresh, T Chakraborty - ACM Computing Surveys, 2023 - dl.acm.org

Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …

Cited by 93

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

P Röttger, HR Kirk, B Vidgen, G Attanasio… - arXiv preprint arXiv …, 2023 - arxiv.org

Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …

Cited by 139

Five sources of bias in natural language processing

D Hovy, S Prabhumoye - Language and Linguistics Compass, 2021 - Wiley Online Library

Recently, there has been an increased interest in demographically grounded bias in natural
language processing (NLP) applications. Much of the recent work has focused on describing …

Cited by 318

Nationality bias in text generation

PN Venkit, S Gautam, R Panchanadikar… - arXiv preprint arXiv …, 2023 - arxiv.org

Little attention is placed on analyzing nationality bias in language models, especially when
nationality is highly used as a factor in increasing the performance of social NLP models …

Cited by 107

HateCheck: Functional tests for hate speech detection models

P Röttger, B Vidgen, D Nguyen, Z Waseem… - arXiv preprint arXiv …, 2020 - arxiv.org

Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …

Cited by 271

Learning from the worst: Dynamically generated datasets to improve online hate detection

B Vidgen, T Thrush, Z Waseem, D Kiela - arXiv preprint arXiv:2012.15761, 2020 - arxiv.org

We present a human-and-model-in-the-loop process for dynamically generating datasets
and training better performing and more robust hate detection models. We provide a new …

Cited by 262

HONEST: Measuring hurtful sentence completion in language models

D Nozza, F Bianchi, D Hovy - … of the 2021 conference of the …, 2021 - iris.unibocconi.it

Language models have revolutionized the field of NLP. However, language models
capture and proliferate hurtful stereotypes, especially in text generation. Our results show …

Cited by 166

Hate speech classifiers learn normative social stereotypes

AM Davani, M Atari, B Kennedy… - Transactions of the …, 2023 - direct.mit.edu

Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …

Cited by 54

A survey on gender bias in natural language processing

K Stanczak, I Augenstein - arXiv preprint arXiv:2112.14168, 2021 - arxiv.org

Language can be used as a means of reproducing and enforcing harmful stereotypes and
biases and has been analysed as such in numerous research. In this paper, we present a …

Cited by 122

Saved to My library

Contextualizing hate speech classifiers with post-hoc explanation

Towards generalisable hate speech detection: a review on obstacles and solutions

Handling bias in toxic speech detection: A survey

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

Five sources of bias in natural language processing

Nationality bias in text generation

HateCheck: Functional tests for hate speech detection models

Learning from the worst: Dynamically generated datasets to improve online hate detection

HONEST: Measuring hurtful sentence completion in language models

Hate speech classifiers learn normative social stereotypes

A survey on gender bias in natural language processing