A survey on fairness in large language models
Large language models (LLMs) have shown powerful performance and development
prospects and are widely deployed in the real world. However, LLMs can capture social …
Handling bias in toxic speech detection: A survey
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …
ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection
Toxic language detection systems often falsely flag text that contains minority group
mentions as toxic, as those groups are often the targets of online hate. Such over-reliance …
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Dealing with disagreements: Looking beyond the majority vote in subjective annotations
Majority voting and averaging are common approaches used to resolve annotator
disagreements and derive single ground truth labels from multiple annotations. However …
Challenges in detoxifying language models
Large language models (LM) generate remarkably fluent text and can be efficiently adapted
across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of …
ProsocialDialog: A prosocial backbone for conversational agents
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …
MOMENTA: A multimodal framework for detecting harmful memes and their targets
Internet memes have become powerful means to transmit political, psychological, and socio-
cultural ideas. Although memes are typically humorous, recent days have witnessed an …
Having beer after prayer? Measuring cultural bias in large language models
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …
SafetyKit: First aid for measuring safety in open-domain conversational systems
The social impact of natural language processing and its applications has received
increasing attention. In this position paper, we focus on the problem of safety for end-to-end …