XSTest: A test suite for identifying exaggerated safety behaviours in large language models
Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …
ROBBIE: Robust bias evaluation of large generative language models
As generative large language models (LLMs) grow more performant and prevalent, we must
develop comprehensive enough tools to measure and improve their fairness. Different …
Hate speech detection: A comprehensive review of recent works
There has been surge in the usage of Internet as well as social media platforms which has
led to rise in online hate speech targeted on individual or group. In the recent years, hate …
Recent advances in hate speech moderation: Multimodality and the role of large models
In the evolving landscape of online communication, moderating hate speech (HS) presents
an intricate challenge, compounded by the multimodal nature of digital content. This …
CultureLLM: Incorporating cultural differences into large language models
Large language models (LLMs) are reported to be partial to certain cultures owing to the
training data dominance from the English corpora. Since multilingual cultural data are often …
Evaluating ChatGPT's performance for multilingual and emoji-based hate speech detection
Hate speech is a severe issue that affects many online platforms. So far, several studies
have been performed to develop robust hate speech detection systems. Large language …
Validating multimedia content moderation software via semantic fusion
The exponential growth of social media platforms, such as Facebook, Instagram, YouTube,
and TikTok, has revolutionized communication and content publication in human society …
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore
Toxic content is a global problem, but most resources for detecting toxic content are in
English. When datasets are created in other languages, they often focus exclusively on one …
Exploring Amharic hate speech data collection and classification approaches
In this paper, we present a study of efficient data selection and annotation strategies for
Amharic hate speech. We also build various classification models and investigate the …
JailbreakHunter: A visual analytics approach for jailbreak prompts discovery from large-scale human-LLM conversational datasets
Large Language Models (LLMs) have gained significant attention but also raised concerns
due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards …