Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Zero shot VLMs for hate meme detection: Are we there yet?
Multimedia content on social media is rapidly evolving, with memes gaining prominence as
a distinctive form. Unfortunately, some malicious users exploit memes to target individuals or …
Hate Speech Detection using Large Language Models: A Comprehensive Review
The widespread use of social media and other online platforms has facilitated
unprecedented communication and information exchange. However, it has also led to the …
MMIDR: Teaching large language model to interpret multimodal misinformation via knowledge distillation
Automatic detection of multimodal misinformation has gained widespread attention
recently. However, the potential of powerful Large Language Models (LLMs) for multimodal …
DELL: Generating reactions and explanations for LLM-based misinformation detection
Large language models are limited by challenges in factuality and hallucinations to be
directly employed off-the-shelf for judging the veracity of news articles, where factual …
Hate Personified: Investigating the role of LLMs in content moderation
For subjective tasks such as hate detection, where people perceive hate differently, the
Large Language Model's (LLM) ability to represent diverse groups is unclear. By including …
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Y Li, H Jiang, C Gong, Z Wei - arXiv preprint arXiv:2404.10464, 2024 - arxiv.org
Despite the remarkable achievements of language models (LMs) across a broad spectrum
of tasks, their propensity for generating toxic outputs remains a prevalent concern. Current …
A Survey on Online Aggression: Content Detection and Behavioural Analysis on Social Media Platforms
The proliferation of social media has increased cyber-aggressive behavior behind the
freedom of speech, posing societal risks from online anonymity to real-world consequences …
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Social media bot detection has always been an arms race between advancements in
machine learning bot detectors and adversarial bot strategies to evade detection. In this …
Decoding Hate: Exploring Language Models' Reactions to Hate Speech
Hate speech is a harmful form of online expression, often manifesting as derogatory posts. It
is a significant risk in digital environments. With the rise of Large Language Models (LLMs) …