- Academic Search

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Cited by 41

[PDF] arxiv.org

Zero shot VLMs for hate meme detection: Are we there yet?

N Rizwan, P Bhaskar, M Das, SS Majhi, P Saha… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimedia content on social media is rapidly evolving, with memes gaining prominence as
a distinctive form. Unfortunately, some malicious users exploit memes to target individuals or …

Cited by 5

[PDF] ieee.org

Hate Speech Detection using Large Language Models: A Comprehensive Review

A Albladi, M Islam, A Das, M Bigonah, Z Zhang… - IEEE …, 2025 - ieeexplore.ieee.org

The widespread use of social media and other online platforms has facilitated
unprecedented communication and information exchange. However, it has also led to the …

Cited by 1

[PDF] arxiv.org

Mmidr: Teaching large language model to interpret multimodal misinformation via knowledge distillation

L Wang, X Xu, L Zhang, J Lu, Y Xu, H Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Automatic detection of multimodal misinformation has gained a widespread attention
recently. However, the potential of powerful Large Language Models (LLMs) for multimodal …

Cited by 5

[PDF] arxiv.org

Dell: Generating reactions and explanations for llm-based misinformation detection

H Wan, S Feng, Z Tan, H Wang, Y Tsvetkov… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models are limited by challenges in factuality and hallucinations to be
directly employed off-the-shelf for judging the veracity of news articles, where factual …

Cited by 22

[PDF] arxiv.org

Hate Personified: Investigating the role of LLMs in content moderation

S Masud, S Singh, V Hangya, A Fraser… - arXiv preprint arXiv …, 2024 - arxiv.org

For subjective tasks such as hate detection, where people perceive hate differently, the
Large Language Model's (LLM) ability to represent diverse groups is unclear. By including …

Cited by 2

[PDF] arxiv.org

DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion

Y Li, H Jiang, C Gong, Z Wei - arXiv preprint arXiv:2404.10464, 2024 - arxiv.org

Despite the remarkable achievements of language models (LMs) across a broad spectrum
of tasks, their propensity for generating toxic outputs remains a prevalent concern. Current …

Cited by 2

[PDF] acm.org

A Survey on Online Aggression: Content Detection and Behavioural Analysis on Social Media Platforms

S Mane, S Kundu, R Sharma - ACM Computing Surveys, 2023 - dl.acm.org

The proliferation of social media has increased cyber-aggressive behavior behind the
freedom of speech, posing societal risks from online anonymity to real-world consequences …

Cited by 2

[PDF] arxiv.org

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

S Feng, H Wan, N Wang, Z Tan, M Luo… - arXiv preprint arXiv …, 2024 - arxiv.org

Social media bot detection has always been an arms race between advancements in
machine learning bot detectors and adversarial bot strategies to evade detection. In this …

Cited by 9

[PDF] arxiv.org

Decoding Hate: Exploring Language Models' Reactions to Hate Speech

P Piot, J Parapar - arXiv preprint arXiv:2410.00775, 2024 - arxiv.org

Hate speech is a harmful form of online expression, often manifesting as derogatory posts. It
is a significant risk in digital environments. With the rise of Large Language Models (LLMs) …

Cited by 1

Probing LLMs for hate speech detection: strengths and vulnerabilities

Red-Teaming for generative AI: Silver bullet or security theater?
