Google Scholar

Xstest: A test suite for identifying exaggerated safety behaviours in large language models

P Röttger, HR Kirk, B Vidgen, G Attanasio… - arxiv preprint arxiv …, 2023 - arxiv.org

Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …‏

Cited by 130; 3 versions

[PDF] arxiv.org

ROBBIE: Robust bias evaluation of large generative language models‏

D Esiobu, X Tan, S Hosseini, M Ung, Y Zhang… - arxiv preprint arxiv …, 2023‏ - arxiv.org‏

As generative large language models (LLMs) grow more performant and prevalent, we must
develop comprehensive enough tools to measure and improve their fairness. Different …‏

Cited by 43; 4 versions

[PDF] sentic.net

Hate speech detection: A comprehensive review of recent works‏

A Gandhi, P Ahir, K Adhvaryu, P Shah… - Expert …, 2024‏ - Wiley Online Library‏

There has been surge in the usage of Internet as well as social media platforms which has
led to rise in online hate speech targeted on individual or group. In the recent years, hate …‏

Cited by 17; 3 versions

[PDF] arxiv.org

Recent advances in hate speech moderation: Multimodality and the role of large models‏

MS Hee, S Sharma, R Cao, P Nandi… - arxiv preprint arxiv …, 2024‏ - arxiv.org‏

In the evolving landscape of online communication, moderating hate speech (HS) presents
an intricate challenge, compounded by the multimodal nature of digital content. This …‏

Cited by 8; 2 versions

[PDF] arxiv.org

Culturellm: Incorporating cultural differences into large language models‏

C Li, M Chen, J Wang, S Sitaram, X Xie - arXiv preprint arXiv:2402.10946, 2024 - arxiv.org

Large language models (LLMs) are reported to be partial to certain cultures owing to the
training data dominance from the English corpora. Since multilingual cultural data are often …‏

Cited by 49; 2 versions

[PDF] arxiv.org

Evaluating ChatGPT's performance for multilingual and emoji-based hate speech detection‏

M Das, SK Pandey, A Mukherjee - arxiv preprint arxiv:2305.13276, 2023‏ - arxiv.org‏

Hate speech is a severe issue that affects many online platforms. So far, several studies
have been performed to develop robust hate speech detection systems. Large language …‏

Cited by 20; 2 versions

[PDF] arxiv.org

Validating multimedia content moderation software via semantic fusion‏

W Wang, J Huang, C Chen, J Gu, J Zhang… - Proceedings of the …, 2023‏ - dl.acm.org‏

The exponential growth of social media platforms, such as Facebook, Instagram, Youtube,
and TikTok, has revolutionized communication and content publication in human society …‏

Cited by 7; 4 versions

[PDF] aclanthology.org

Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore‏

J Haber, B Vidgen, M Chapman… - Proceedings of the …, 2023‏ - aclanthology.org‏

Toxic content is a global problem, but most resources for detecting toxic content are in
English. When datasets are created in other languages, they often focus exclusively on one …‏

Cited by 6; 3 versions

[PDF] aclanthology.org

Exploring Amharic hate speech data collection and classification approaches‏

AA Ayele, SM Yimam, TD Belay, T Asfaw… - Proceedings of the …, 2023‏ - aclanthology.org‏

In this paper, we present a study of efficient data selection and annotation strategies for
Amharic hate speech. We also build various classification models and investigate the …‏

Cited by 13; 3 versions

[PDF] arxiv.org

Jailbreakhunter: a visual analytics approach for jailbreak prompts discovery from large-scale human-llm conversational datasets‏

Z Jin, S Liu, H Li, X Zhao, H Qu - arXiv preprint arXiv:2407.03045, 2024 - arxiv.org

Large Language Models (LLMs) have gained significant attention but also raised concerns
due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards …‏

Cited by 3; 2 versions

Multilingual HateCheck: Functional tests for multilingual hate speech detection models