A human-centered systematic literature review of cyberbullying detection algorithms

S Kim, A Razi, G Stringhini, PJ Wisniewski… - Proceedings of the …, 2021 - dl.acm.org
Cyberbullying is a growing problem across social media platforms, inflicting short and long-
lasting effects on victims. To mitigate this problem, research has looked into building …

Identifying and mitigating vulnerabilities in LLM-integrated applications

F Jiang - 2024 - search.proquest.com
Large language models (LLMs) are increasingly deployed as the backend for various
applications, including code completion tools and AI-powered search engines. Unlike …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks

D Kang, X Li, I Stoica, C Guestrin… - 2024 IEEE Security …, 2024 - ieeexplore.ieee.org
Recent advances in instruction-following large language models (LLMs) have led to
dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same …

A holistic approach to undesired content detection in the real world

T Markov, C Zhang, S Agarwal, FE Nekoul… - Proceedings of the …, 2023 - ojs.aaai.org
We present a holistic approach to building a robust and useful natural language
classification system for real-world content moderation. The success of such a system relies …

Persistent interaction patterns across social media platforms and over time

M Avalle, N Di Marco, G Etta, E Sangiorgio, S Alipour… - Nature, 2024 - nature.com
Growing concern surrounds the impact of social media platforms on public discourse and
their influence on social dynamics, especially in the context of toxicity. Here, to better …

Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP

T Schick, S Udupa, H Schütze - Transactions of the Association for …, 2021 - direct.mit.edu
Warning: This paper contains prompts and model outputs that are offensive in nature. When
trained on large, unfiltered crawls from the Internet, language models pick up and reproduce …

ToxicChat: Unveiling hidden challenges of toxicity detection in real-world user-AI conversation

Z Lin, Z Wang, Y Tong, Y Wang, Y Guo, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite remarkable advances that large language models have achieved in chatbots,
maintaining a non-toxic user-AI interactive environment has become increasingly critical …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

DICES dataset: Diversity in conversational AI evaluation for safety

L Aroyo, A Taylor, M Diaz, C Homan… - Advances in …, 2023 - proceedings.neurips.cc
Machine learning approaches often require training and evaluation datasets with a
clear separation between positive and negative examples. This requirement overly …