AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - ar**… - Advances in Neural …, 2023 - proceedings.neurips.cc
Instruction tuning is an effective technique to align large language models (LLMs) with
human intent. In this work, we investigate how an adversary can exploit instruction tuning by …

Artificial intelligence (AI) cybersecurity dimensions: a comprehensive framework for understanding adversarial and offensive AI

M Malatji, A Tolah - AI and Ethics, 2024 - Springer
As Artificial Intelligence (AI) rapidly advances and integrates into various domains,
cybersecurity emerges as a critical field grappling with both the benefits and pitfalls of AI …

LLM self defense: By self examination, LLMs know they are being tricked

M Phute, A Helbling, M Hull, SY Peng, S Szyller… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are popular for high-quality text generation but can produce
harmful content, even when aligned with human values through reinforcement learning …

AI-driven threat detection and response: A paradigm shift in cybersecurity

A Yaseen - International Journal of Information and Cybersecurity, 2023 - researchgate.net
The research paper delves into the transformative role of artificial intelligence (AI) in
revolutionizing cybersecurity. This study examines the historical context and evolution of AI …

Who wrote this code? Watermarking for code generation

T Lee, S Hong, J Ahn, I Hong, H Lee, S Yun… - arXiv preprint arXiv …, 2023 - arxiv.org
Since the remarkable generation performance of large language models raised ethical and
legal concerns, approaches to detect machine-generated text by embedding watermarks are …

Deepfakes, misinformation, and disinformation in the era of frontier AI, generative AI, and large AI models

MR Shoaib, Z Wang, MT Ahvanooey… - … on Computer and …, 2023 - ieeexplore.ieee.org
With the advent of sophisticated artificial intelligence (AI) technologies, the proliferation of
deepfakes and the spread of m/disinformation have emerged as formidable threats to the …