Generative AI agents in autonomous machines: A safety perspective

J Jabbour, VJ Reddi - arXiv preprint arXiv:2410.15489, 2024 - arxiv.org
The integration of Generative Artificial Intelligence (AI) into autonomous machines
represents a major paradigm shift in how these systems operate and unlocks new solutions …

Knowledge-guided prompt-based continual learning: Aligning task-prompts through contrastive hard negatives

H Lu, L Lin, C Fan, C Wang, W Fang, X Wu - Knowledge-Based Systems, 2025 - Elsevier
Continual Learning aims to empower a single model to continually adapt to novel
environments and perform new tasks while retaining previous knowledge without …

Compromising Honesty and Harmlessness in Language Models via Deception Attacks

L Vaugrante, F Carlon, M Menke… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent research on large language models (LLMs) has demonstrated their ability to
understand and employ deceptive behavior, even without explicit prompting. However, such …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
The advancements in generative AI inevitably raise concerns about their risks and safety
implications, which, in turn, catalyzes significant progress in AI safety. However, as this …

" Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks

L Wang - arXiv preprint arXiv:2411.16730, 2024 - arxiv.org
As the application of large language models continues to expand across various fields, it poses
greater challenges to effectively identifying harmful content generation and …

Gen-AI for User Safety: A Survey

AP Desai, T Ravi, M Luqman, M Sharma… - … Conference on Big …, 2024 - ieeexplore.ieee.org
In this manuscript, we provide a comprehensive overview of the work done using Gen-AI
techniques with respect to user safety. In particular, we first provide the various domains …

A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation

A Srivastava, S Panda - arXiv preprint arXiv:2410.13897, 2024 - arxiv.org
As generative AI systems, including large language models (LLMs) and diffusion models,
advance rapidly, their growing adoption has led to new and complex security risks often …

FLAME: Flexible LLM-Assisted Moderation Engine

I Bakulin, I Kopanichuk, I Bespalov… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has introduced significant
challenges in moderating user-model interactions. While LLMs demonstrate remarkable …

Securing Retrieval-Augmented Generation Pipelines: A Comprehensive Framework

S Nandagopal - Journal of Computer Science and Technology Studies, 2025 - neliti.com
Retrieval-Augmented Generation (RAG) has significantly enhanced the capabilities
of Large Language Models (LLMs) by enabling them to access and incorporate external …