Generative AI agents in autonomous machines: A safety perspective

J Jabbour, VJ Reddi - arXiv preprint arXiv:2410.15489, 2024 - arxiv.org
The integration of Generative Artificial Intelligence (AI) into autonomous machines
represents a major paradigm shift in how these systems operate and unlocks new solutions …

Knowledge-guided prompt-based continual learning: Aligning task-prompts through contrastive hard negatives

H Lu, L Lin, C Fan, C Wang, W Fang, X Wu - Knowledge-Based Systems, 2025 - Elsevier
Continual Learning aims to empower a single model to continually adapt to novel
environments and perform new tasks while retaining previous knowledge without …

Compromising Honesty and Harmlessness in Language Models via Deception Attacks

L Vaugrante, F Carlon, M Menke… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent research on large language models (LLMs) has demonstrated their ability to
understand and employ deceptive behavior, even without explicit prompting. However, such …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
The advancements in generative AI inevitably raise concerns about their risks and safety
implications, which, in turn, catalyzes significant progress in AI safety. However, as this …

" Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks

L Wang - arXiv preprint arXiv:2411.16730, 2024 - arxiv.org
As the application of large language models continues to expand across various fields, it poses
greater challenges to effectively identifying harmful content generation and …

Gen-AI for User Safety: A Survey

AP Desai, T Ravi, M Luqman, M Sharma… - … Conference on Big …, 2024 - ieeexplore.ieee.org
In this manuscript, we provide a comprehensive overview of the work done using Gen-AI
techniques with respect to user safety. In particular, we first provide the various domains …

A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation

A Srivastava, S Panda - arXiv preprint arXiv:2410.13897, 2024 - arxiv.org
As generative AI systems, including large language models (LLMs) and diffusion models,
advance rapidly, their growing adoption has led to new and complex security risks often …

FLAME: Flexible LLM-Assisted Moderation Engine

I Bakulin, I Kopanichuk, I Bespalov… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has introduced significant
challenges in moderating user-model interactions. While LLMs demonstrate remarkable …

Securing Retrieval-Augmented Generation Pipelines: A Comprehensive Framework

S Nandagopal - Journal of Computer Science and Technology Studies, 2025 - neliti.com
Retrieval-Augmented Generation (RAG) has significantly enhanced the capabilities
of Large Language Models (LLMs) by enabling them to access and incorporate external …