Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

S Sicari, JF Cevallos M, A Rizzardi… - ACM Computing …, 2024 - dl.acm.org
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …

Generative language models exhibit social identity biases

T Hu, Y Kyrychenko, S Rathje, N Collier… - Nature Computational …, 2025 - nature.com
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity)
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …

PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

Y Peng, AD Gotmare, M Lyu, C Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are widely adopted for assisting in software development
tasks, yet their performance evaluations have narrowly focused on the functional correctness …

Teaching models to balance resisting and accepting persuasion

E Stengel-Eskin, P Hase, M Bansal - arXiv preprint arXiv:2410.14596, 2024 - arxiv.org
Large language models (LLMs) are susceptible to persuasion, which can pose risks when
models are faced with an adversarial interlocutor. We take a first step towards defending …

Claude 2.0 large language model: Tackling a real-world classification problem with a new iterative prompt engineering approach

L Caruccio, S Cirillo, G Polese, G Solimando… - Intelligent Systems with …, 2024 - Elsevier
In the last year, Large Language Models (LLMs) have transformed the way of tackling
problems, opening up new perspectives in various works and research fields, due to their …

Antagonistic AI

A Cai, I Arawjo, EL Glassman - arXiv preprint arXiv:2402.07350, 2024 - arxiv.org
The vast majority of discourse around AI development assumes that subservient, "moral"
models aligned with "human values" are universally beneficial--in short, that good AI is …

Sycophancy in Large Language Models: Causes and Mitigations

L Malmqvist - arXiv preprint arXiv:2411.15287, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of natural language processing tasks. However, their tendency to exhibit sycophantic …

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies

SSY Kim, JW Vaughan, QV Liao, T Lombrozo… - arXiv preprint arXiv …, 2025 - arxiv.org
Large language models (LLMs) can produce erroneous responses that sound fluent and
convincing, raising the risk that users will rely on these responses as if they were correct …

Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications

D Agarwal, AR Fabbri, B Risher, P Laban… - Proceedings of the …, 2024 - aclanthology.org
Prompt leakage poses a compelling security and privacy threat in LLM applications.
Leakage of system prompts may compromise intellectual property, and act as adversarial …

Understanding the Effects of Iterative Prompting on Truthfulness

S Krishna, C Agarwal, H Lakkaraju - arXiv preprint arXiv:2402.06625, 2024 - arxiv.org
The development of Large Language Models (LLMs) has notably transformed numerous
sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of …