Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …
and harmless neural language models, considering small, medium, and large-size models …
Generative language models exhibit social identity biases
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity)
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
Large Language Models (LLMs) are widely adopted for assisting in software development
tasks, yet their performance evaluations have narrowly focused on the functional correctness …
tasks, yet their performance evaluations have narrowly focused on the functional correctness …
Teaching models to balance resisting and accepting persuasion
Large language models (LLMs) are susceptible to persuasion, which can pose risks when
models are faced with an adversarial interlocutor. We take a first step towards defending …
models are faced with an adversarial interlocutor. We take a first step towards defending …
[HTML][HTML] Claude 2.0 large language model: Tackling a real-world classification problem with a new iterative prompt engineering approach
In the last year, Large Language Models (LLMs) have transformed the way of tackling
problems, opening up new perspectives in various works and research fields, due to their …
problems, opening up new perspectives in various works and research fields, due to their …
Antagonistic AI
The vast majority of discourse around AI development assumes that subservient," moral"
models aligned with" human values" are universally beneficial--in short, that good AI is …
models aligned with" human values" are universally beneficial--in short, that good AI is …
Sycophancy in Large Language Models: Causes and Mitigations
L Malmqvist - arxiv preprint arxiv:2411.15287, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of natural language processing tasks. However, their tendency to exhibit sycophantic …
range of natural language processing tasks. However, their tendency to exhibit sycophantic …
Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
SSY Kim, JW Vaughan, QV Liao, T Lombrozo… - arxiv preprint arxiv …, 2025 - arxiv.org
Large language models (LLMs) can produce erroneous responses that sound fluent and
convincing, raising the risk that users will rely on these responses as if they were correct …
convincing, raising the risk that users will rely on these responses as if they were correct …
Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications
Prompt leakage poses a compelling security and privacy threat in LLM applications.
Leakage of system prompts may compromise intellectual property, and act as adversarial …
Leakage of system prompts may compromise intellectual property, and act as adversarial …
Understanding the Effects of Iterative Prompting on Truthfulness
The development of Large Language Models (LLMs) has notably transformed numerous
sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of …
sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of …