Towards bidirectional human-AI alignment: A systematic review for clarifications, framework, and future directions

H Shen, T Knearem, R Ghosh, K Alkiek… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

Language-Models-as-a-Service: Overview of a new paradigm and its challenges

E La Malfa, A Petrov, S Frieder, C Weinhuber… - Journal of Artificial …, 2024 - jair.org
Some of the most powerful language models currently are proprietary systems, accessible
only via (typically restrictive) web or software programming interfaces. This is the …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a
pivotal methodology for transferring advanced capabilities from leading proprietary LLMs …

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2303.05453, 2023 - arxiv.org
Large language models (LLMs) are used to generate content for a wide range of tasks, and
are set to reach a growing audience in coming years due to integration in product interfaces …

Knowledge of cultural moral norms in large language models

A Ramezani, Y Xu - arXiv preprint arXiv:2306.01857, 2023 - arxiv.org
Moral norms vary across cultures. A recent line of work suggests that English large language
models contain human-like moral biases, but these studies typically do not examine moral …

Aligning large language models through synthetic feedback

S Kim, S Bae, J Shin, S Kang, D Kwak, KM Yoo… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) to human values has become increasingly
important as it enables sophisticated steering of LLMs. However, it requires significant …

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

PRP: Propagating universal perturbations to attack large language model guard-rails

N Mangaokar, A Hooda, J Choi… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are typically aligned to be harmless to humans.
Unfortunately, recent work has shown that such models are susceptible to automated …

Ethical reasoning over moral alignment: A case and framework for in-context ethical policies in LLMs

A Rao, A Khandelwal, K Tanmay, U Agarwal… - arXiv preprint arXiv …, 2023 - arxiv.org
In this position paper, we argue that instead of morally aligning LLMs to a specific set of ethical
principles, we should infuse generic ethical reasoning capabilities into them so that they can …