The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2023 - proceedings.neurips.cc
Large language models are now tuned to align with the goals of their creators, namely to be
"helpful and harmless." These models should respond helpfully to user questions, but refuse …

Regulating ChatGPT and other large generative AI models

P Hacker, A Engel, M Mauer - Proceedings of the 2023 ACM conference …, 2023 - dl.acm.org
Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are
rapidly transforming the way we communicate, illustrate, and create. However, AI regulation …

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …

Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions

A Habbal, MK Ali, MA Abuzaraida - Expert Systems with Applications, 2024 - Elsevier
Artificial Intelligence (AI) has become pervasive, enabling transformative advancements in
various industries including smart city, smart healthcare, smart manufacturing, smart virtual …

The stable signature: Rooting watermarks in latent diffusion models

P Fernandez, G Couairon, H Jégou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generative image modeling enables a wide range of applications but raises ethical
concerns about responsible deployment. This paper introduces an active strategy combining …

Ethical principles for artificial intelligence in education

A Nguyen, HN Ngo, Y Hong, B Dang… - Education and …, 2023 - Springer
The advancement of artificial intelligence in education (AIED) has the potential to transform
the educational landscape and influence the role of all involved stakeholders. In recent …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …