Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

AI agents under threat: A survey of key security challenges and future pathways

Z Deng, Y Guo, C Han, W Ma, J Xiong, S Wen… - ACM Computing …, 2024 - dl.acm.org
An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or
makes decisions based on pre-defined objectives and data inputs. AI agents, capable of …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

WizardLM: Empowering large language models to follow complex instructions

C Xu, Q Sun, K Zheng, X Geng, P Zhao, J Feng… - arXiv preprint arXiv …, 2023 - arxiv.org
Training large language models (LLMs) with open-domain instruction following data brings
colossal success. However, manually creating such instruction data is very time-consuming …

FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Enabling large language models to generate text with citations

T Gao, H Yen, J Yu, D Chen - arXiv preprint arXiv:2305.14627, 2023 - arxiv.org
Large language models (LLMs) have emerged as a widely-used tool for information
seeking, but their generated outputs are prone to hallucination. In this work, our aim is to …

Fine-grained human feedback gives better rewards for language model training

Z Wu, Y Hu, W Shi, N Dziri, A Suhr… - Advances in …, 2023 - proceedings.neurips.cc
Language models (LMs) often exhibit undesirable text generation behaviors,
including generating false, toxic, or irrelevant outputs. Reinforcement learning from human …

Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts

J Xie, K Zhang, J Chen, R Lou, Y Su - The Twelfth International …, 2023 - openreview.net
By providing external information to large language models (LLMs), tool augmentation
(including retrieval augmentation) has emerged as a promising solution for addressing the …

ARES: An automated evaluation framework for retrieval-augmented generation systems

J Saad-Falcon, O Khattab, C Potts… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand
annotations for input queries, passages to retrieve, and responses to generate. We …