Grounding and evaluation for large language models: Practical challenges and lessons learned (survey)

K Kenthapadi, M Sameki, A Taly - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes
domains, ensuring the trustworthiness, safety, and observability of these systems has …

Large legal fictions: Profiling legal hallucinations in large language models

M Dahl, V Magesh, M Suzgun… - Journal of Legal Analysis, 2024 - academic.oup.com
Do large language models (LLMs) know the law? LLMs are increasingly being used to
augment legal practice, education, and research, yet their revolutionary potential is …

Multi-hop question answering

V Mavi, A Jangra, A Jatowt - Foundations and Trends® in …, 2024 - nowpublishers.com
The task of Question Answering (QA) has attracted significant research interest for a
long time. Its relevance to language understanding and knowledge retrieval tasks, along …
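
To make the task concrete, here is a minimal Python sketch of the common decomposition view of multi-hop QA: the question is split into single-hop sub-questions whose answers feed the next hop. The `toy_llm` fact table and the `{prev}` templating are illustrative stand-ins of my own, not a system described in the survey.

```python
# Minimal sketch of multi-hop QA by question decomposition (illustrative only).

def toy_llm(prompt: str) -> str:
    """Stand-in for an LLM reader: answers single-hop questions from a tiny fact table."""
    facts = {
        "Which 2010 film starred Leonardo DiCaprio?": "Inception",
        "Who directed Inception?": "Christopher Nolan",
    }
    return facts.get(prompt, "unknown")

def answer_multi_hop(sub_questions: list[str]) -> str:
    """Answer each hop in turn, substituting the previous hop's answer into the next sub-question."""
    answer = ""
    for template in sub_questions:
        answer = toy_llm(template.format(prev=answer))
    return answer

# "Who directed the 2010 film starring Leonardo DiCaprio?" decomposes into two hops:
hops = ["Which 2010 film starred Leonardo DiCaprio?", "Who directed {prev}?"]
print(answer_multi_hop(hops))  # -> "Christopher Nolan"
```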

Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

S Sicari, JF Cevallos M, A Rizzardi… - ACM Computing …, 2024 - dl.acm.org
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …

The art of saying no: Contextual noncompliance in language models

F Brahman, S Kumar, V Balachandran, P Dasigi… - arXiv preprint arXiv …, 2024 - arxiv.org
Chat-based language models are designed to be helpful, yet they should not comply with
every user request. While most existing work primarily focuses on refusal of "unsafe" …
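
As a loose illustration of routing a request through a noncompliance check before answering, the sketch below uses made-up category names and a keyword heuristic; neither reflects the paper's actual taxonomy or method, and a real system would use a learned classifier or the model itself as the judge.

```python
# Illustrative sketch: check a request against noncompliance categories before answering.
# Categories and keyword hints are hypothetical placeholders.

NONCOMPLY_HINTS = {
    "underspecified": ["the one i mean", "that thing"],
    "beyond_capability": ["what will the stock price be", "read my mind"],
}

def noncompliance_category(request: str) -> str | None:
    """Return the first matching noncompliance category, or None if the request looks answerable."""
    lowered = request.lower()
    for category, hints in NONCOMPLY_HINTS.items():
        if any(hint in lowered for hint in hints):
            return category
    return None

def respond(request: str) -> str:
    category = noncompliance_category(request)
    if category is not None:
        return f"[declining: {category}] I can't reliably answer this as asked."
    return "[answering] ..."  # hand off to the actual model here

print(respond("What will the stock price be tomorrow?"))  # -> declining: beyond_capability
```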

DebUnc: mitigating hallucinations in large language model agent communication with uncertainty estimations

L Yoffe, A Amayuelas, WY Wang - arXiv preprint arXiv:2407.06426, 2024 - arxiv.org
To enhance Large Language Model (LLM) capabilities, multi-agent debates have been
introduced, where multiple LLMs discuss solutions to a problem over several rounds of …
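
A toy sketch of the general idea of uncertainty-aware debate follows: each agent's contribution is weighted by a confidence score so that uncertain agents influence the outcome less. The confidence-weighted voting rule here is a simplification of my own for illustration, not the paper's exact mechanism.

```python
# Toy sketch: aggregate agents' answers in a debate round, weighted by confidence (illustrative only).

from collections import defaultdict

def debate_round(agent_outputs: list[tuple[str, float]]) -> str:
    """Aggregate (answer, confidence) pairs by confidence-weighted voting."""
    votes: dict[str, float] = defaultdict(float)
    for answer, confidence in agent_outputs:
        votes[answer] += confidence  # uncertain agents contribute less to the consensus
    return max(votes, key=votes.get)

# Three agents propose answers; the two confident agents outweigh the uncertain one.
outputs = [("Paris", 0.92), ("Paris", 0.88), ("Lyon", 0.31)]
print(debate_round(outputs))  # -> "Paris"
```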

Self-introspective decoding: Alleviating hallucinations for large vision-language models

F Huo, W Xu, Z Zhang, H Wang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Vision-Language Models (LVLMs) have rapidly advanced in recent years, the
prevalent issue known as the 'hallucination' problem has emerged as a significant bottleneck …

Defining knowledge: Bridging epistemology and large language models

C Fierro, R Dhar, F Stamatiou, N Garneau… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge claims are abundant in the literature on large language models (LLMs); but can
we say that GPT-4 truly "knows" the Earth is round? To address this question, we review …

Think twice before trusting: Self-detection for large language models through comprehensive answer reflection

M Li, W Wang, F Feng, F Zhu, Q Wang… - Findings of the …, 2024 - aclanthology.org
Self-detection for Large Language Models (LLMs) seeks to evaluate the
trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the …
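
As a rough illustration of the self-detection setting, the sketch below uses agreement across repeated samples from the same model as a trust signal. This consistency proxy and the `sample_answers` hook are assumptions for illustration, not the paper's comprehensive answer-reflection method.

```python
# Illustrative sketch: estimate trust in an answer from agreement across repeated samples.

from collections import Counter

def self_detect(sample_answers, question: str, n: int = 5, threshold: float = 0.6):
    """Return (majority_answer, trusted_flag) based on agreement across n sampled answers."""
    answers = [sample_answers(question) for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return top_answer, agreement >= threshold

# Usage with a stubbed sampler standing in for repeated LLM calls:
import random
stub = lambda q: random.choice(["42", "42", "42", "17"])
print(self_detect(stub, "What is the answer?"))
```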

On the limits of language generation: Trade-offs between hallucination and mode collapse

A Kalavasis, A Mehrotra, G Velegkas - arXiv preprint arXiv:2411.09642, 2024 - arxiv.org
Specifying all desirable properties of a language model is challenging, but certain
requirements seem essential. Given samples from an unknown language, the trained model …