Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

The flan collection: Designing data and methods for effective instruction tuning

S Longpre, L Hou, T Vu, A Webson… - International …, 2023 - proceedings.mlr.press
We study the design decisions of publicly available instruction tuning methods by
reproducing and breaking down the development of Flan 2022 (Chung et al., 2022) …

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

When not to trust language models: Investigating effectiveness of parametric and non-parametric memories

A Mallen, A Asai, V Zhong, R Das, D Khashabi… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite their impressive performance on diverse tasks, large language models (LMs) still
struggle with tasks requiring rich world knowledge, implying the limitations of relying solely …

A pretrainer's guide to training data: Measuring the effects of data age, domain coverage, quality, & toxicity

S Longpre, G Yauney, E Reif, K Lee… - Proceedings of the …, 2024 - aclanthology.org
Pretraining data design is critically under-documented and often guided by empirically
unsupported intuitions. We pretrain models on data curated (1) at different collection …

RARR: Researching and revising what language models say, using language models

L Gao, Z Dai, P Pasupat, A Chen, AT Chaganty… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) now excel at many tasks such as few-shot learning, question
answering, reasoning, and dialog. However, they sometimes generate unsupported or …

Trusting your evidence: Hallucinate less with context-aware decoding

W Shi, X Han, M Lewis, Y Tsvetkov… - Proceedings of the …, 2024 - aclanthology.org
Language models (LMs) often struggle to pay enough attention to the input context,
and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we …