CAMEL: Communicative agents for "mind" exploration of large language model society

G Li, H Hammoud, H Itani… - Advances in Neural …, 2023 - proceedings.neurips.cc
The rapid advancement of chat-based language models has led to remarkable progress in
complex task-solving. However, their success heavily relies on human input to guide the …

Character-LLM: A trainable agent for role-playing

Y Shao, L Li, J Dai, X Qiu - arXiv preprint arXiv:2310.10158, 2023 - arxiv.org
Large language models (LLMs) can serve as agents to simulate human
behaviors, given their powerful ability to understand human instructions and provide high …

Aya model: An instruction finetuned open-access multilingual language model

A Üstün, V Aryabumi, ZX Yong, WY Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent breakthroughs in large language models (LLMs) have centered around a handful of
data-rich languages. What does it take to broaden access to breakthroughs beyond first …

Retrieval meets long context large language models

P Xu, W **, X Wu, L McAfee, C Zhu, Z Liu… - The Twelfth …, 2023 - openreview.net
Extending the context window of large language models (LLMs) has recently become popular,
while the solution of augmenting LLMs with retrieval has existed for years. The natural …

LMSYS-Chat-1M: A large-scale real-world LLM conversation dataset

L Zheng, WL Chiang, Y Sheng, T Li, S Zhuang… - arXiv preprint arXiv …, 2023 - arxiv.org
Studying how people interact with large language models (LLMs) in real-world scenarios is
increasingly important due to their widespread use in various applications. In this paper, we …

Aya dataset: An open-access collection for multilingual instruction tuning

S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …

FANToM: A benchmark for stress-testing machine theory of mind in interactions

H Kim, M Sclar, X Zhou, RL Bras, G Kim, Y Choi… - arXiv preprint arXiv …, 2023 - arxiv.org
Theory of mind (ToM) evaluations currently focus on testing models using passive narratives
that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress …

ChatQA: Surpassing GPT-4 on conversational QA and RAG

Z Liu, W **, R Roy, P Xu, C Lee… - Advances in …, 2025 - proceedings.neurips.cc
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-
augmented generation (RAG) and conversational question answering (QA). To enhance …

Value Kaleidoscope: Engaging AI with pluralistic human values, rights, and duties

T Sorensen, L Jiang, JD Hwang, S Levine… - Proceedings of the …, 2024 - ojs.aaai.org
Human values are crucial to human decision-making.\textit {Value pluralism} is the view that
multiple correct values may be held in tension with one another (eg, when considering\textit …

The CoT Collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning

S Kim, SJ Joo, D Kim, J Jang, S Ye, J Shin… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) with fewer than 100B parameters are known to perform poorly on
chain-of-thought (CoT) reasoning compared to large LMs when solving unseen tasks. In this …