Agent Lumos: Unified and modular training for open-source language agents

D Yin, F Brahman, A Ravichander… - Proceedings of the …, 2024 - aclanthology.org
Closed-source agents suffer from several issues such as a lack of affordability, transparency,
and reproducibility, particularly on complex interactive tasks. This motivates the …

Lumos: Learning agents with unified data, modular design, and open-source LLMs

D Yin, F Brahman, A Ravichander… - ICLR 2024 Workshop …, 2023 - openreview.net
We introduce Lumos, a novel framework for training language agents that employs a unified
data format and a modular architecture based on open-source large language models …
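A modular design of this kind is naturally read as a pipeline: a planning module decomposes the task into subgoals, a grounding module turns each subgoal into an executable action, and an execution module runs it. The sketch below illustrates that pipeline shape only; all names and interfaces are hypothetical stand-ins, not the Lumos codebase or API.

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    description: str

def plan(task: str) -> list[Subgoal]:
    """Stand-in for an LLM planning module that emits subgoals."""
    return [Subgoal(f"subgoal {i}: {task}") for i in (1, 2)]

def ground(subgoal: Subgoal) -> str:
    """Stand-in for an LLM grounding module that emits an executable action."""
    return f"ACTION[{subgoal.description}]"

def execute(action: str) -> str:
    """Stand-in for the execution module (tools / environment)."""
    return f"observation for {action}"

def run_agent(task: str) -> list[str]:
    # Plan once, then ground and execute each subgoal in order.
    return [execute(ground(sg)) for sg in plan(task)]

print(run_agent("find the cheapest flight to Tokyo"))
```

Keeping the planner and grounder behind separate interfaces is what lets each module be trained or swapped independently.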

MacGyver: Are Large Language Models Creative Problem Solvers?

Y Tian, A Ravichander, L Qin, RL Bras… - arXiv preprint arXiv …, 2023 - arxiv.org
We explore the creative problem-solving capabilities of modern large language models
(LLMs) in a constrained setting. The setting requires circumventing a cognitive bias known in …

TaskLAMA: Probing the complex task understanding of language models

Q Yuan, M Kazemi, X Xu, I Noble… - Proceedings of the …, 2024 - ojs.aaai.org
Structured Complex Task Decomposition (SCTD) is the problem of breaking down a
complex real-world task (such as planning a wedding) into a directed acyclic graph over …
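Because SCTD's output is a directed acyclic graph over steps, a decomposition can be represented directly as a predecessor map and linearized by topological sort. A minimal sketch; the example task and steps are invented for illustration, not drawn from the benchmark itself.

```python
from graphlib import TopologicalSorter

# Hypothetical SCTD output for "plan a wedding": each step maps to the
# set of steps that must be completed before it.
steps = {
    "set budget": set(),
    "book venue": {"set budget"},
    "hire caterer": {"set budget", "book venue"},
    "send invitations": {"book venue"},
}

# Any topological order is a valid linearization of the plan DAG.
print(list(TopologicalSorter(steps).static_order()))
```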

STEER: Unified Style Transfer with Expert Reinforcement

S Hallinan, F Brahman, X Lu, J Jung, S Welleck… - arXiv preprint arXiv …, 2023 - arxiv.org
While text style transfer has many applications across natural language processing, the core
premise of transferring from a single source style is unrealistic in a real-world setting. In this …

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

KRY Nagasinghe, H Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we explore the capability of an agent to construct a logical sequence of action
steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from …

Geometric-averaged preference optimization for soft preference labels

H Furuta, KH Lee, SS Gu, Y Matsuo, A Faust… - arXiv preprint arXiv …, 2024 - arxiv.org
Many algorithms for aligning LLMs with human preferences assume that human preferences
are binary and deterministic. However, human preferences can vary across individuals, and …
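To see why geometric averaging accommodates soft labels, consider a DPO-style objective: replacing each response's log-likelihood ratio with a p-weighted geometric average of the two responses rescales the preference margin. The algebra below is a sketch under assumed DPO notation, not necessarily the paper's exact loss.

```latex
% Let r(y) = \log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
% and \Delta = r(y_1) - r(y_2), with soft preference label p for y_1.
% Geometric averaging swaps r(y_1) for p\,r(y_1) + (1-p)\,r(y_2)
% and r(y_2) for (1-p)\,r(y_1) + p\,r(y_2), so the margin becomes
\[
  \Delta_p
  = \bigl(p\,r(y_1) + (1-p)\,r(y_2)\bigr)
  - \bigl((1-p)\,r(y_1) + p\,r(y_2)\bigr)
  = (2p - 1)\,\Delta,
\]
\[
  \mathcal{L}(\theta) = -\log \sigma\!\bigl(\beta\,(2p - 1)\,\Delta\bigr).
\]
% A deterministic label (p = 1) recovers the standard margin, while
% p = 1/2 zeroes the gradient for genuinely ambiguous pairs.
```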

CAT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans

YK Lal, V Cohen, N Chambers… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding the abilities of LLMs to reason about natural language plans, such as
instructional text and recipes, is critical to reliably using them in decision-making systems. A …

E2CL: Exploration-based Error Correction Learning for Embodied Agents

H Wang, CT Leong, J Wang, W Li - arXiv preprint arXiv:2409.03256, 2024 - arxiv.org
Language models are exhibiting increasing capability in knowledge utilization and
reasoning. However, when applied as agents in embodied environments, they often suffer …

Do large language models and humans have similar behaviors in causal inference with script knowledge?

X Hong, M Ryzhova, DA Biondi, V Demberg - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, pre-trained large language models (LLMs) have demonstrated superior language
understanding abilities, including zero-shot causal reasoning. However, it is unclear to what …