Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity

Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream
tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting …
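
The snippet above names token-level cross-entropy as the default SFT objective. A minimal PyTorch sketch of that baseline loss is given below; the function and variable names (`sft_cross_entropy`, `logits`, `labels`) are illustrative and not taken from the paper.

```python
# Minimal sketch of the standard token-level cross-entropy loss used in SFT,
# assuming a causal LM that returns logits of shape (batch, seq_len, vocab).
import torch
import torch.nn.functional as F

def sft_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Shift so each position predicts the next token, as in causal LM training.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # Positions labelled -100 (e.g. the prompt) are ignored in the loss.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )

# Toy usage: random logits for 2 sequences of length 8 over a 100-token vocabulary.
logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
print(sft_cross_entropy(logits, labels))
```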

Stabilizing Q-learning with linear architectures for provable efficient learning

A Zanette, M Wainwright - International Conference on …, 2022 - proceedings.mlr.press
The Q-learning algorithm is a simple, fundamental and practically very effective
reinforcement learning algorithm. However, the basic protocol can exhibit an unstable …
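
For reference, a minimal sketch of Q-learning with a linear architecture, Q(s, a) = w · φ(s, a), appears below. The feature map, dimensions, and toy transition are illustrative assumptions, not the paper's construction or its stabilized protocol.

```python
import numpy as np

def q_learning_step(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    q_sa = w @ phi(s, a)
    # Bootstrapped target uses the greedy value at the next state; with function
    # approximation this bootstrapping is a known source of instability.
    q_next = max(w @ phi(s_next, b) for b in actions)
    td_error = r + gamma * q_next - q_sa
    return w + alpha * td_error * phi(s, a)

# Toy example: 4 states, 2 actions, 3-dimensional random features.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 2, 3))       # (state, action, feature_dim)
phi = lambda s, a: feat[s, a]
w = np.zeros(3)
w = q_learning_step(w, phi, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
print(w)
```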

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

T Xu, Z Zhang, R Chen, Y Sun, Y Yu - arXiv preprint arXiv:2411.00610, 2024 - arxiv.org
As a prominent category of imitation learning methods, adversarial imitation learning (AIL)
has garnered significant practical success powered by neural network approximation …
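
A minimal GAIL-style sketch of the adversarial imitation learning loop is shown below: a discriminator is trained to separate expert from policy transitions, and its output is turned into a reward for the policy's RL step. The network size, reward form, and random data are illustrative assumptions, not this paper's algorithm.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(32, 4)   # expert (state, action) pairs
policy_sa = torch.randn(32, 4)   # pairs sampled from the current policy

# Discriminator step: label expert data 1, policy data 0.
logits = torch.cat([disc(expert_sa), disc(policy_sa)])
labels = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])
opt.zero_grad()
bce(logits, labels).backward()
opt.step()

# Reward for the policy: higher when the discriminator mistakes its data for expert data.
with torch.no_grad():
    reward = torch.nn.functional.logsigmoid(disc(policy_sa))  # -log(1 - D) is another common choice
print(reward.mean())
```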

Event tables for efficient experience replay

V Kompella, TJ Walsh, S Barrett, P Wurman… - arXiv preprint arXiv …, 2022 - arxiv.org
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL)
systems. However, uniform sampling from an ER buffer can lead to slow convergence and …
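
Below is a minimal sketch of the uniform-sampling replay buffer that this abstract contrasts with its event tables; the capacity, transition format, and toy data are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # transition is e.g. (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely,
        # regardless of how rare or informative it is.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.add((t, 0, 0.0, t + 1, False))
print(buf.sample(4))
```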

Efficient and scalable reinforcement learning via Hypermodel

Y Li, J Xu, ZQ Luo - … on Adaptive Experimental Design and Active …, 2023 - openreview.net
Data-efficient reinforcement learning (RL) requires deep exploration. Thompson sampling is
a principled method for deep exploration in reinforcement learning. However, Thompson …
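
The posterior-sampling idea that this paper scales up with hypermodels can be illustrated by Thompson sampling on a simple Bernoulli bandit, as in the sketch below; the arm probabilities, priors, and horizon are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.3, 0.5, 0.7])      # unknown success probabilities
alpha = np.ones(3)                      # Beta(1, 1) priors per arm
beta = np.ones(3)

for _ in range(1000):
    # Sample one plausible model from the posterior and act greedily under it.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))           # posterior means concentrate on the best arm
```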