On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting

T Korbak, H Elsahar, G Kruszewski… - Advances in Neural …, 2022 - proceedings.neurips.cc
The availability of large pre-trained models is changing the landscape of Machine Learning
research and practice, moving from a "training from scratch" to a "fine-tuning" paradigm …

Oops i took a gradient: Scalable sampling for discrete distributions

W Grathwohl, K Swersky, M Hashemi… - International …, 2021 - proceedings.mlr.press
We propose a general and scalable approximate sampling strategy for probabilistic models
with discrete variables. Our approach uses gradients of the likelihood function with respect …
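For orientation, a hedged sketch of the gradient-based proposal this line of work is known for (stated from general knowledge, not from the truncated snippet): for binary $x$, the change in the unnormalized log-likelihood $f$ under flipping coordinate $i$ is approximated by a first-order Taylor expansion and used to weight an informed Metropolis-Hastings proposal,
\[
  f\big(x^{(i)}\big) - f(x) \;\approx\; \big(x^{(i)} - x\big)^{\top} \nabla_x f(x),
  \qquad
  q(i \mid x) \;\propto\; \exp\!\Big(\tfrac{1}{2}\big(x^{(i)} - x\big)^{\top} \nabla_x f(x)\Big),
\]
where $x^{(i)}$ is $x$ with coordinate $i$ flipped, so promising flips are proposed more often while the accept/reject step corrects the approximation.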

On the calibration of pre-trained language models using mixup guided by area under the margin and saliency

SY Park, C Caragea - arXiv preprint arXiv:2203.07559, 2022 - arxiv.org
A well-calibrated neural model produces confidence estimates (probability outputs) that closely
approximate the expected accuracy. While prior studies have shown that mixup training …
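As a hedged reminder of the notion of calibration this snippet invokes (a standard definition, not taken from the abstract): a model with prediction $\hat{y}$ and confidence $\hat{p}$ is perfectly calibrated when
\[
  \mathbb{P}\big(\hat{y} = y \,\big|\, \hat{p} = p\big) \;=\; p \qquad \text{for all } p \in [0,1],
\]
i.e., among all predictions made with confidence $p$, a fraction $p$ are correct.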

Building minimal and reusable causal state abstractions for reinforcement learning

Z Wang, C Wang, X Xiao, Y Zhu, P Stone - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from
relatively little experience and the ability to learn policies that generalize to a range of …

On the Calibration of Multilingual Question Answering LLMs

Y Yang, S Dan, D Roth, I Lee - arXiv preprint arXiv:2311.08669, 2023 - arxiv.org
Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question
Answering (QA), a core task in Natural Language Understanding, achieving high accuracies …

Triple-Hybrid Energy-based Model Makes Better Calibrated Natural Language Understanding Models

H Xu, Y Zhang - Proceedings of the 17th Conference of the …, 2023 - aclanthology.org
Though pre-trained language models achieve notable success in many applications, they are
often criticized for over-confident predictions. Specifically, the in-distribution (ID) …

Energy-based models with applications to speech and language processing

Z Ou - Foundations and Trends® in Signal Processing, 2024 - nowpublishers.com
Energy-Based Models (EBMs) are an important class of probabilistic models, also
known as random fields and undirected graphical models. EBMs are un-normalized and …
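For context on the "un-normalized" remark, the standard way an EBM is written (a textbook form, not quoted from this monograph) is
\[
  p_\theta(x) \;=\; \frac{\exp\big(-E_\theta(x)\big)}{Z(\theta)},
  \qquad
  Z(\theta) \;=\; \sum_{x'} \exp\big(-E_\theta(x')\big),
\]
so the model assigns only unnormalized scores $\exp(-E_\theta(x))$, and the partition function $Z(\theta)$ is generally intractable to compute.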

Consistent and efficient long document understanding

Q Zeng - 2023 - ideals.illinois.edu
In the age of information overload, people's information needs from long documents are
rapidly emerging, while people's patience for careful reading and reasoning is gradually …

Improving NMT Models by Retrofitting Quality Estimators into Trainable Energy Loss

G Yoo, JY Lee - Proceedings of the 31st International Conference …, 2025 - aclanthology.org
Reinforcement learning has shown great promise in aligning language models with human
preferences in a variety of text generation tasks, including machine translation. For …

Consistent training via energy-based gflownets for modeling discrete joint distributions

C Ekbote, M Jain, P Das, Y Bengio - arXiv preprint arXiv:2211.00568, 2022 - arxiv.org
Generative Flow Networks (GFlowNets) have demonstrated significant performance
improvements for generating diverse discrete objects $x$ given a reward function $R(x)$ …
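As a hedged note on the sampling target the snippet alludes to (a standard GFlowNet property, not taken from the truncated abstract): a trained GFlowNet samples terminal objects with probability proportional to the reward,
\[
  P_\theta(x) \;\propto\; R(x),
\]
which is what encourages diverse high-reward samples rather than collapse onto a single mode.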