Open-environment machine learning

ZH Zhou - National Science Review, 2022 - academic.oup.com
Conventional machine learning studies generally assume close-environment scenarios
where important factors of the learning process hold invariant. With the great success of …

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arxiv preprint arxiv …, 2023 - ai-plans.com
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

User-friendly introduction to PAC-Bayes bounds

P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …

Direct Nash optimization: Teaching language models to self-improve with general preferences

C Rosset, CA Cheng, A Mitra, M Santacroce… - arxiv preprint arxiv …, 2024 - arxiv.org
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arxiv preprint arxiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Adversarially trained actor critic for offline reinforcement learning

CA Cheng, T Xie, N Jiang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arxiv preprint arxiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

On gradient descent ascent for nonconvex-concave minimax problems

T Lin, C Jin, M Jordan - International conference on machine …, 2020 - proceedings.mlr.press
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$,
where $f$ is nonconvex in $\mathbf{x}$ but …

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org
Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …