Open-environment machine learning
ZH Zhou - National Science Review, 2022 - academic.oup.com
Conventional machine learning studies generally assume close-environment scenarios
where important factors of the learning process hold invariant. With the great success of …
Multi-agent reinforcement learning: A selective overview of theories and algorithms
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …
Nash learning from human feedback
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …
User-friendly introduction to PAC-Bayes bounds
P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …
Direct Nash optimization: Teaching language models to self-improve with general preferences
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Adversarially trained actor critic for offline reinforcement learning
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …
An overview of multi-agent reinforcement learning from game theoretical perspective
Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
On gradient descent ascent for nonconvex-concave minimax problems
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$,
where $f$ is nonconvex in $\mathbf{x}$ but …
On the theory of policy gradient methods: Optimality, approximation, and distribution shift
Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …