- Academic Search

Open-environment machine learning

ZH Zhou - National Science Review, 2022 - academic.oup.com

Conventional machine learning studies generally assume close-environment scenarios
where important factors of the learning process hold invariant. With the great success of …

Cited by 155

[PDF] arxiv.org

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1710

[PDF] ai-plans.com

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com

Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Cited by 101

[PDF] nowpublishers.com

User-friendly introduction to PAC-Bayes bounds

P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com

Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …

Cited by 222

[PDF] arxiv.org

Direct Nash optimization: Teaching language models to self-improve with general preferences

C Rosset, CA Cheng, A Mitra, M Santacroce… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …

Cited by 79

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 214

[PDF] mlr.press

Adversarially trained actor critic for offline reinforcement learning

CA Cheng, T Xie, N Jiang… - … Conference on Machine …, 2022 - proceedings.mlr.press

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …

Cited by 150

[PDF] arxiv.org

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 351

[PDF] mlr.press

On gradient descent ascent for nonconvex-concave minimax problems

T Lin, C Jin, M Jordan - International conference on machine …, 2020 - proceedings.mlr.press

We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$,
where $f$ is nonconvex in $\mathbf{x}$ but …

Cited by 623

[PDF] jmlr.org

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 508

Saved to my library

Prediction, learning, and games

Open-environment machine learning

Multi-agent reinforcement learning: A selective overview of theories and algorithms

Nash learning from human feedback

User-friendly introduction to PAC-Bayes bounds

Direct Nash optimization: Teaching language models to self-improve with general preferences

The statistical complexity of interactive decision making

Adversarially trained actor critic for offline reinforcement learning

An overview of multi-agent reinforcement learning from game theoretical perspective

On gradient descent ascent for nonconvex-concave minimax problems

On the theory of policy gradient methods: Optimality, approximation, and distribution shift