Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv …

Online mirror descent and dual averaging: keeping pace in the dynamic case

H Fang, NJA Harvey, VS Portella… - Journal of Machine …, 2022 - jmlr.org
Online mirror descent (OMD) and dual averaging (DA)--two fundamental algorithms for
online convex optimization--are known to have very similar (and sometimes identical) …

A survey on noncooperative games and distributed Nash equilibrium seeking over multi-agent networks

P Yi, J Lei, X Li, S Liang, M Meng… - CAAI Artificial Intelligence …, 2022 - sciopen.com
The work gives a review on the distributed Nash equilibrium seeking of noncooperative
games in multi-agent networks, which emerges as one of the frontier research topics in the …