Google Scholar

2 results (0.02 s)

Learning policies through quantile regression

Search within citing articles

[PDF] jmlr.org

Greedification operators for policy optimization: Investigating forward and reverse KL divergences

A Chan, H Silva, S Lim, T Kozuno, AR Mahmood… - Journal of Machine …, 2022 - jmlr.org

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy
evaluation and (approximate) greedification. Many different approaches have been explored …

Cited by 33 - All 4 versions

[PDF] mst.edu

QC_SANE: Robust control in DRL using quantile critic with spiking actor and normalized ensemble

S Gupta, G Singal, D Garg… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Recently introduced deep reinforcement learning (DRL) techniques in discrete-time have
resulted in significant advances in online games, robotics, and so on. Inspired from recent …

Cited by 5 - All 6 versions
