Tengyu Xu
OpenAI
Verified email at openai.com - Homepage

Title · Cited by · Year
Finite-sample analysis for SARSA with linear function approximation
S Zou, T Xu, Y Liang
Advances in Neural Information Processing Systems 32, 2019
209 · 2019
CRPO: A new approach for safe reinforcement learning with convergence guarantee
T Xu, Y Liang, G Lan
International Conference on Machine Learning, 11480-11491, 2021
175* · 2021
Improving sample complexity bounds for (natural) actor-critic algorithms
T Xu, Z Wang, Y Liang
Advances in Neural Information Processing Systems 33, 4358-4369, 2020
155* · 2020
Two time-scale off-policy TD learning: Non-asymptotic analysis over Markovian samples
T Xu, S Zou, Y Liang
Advances in Neural Information Processing Systems 32, 2019
91 · 2019
Reanalysis of variance reduced temporal difference learning
T Xu, Z Wang, Y Zhou, Y Liang
arXiv preprint arXiv:2001.01898, 2020
53 · 2020
Enhanced first and zeroth order variance reduced algorithms for min-max optimization
T Xu, Z Wang, Y Liang, HV Poor
50* · 2020
Algorithms for the estimation of transient surface heat flux during ultra-fast surface cooling
ZF Zhou, TY Xu, B Chen
International Journal of Heat and Mass Transfer 100, 1-10, 2016
47 · 2016
Faster algorithm and sharper analysis for constrained Markov decision process
T Li, Z Guan, S Zou, T Xu, Y Liang, G Lan
Operations Research Letters 54, 107107, 2024
40 · 2024
Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling
H Xiong, T Xu, Y Liang, W Zhang
Proceedings of the AAAI Conference on Artificial Intelligence 35 (12), 10460 …, 2021
40 · 2021
Proximal gradient descent-ascent: Variable convergence under KŁ geometry
Z Chen, Y Zhou, T Xu, Y Liang
arXiv preprint arXiv:2102.04653, 2021
37 · 2021
Sample complexity bounds for two timescale value-based reinforcement learning algorithms
T Xu, Y Liang
International Conference on Artificial Intelligence and Statistics, 811-819, 2021
36 · 2021
Doubly robust off-policy actor-critic: Convergence and optimality
T Xu, Z Yang, Z Wang, Y Liang
International Conference on Machine Learning, 11581-11591, 2021
35 · 2021
Model-based offline meta-reinforcement learning with regularization
S Lin, J Wan, T Xu, Y Liang, J Zhang
arXiv preprint arXiv:2202.02929, 2022
24 · 2022
When will generative adversarial imitation learning algorithms attain global convergence
Z Guan, T Xu, Y Liang
International Conference on Artificial Intelligence and Statistics, 1117-1125, 2021
24 · 2021
When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
T Xu, Y Zhou, K Ji, Y Liang
arXiv preprint arXiv:1806.04339, 2018
24* · 2018
Provably efficient offline reinforcement learning with trajectory-wise reward
T Xu, Y Wang, S Zou, Y Liang
IEEE Transactions on Information Theory, 2024
19 · 2024
Deterministic policy gradient: Convergence analysis
H Xiong, T Xu, L Zhao, Y Liang, W Zhang
Uncertainty in Artificial Intelligence, 2159-2169, 2022
18 · 2022
PER-ETD: A polynomially efficient emphatic temporal difference learning method
Z Guan, T Xu, Y Liang
arXiv preprint arXiv:2110.06906, 2021
9 · 2021
The perfect blend: Redefining RLHF with mixture of judges
T Xu, E Helenowski, KA Sankararaman, D Jin, K Peng, E Han, S Nie, ...
arXiv preprint arXiv:2409.20370, 2024
7 · 2024
A unifying framework of off-policy general value function evaluation
T Xu, Z Yang, Z Wang, Y Liang
Advances in Neural Information Processing Systems 35, 13570-13583, 2022
6* · 2022
Articles 1–20