Nadav Merlis
Assistant Professor @ Technion
Verified email at technion.ac.il - Homepage
Title · Cited by · Year
Learn what not to learn: Action elimination with deep reinforcement learning
T Zahavy, M Haroush, N Merlis, DJ Mankowitz, S Mannor
arXiv preprint arXiv:1809.02121, 2018
Cited by 262 · 2018
Tight regret bounds for model-based reinforcement learning with greedy policies
Y Efroni, N Merlis, M Ghavamzadeh, S Mannor
Advances in Neural Information Processing Systems 32, 2019
Cited by 79 · 2019
Reinforcement learning with trajectory feedback
Y Efroni, N Merlis, S Mannor
Proceedings of the AAAI conference on artificial intelligence 35 (8), 7288-7295, 2021
Cited by 56 · 2021
Ensemble bootstrapping for q-learning
O Peer, C Tessler, N Merlis, R Meir
International conference on machine learning, 8454-8463, 2021
Cited by 47 · 2021
Batch-size independent regret bounds for the combinatorial multi-armed bandit problem
N Merlis, S Mannor
Conference on Learning Theory, 2465-2489, 2019
Cited by 35 · 2019
Tight lower bounds for combinatorial multi-armed bandits
N Merlis, S Mannor
Conference on Learning Theory, 2830-2857, 2020
Cited by 23 · 2020
Confidence-budget matching for sequential budgeted learning
Y Efroni, N Merlis, A Saha, S Mannor
International Conference on Machine Learning, 2937-2947, 2021
Cited by 11 · 2021
Reinforcement learning with history dependent dynamic contexts
G Tennenholtz, N Merlis, L Shani, M Mladenov, C Boutilier
International Conference on Machine Learning, 34011-34053, 2023
Cited by 7 · 2023
Lenient regret for multi-armed bandits
N Merlis, S Mannor
Proceedings of the AAAI Conference on Artificial Intelligence 35 (10), 8950-8957, 2021
Cited by 7 · 2021
On preemption and learning in stochastic scheduling
N Merlis, H Richard, F Sentenac, C Odic, M Molina, V Perchet
International Conference on Machine Learning, 24478-24516, 2023
Cited by 6 · 2023
Reinforcement learning with a terminator
G Tennenholtz, N Merlis, L Shani, S Mannor, U Shalit, G Chechik, ...
Advances in Neural Information Processing Systems 35, 35696-35709, 2022
Cited by 4 · 2022
Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning
P Khanna, G Tennenholtz, N Merlis, S Mannor, C Tessler
arXiv preprint arXiv:1910.01062, 2019
Cited by 4* · 2019
Multi-armed bandits with guaranteed revenue per arm
D Baudry, N Merlis, MB Molina, H Richard, V Perchet
International Conference on Artificial Intelligence and Statistics, 379-387, 2024
Cited by 3 · 2024
Improved algorithms for contextual dynamic pricing
M Tullii, S Gaucher, N Merlis, V Perchet
Advances in Neural Information Processing Systems 37, 126088-126117, 2025
Cited by 2 · 2025
The value of reward lookahead in reinforcement learning
N Merlis, D Baudry, V Perchet
Advances in Neural Information Processing Systems 37, 83627-83664, 2025
Cited by 1 · 2025
Reinforcement Learning with Lookahead Information
N Merlis
Advances in Neural Information Processing Systems 37, 64523-64581, 2024
2024
Stable Matching with Ties: Approximation Ratios and Learning
S Lin, S Mauras, N Merlis, V Perchet
arXiv preprint arXiv:2411.03270, 2024
2024
On Bits and Bandits: Quantifying the Regret-Information Trade-off
I Shufaro, N Merlis, N Weinberger, S Mannor
arXiv preprint arXiv:2405.16581, 2024
2024
Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics
G Tennenholtz, M Mladenov, N Merlis, RL Axtell, C Boutilier
arXiv preprint arXiv:2305.18333, 2023
2023
Query-Reward Tradeoffs in Multi-Armed Bandits
N Merlis, Y Efroni, S Mannor
arXiv preprint arXiv:2110.05724, 2021
2021
Articles 1–20