Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Simulation-guided beam search for neural combinatorial optimization

J Choo, YD Kwon, J Kim, J Jae… - Advances in …, 2022 - proceedings.neurips.cc
Neural approaches for combinatorial optimization (CO) equip a learning mechanism to
discover powerful heuristics for solving complex real-world problems. While neural …

Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement

W Kool, H Van Hoof, M Welling - … Conference on Machine …, 2019 - proceedings.mlr.press
The well-known Gumbel-Max trick for sampling from a categorical distribution can
be extended to sample k elements without replacement. We show how to implicitly apply …
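As a rough illustration of the trick named in this entry: add independent Gumbel(0, 1) noise to each log-probability and keep the k largest perturbed values. The sketch below assumes NumPy and a flat categorical distribution; the function name gumbel_top_k is hypothetical and not taken from the paper's code.

```python
import numpy as np


def gumbel_top_k(log_probs: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Draw k distinct category indices, without replacement, from the
    categorical distribution whose (possibly unnormalised) log-probabilities
    are given in log_probs.

    Each log-probability is perturbed with independent Gumbel(0, 1) noise and
    the indices of the k largest perturbed values are kept, in descending
    order. With k = 1 this is exactly the Gumbel-Max trick.
    """
    n = log_probs.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of categories")
    perturbed = log_probs + rng.gumbel(size=n)
    # Sort perturbed scores in descending order and keep the first k indices.
    return np.argsort(-perturbed)[:k]


# Toy usage: draw 3 distinct indices from a 4-way distribution.
rng = np.random.default_rng(0)
probs = np.array([0.5, 0.2, 0.2, 0.1])
sample = gumbel_top_k(np.log(probs), k=3, rng=rng)
```

Only one perturbation and one sort are needed, so a batch of k distinct samples costs a single pass instead of k sequential draws with renormalisation.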

Deep learning for approximate nearest neighbour search: A survey and future directions

M Li, YG Wang, P Zhang, H Wang, L Fan… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Approximate nearest neighbour search (ANNS) in high-dimensional space is an essential
and fundamental operation in many applications from many domains such as multimedia …
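One common family of ANNS methods routes a query over a proximity graph. The sketch below shows only the basic greedy walk, assuming Euclidean distance and an adjacency dict; greedy_graph_search and the toy data are hypothetical, and practical graph indexes typically keep a list of candidates rather than a single current node.

```python
import numpy as np
from typing import Dict, List


def greedy_graph_search(query: np.ndarray, vectors: np.ndarray,
                        graph: Dict[int, List[int]], entry: int) -> int:
    """Greedy routing on a proximity graph: starting from `entry`, repeatedly
    move to the neighbour closest (in Euclidean distance) to `query`, and stop
    at a node none of whose neighbours is closer. The result is a local
    minimum, so in general it is only an approximate nearest neighbour.
    """
    current = entry
    current_dist = float(np.linalg.norm(vectors[current] - query))
    while True:
        best, best_dist = current, current_dist
        for neighbour in graph[current]:
            d = float(np.linalg.norm(vectors[neighbour] - query))
            if d < best_dist:
                best, best_dist = neighbour, d
        if best == current:  # no neighbour improves on the current node
            return current
        current, current_dist = best, best_dist


# Toy example: five points on a line, each linked to its immediate neighbours.
vectors = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
found = greedy_graph_search(np.array([3.2, 0.0]), vectors, graph, entry=0)
```

The walk only ever moves to a strictly closer node, so it always terminates; the cost is proportional to the path length times the node degree rather than to the dataset size.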

Learning deductive reasoning from synthetic corpus based on formal logic

T Morishita, G Morio, A Yamaguchi… - … on Machine Learning, 2023 - proceedings.mlr.press
We study a synthetic corpus based approach for language models (LMs) to acquire logical
deductive reasoning ability. The previous studies generated deduction examples using …

Learning optimal tree models under beam search

J Zhuo, Z Xu, W Dai, H Zhu, H Li… - … on Machine Learning, 2020 - proceedings.mlr.press
Retrieving relevant targets from an extremely large target set under computational limits is a
common challenge for information retrieval and recommendation systems. Tree models …
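To make "retrieval under beam search" concrete, here is a generic top-down beam search over a tree index. It assumes a complete binary tree stored in heap order and a per-node scoring function for a fixed query; tree_beam_search, the scores, and the max-over-leaves node scoring are hypothetical choices for illustration, not the training method proposed in this paper.

```python
import heapq
from typing import Callable, Dict, List


def tree_beam_search(num_levels: int, beam_size: int, k: int,
                     score: Callable[[int], float]) -> List[int]:
    """Top-down beam search over a complete binary tree in heap order
    (root = 0, children of node n are 2n + 1 and 2n + 2).

    At each level the children of the current beam are scored and only the
    beam_size best are kept. Returns the k best leaves that survive.
    """
    beam = [0]  # start at the root
    for _ in range(num_levels - 1):
        children = [c for n in beam for c in (2 * n + 1, 2 * n + 2)]
        beam = heapq.nlargest(beam_size, children, key=score)
    return heapq.nlargest(k, beam, key=score)


# Toy 4-level tree: leaves are nodes 7..14 with hypothetical relevance scores.
leaf_scores: Dict[int, float] = {7: 0.1, 8: 0.9, 9: 0.3, 10: 0.2,
                                 11: 0.8, 12: 0.05, 13: 0.4, 14: 0.7}


def max_leaf_score(node: int) -> float:
    # Internal node score = best leaf score in its subtree (an assumption).
    if node in leaf_scores:
        return leaf_scores[node]
    return max(max_leaf_score(2 * node + 1), max_leaf_score(2 * node + 2))


top = tree_beam_search(num_levels=4, beam_size=2, k=2, score=max_leaf_score)
```

Beam search only visits beam_size * 2 nodes per level, so whether it finds the true top leaves depends on how internal nodes are scored; with the max-over-leaves scoring used here and beam_size >= k, it recovers the exact top-k leaves (8 and 11).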

LambdaBeam: Neural program search with higher-order functions and lambdas

K Shi, H Dai, WD Li, K Ellis… - Advances in Neural …, 2023 - proceedings.neurips.cc
Search is an important technique in program synthesis that allows for adaptive strategies
such as focusing on particular search directions based on execution results. Several prior …

Estimating gradients for discrete random variables by sampling without replacement

W Kool, H van Hoof, M Welling - arXiv preprint arXiv:2002.06043, 2020 - arxiv.org
We derive an unbiased estimator for expectations over discrete random variables based on
sampling without replacement, which reduces variance as it avoids duplicate samples. We …

Reinforcement routing on proximity graph for efficient recommendation

C Feng, D Lian, X Wang, Z Liu, X Xie… - ACM Transactions on …, 2023 - dl.acm.org
We focus on Maximum Inner Product Search (MIPS), which is an essential problem in many
machine learning communities. Given a query, MIPS finds the most similar items with the …
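For reference, the exact MIPS answer can be computed with a linear scan over all items; graph-based routing methods aim to approximate this result at lower cost. The sketch below assumes item vectors stacked in a NumPy array; exact_mips and the toy data are hypothetical.

```python
import numpy as np


def exact_mips(query: np.ndarray, items: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k items with the largest inner product with
    `query`, best first. `items` has shape (n_items, dim) and `query` has
    shape (dim,); requires 1 <= k <= n_items.
    """
    scores = items @ query                      # one inner product per item
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top-k, linear time
    return top[np.argsort(-scores[top])]        # order the k winners by score


items = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [-1.0, 2.0]])
query = np.array([1.0, 2.0])
best = exact_mips(query, items, k=2)  # inner products are 1, 2, 9, 3
```

Unlike cosine similarity, the inner product is not normalised, so item norms affect the ranking: here item 3 (inner product 3) beats item 1 (inner product 2) even though item 1 has the higher cosine similarity with the query.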