- Academic Search

Software-hardware co-design for fast and scalable training of deep learning recommendation models

D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch… - Proceedings of the 49th …, 2022 - dl.acm.org

Deep learning recommendation models (DLRMs) have been used across many business-
critical services at Meta and are the single largest AI application in terms of infrastructure …

Cited by 126 · All 7 versions

[PDF] mlr.press

Off-policy evaluation for large action spaces via conjunct effect modeling

Y Saito, Q Ren, T Joachims - International Conference on …, 2023 - proceedings.mlr.press

We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …

Cited by 21 · All 8 versions

[PDF] researchgate.net

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org

Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …

Cited by 52 · All 4 versions

[PDF] acm.org

Off-policy evaluation for large action spaces via policy convolution

N Sachdeva, L Wang, D Liang, N Kallus… - Proceedings of the ACM …, 2024 - dl.acm.org

Developing accurate off-policy estimators is crucial for both evaluating and optimizing for
new policies. The main challenge in off-policy estimation is the distribution shift between the …

Cited by 10 · All 6 versions

[PDF] acm.org

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org

Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

Cited by 16 · All 2 versions

[PDF] arxiv.org

On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation

O Jeunen, I Potapov, A Ustimenko - Proceedings of the 30th ACM …, 2024 - dl.acm.org

Approaches to recommendation are typically evaluated in one of two ways: (1) via a
(simulated) online experiment, often seen as the gold standard, or (2) via some offline …

Cited by 12 · All 2 versions

[PDF] neurips.cc

On component interactions in two-stage recommender systems

J Hron, K Krauth, M Jordan… - Advances in neural …, 2021 - proceedings.neurips.cc

Thanks to their scalability, two-stage recommenders are used by many of today's largest
online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce …

Cited by 37 · All 6 versions

[PDF] mlr.press

Top-k extreme contextual bandits with arm hierarchy

R Sen, A Rakhlin, L Ying, R Kidambi… - International …, 2021 - proceedings.mlr.press

Motivated by modern applications, such as online advertisement and recommender
systems, we study the top-$k$ extreme contextual bandits problem, where the total number …

Cited by 30 · All 10 versions

[PDF] tandfonline.com

The digital transformation in health: How AI can improve the performance of health systems

Á Periáñez, A Fernández Del Río, I Nazarov… - Health Systems & …, 2024 - Taylor & Francis

Mobile health has the potential to revolutionize health care delivery and patient
engagement. In this work, we discuss how integrating Artificial Intelligence into digital health …

Cited by 5

[PDF] arxiv.org

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

Y Saito, J Yao, T Joachims - arXiv preprint arXiv:2402.06151, 2024 - arxiv.org

We study off-policy learning (OPL) of contextual bandit policies in large discrete action
spaces where existing methods--most of which rely crucially on reward-regression models …

Cited by 4 · All 2 versions

Learning from extreme bandit feedback

Software-hardware co-design for fast and scalable training of deep learning recommendation models
