A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Cited by 354 - All 9 versions

[PDF] mlr.press

Pretraining language models with human preferences

T Korbak, K Shi, A Chen, RV Bhalerao… - International …, 2023 - proceedings.mlr.press

Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …

Cited by 188 - All 9 versions

[PDF] arxiv.org

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Cited by 360 - All 4 versions

[PDF] arxiv.org

OpenChat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arXiv preprint arXiv …, 2023 - arxiv.org

Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …

Cited by 201 - All 4 versions

[PDF] neurips.cc

Contrastive learning as goal-conditioned reinforcement learning

B Eysenbach, T Zhang, S Levine… - Advances in Neural …, 2022 - proceedings.neurips.cc

In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …

Cited by 141 - All 6 versions

[PDF] neurips.cc

OneNet: Enhancing time series forecasting models under concept drift by online ensembling

Q Wen, W Chen, L Sun, Z Zhang… - Advances in …, 2023 - proceedings.neurips.cc

Online updating of time series forecasting models aims to address the concept drifting
problem by efficiently updating forecasting models based on streaming data. Many …

Cited by 36 - All 5 versions

[PDF] arxiv.org

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org

We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …

Cited by 39 - All 2 versions

[PDF] neurips.cc

When does return-conditioned supervised learning work for offline reinforcement learning?

D Brandfonbrener, A Bietti, J Buckman… - Advances in …, 2022 - proceedings.neurips.cc

Several recent works have proposed a class of algorithms for the offline reinforcement
learning (RL) problem that we will refer to as return-conditioned supervised learning …

Cited by 85 - All 10 versions

[PDF] mlr.press

Masked trajectory models for prediction, representation, and control

P Wu, A Majumdar, K Stone, Y Lin… - International …, 2023 - proceedings.mlr.press

We introduce Masked Trajectory Models (MTM) as a generic abstraction for
sequential decision making. MTM takes a trajectory, such as a state-action sequence, and …

Cited by 44 - All 9 versions

[PDF] mlr.press

Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL

T Yamagata, A Khalil… - … on Machine Learning, 2023 - proceedings.mlr.press

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …

Cited by 81 - All 9 versions
