Generative models as an emerging paradigm in the chemical sciences

DM Anstine, O Isayev - Journal of the American Chemical Society, 2023 - ACS Publications
Traditional computational approaches to design chemical species are limited by the need to
compute properties for a vast number of candidates, e.g., by discriminative modeling …

Deep learning: systematic review, models, challenges, and research directions

T Talaei Khoei, H Ould Slimane… - Neural Computing and …, 2023 - Springer
Deep learning is currently undergoing an exponential transition into automation
applications. This transition toward automation can provide a promising framework for …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to fine-tune state …
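
The core RLHF recipe first trains a reward model on pairwise human preferences before any policy optimization. Below is a minimal sketch of the standard Bradley-Terry pairwise loss used for that step; the random scores are stand-ins for a real reward model's scalar outputs and are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss: push the reward of the human-preferred
    response above that of the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random scalars standing in for a reward model's outputs.
r_chosen, r_rejected = torch.randn(8), torch.randn(8)
loss = reward_model_loss(r_chosen, r_rejected)
```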

MiniLLM: Knowledge distillation of large language models

Y Gu, L Dong, F Wei, M Huang - arXiv preprint arXiv:2306.08543, 2023 - arxiv.org
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …
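
MiniLLM's central argument is that reverse KL divergence suits distillation of generative LLMs better than the forward KL of standard KD (the paper optimizes it with a policy-gradient scheme, not shown here). A token-level reverse-KL sketch, with shapes and temperature as assumptions:

```python
import torch
import torch.nn.functional as F

def reverse_kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level reverse KLD, KL(student || teacher): penalizes the student
    for placing mass where the teacher assigns low probability.
    Logit shapes: (batch, seq_len, vocab)."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    return (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()

# Toy usage: random logits standing in for student and teacher outputs.
student = torch.randn(2, 5, 100, requires_grad=True)
teacher = torch.randn(2, 5, 100)
reverse_kl_distill_loss(student, teacher).backward()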

Scaling up and distilling down: Language-guided robot skill acquisition

H Ha, P Florence, S Song - Conference on Robot Learning, 2023 - proceedings.mlr.press
We present a framework for robot skill acquisition, which 1) efficiently scales up
generation of language-labelled robot data and 2) effectively distills this data down into a …
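
The distillation stage amounts to language-conditioned imitation on the generated data (the paper distills into a multi-task policy; the tiny architecture below is a hypothetical stand-in). A minimal behavior-cloning sketch, assuming precomputed language embeddings:

```python
import torch
import torch.nn as nn

class LangConditionedPolicy(nn.Module):
    """Toy policy: concatenate an observation vector with a precomputed
    language embedding and regress a continuous action."""
    def __init__(self, obs_dim=32, lang_dim=64, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, lang_emb):
        return self.net(torch.cat([obs, lang_emb], dim=-1))

# Behavior-cloning step on a batch of language-labelled transitions.
policy = LangConditionedPolicy()
obs, lang, act = torch.randn(16, 32), torch.randn(16, 64), torch.randn(16, 7)
nn.functional.mse_loss(policy(obs, lang), act).backward()
```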

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …
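
This line of work (e.g., Decision Diffuser) frames decision-making as sampling trajectories from a diffusion model steered by classifier-free guidance on conditions such as returns. The guidance combination step is standard and sketched below; the tensor shapes are assumptions.

```python
import torch

def classifier_free_guidance(eps_uncond, eps_cond, guidance_weight=1.5):
    """Classifier-free guidance: eps = eps_u + w * (eps_c - eps_u).
    With w > 1, each denoising step is pushed toward the conditioning
    signal (e.g., a high target return)."""
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

# Toy usage: denoiser outputs for a batch of noised trajectory segments.
eps_u, eps_c = torch.randn(4, 10, 16), torch.randn(4, 10, 16)
guided = classifier_free_guidance(eps_u, eps_c)
```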

OpenChat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arXiv preprint arXiv …, 2023 - arxiv.org
Open-source large language models such as LLaMA have recently emerged. Subsequent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …
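
OpenChat's premise is that mixed-quality SFT data carry coarse quality signals worth exploiting rather than averaging away. One simple illustration of that idea (a sketch of quality-weighted SFT, not necessarily OpenChat's exact C-RLFT objective; the weights are hypothetical):

```python
import torch
import torch.nn.functional as F

def weighted_sft_loss(logits, targets, source_weights):
    """Cross-entropy SFT loss weighted per example by the coarse quality of
    its data source (e.g., expert vs. sub-optimal conversations).
    logits: (batch, seq, vocab); targets: (batch, seq); weights: (batch,)."""
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (source_weights * per_token.mean(dim=1)).mean()

logits = torch.randn(4, 12, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (4, 12))
weights = torch.tensor([1.0, 1.0, 0.3, 0.3])  # hypothetical quality labels
weighted_sft_loss(logits, targets, weights).backward()
```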

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2023 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …
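
Cal-QL's fix is to keep the conservative push-down of CQL from collapsing Q-values below what the data already achieves, by calibrating against a reference value such as the behavior policy's Monte-Carlo return. A simplified sketch of that calibration term only (the full objective also includes the standard TD loss and CQL's log-sum-exp over sampled actions):

```python
import torch

def calql_regularizer(q_policy_actions, q_data_actions, ref_values, alpha=1.0):
    """Calibrated conservative regularizer in the spirit of Cal-QL: push Q
    down on policy actions, but never below a reference value, while pushing
    Q up on dataset actions. All inputs: (batch,)."""
    calibrated = torch.maximum(q_policy_actions, ref_values)
    return alpha * (calibrated.mean() - q_data_actions.mean())

q_pi = torch.randn(32)    # Q(s, a ~ current policy)
q_data = torch.randn(32)  # Q(s, a) on dataset transitions
v_ref = torch.randn(32)   # e.g., Monte-Carlo returns from the offline data
reg = calql_regularizer(q_pi, q_data, v_ref)
```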

A survey of meta-reinforcement learning

J Beck, R Vuorio, EZ Liu, Z Xiong, L Zintgraf… - arXiv preprint arXiv …, 2023 - arxiv.org
While deep reinforcement learning (RL) has fueled multiple high-profile successes in
machine learning, it is held back from more widespread adoption by its often poor data …
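
Among the families such surveys cover, the gradient-based branch (MAML and descendants) is the easiest to sketch: adapt on a task's support data with a differentiable inner step, then meta-update through it. A minimal sketch on a toy regression task, with all names hypothetical:

```python
import torch

def maml_inner_step(params, loss_fn, task_batch, inner_lr=0.1):
    """One MAML-style inner-loop step: gradient step on support data with
    create_graph=True so the outer loop can differentiate through it."""
    grads = torch.autograd.grad(loss_fn(params, task_batch), params,
                                create_graph=True)
    return [p - inner_lr * g for p, g in zip(params, grads)]

w = [torch.randn(3, requires_grad=True)]  # meta-parameters
def loss_fn(params, batch):
    x, y = batch
    return ((x @ params[0] - y) ** 2).mean()

x, y = torch.randn(8, 3), torch.randn(8)
adapted = maml_inner_step(w, loss_fn, (x, y))
loss_fn(adapted, (x, y)).backward()  # query loss drives the meta-update
```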

VIP: Towards universal visual reward and representation via value-implicit pre-training

YJ Ma, S Sodhani, D Jayaraman, O Bastani… - arXiv preprint arXiv …, 2022 - arxiv.org
Reward and representation learning are two long-standing challenges for learning an
expanding set of robot manipulation skills from sensory observations. Given the inherent …
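
At deployment, VIP turns a frozen pre-trained visual encoder into a dense reward: the goal-conditioned value is the negative embedding distance to the goal, and the reward for a transition is the change in that value. A sketch of that reward extraction (the implicit time-contrastive pre-training objective itself is not shown); the random vectors stand in for encoder outputs:

```python
import torch

def vip_style_reward(phi_obs, phi_next_obs, phi_goal):
    """Embedding-distance reward in the style of VIP: value = negative
    distance to the goal embedding; reward = change in value.
    All inputs: (batch, embed_dim)."""
    v_now = -torch.linalg.norm(phi_obs - phi_goal, dim=-1)
    v_next = -torch.linalg.norm(phi_next_obs - phi_goal, dim=-1)
    return v_next - v_now

# Toy usage: random vectors standing in for frozen encoder outputs.
phi_o, phi_o2, phi_g = (torch.randn(5, 128) for _ in range(3))
r = vip_style_reward(phi_o, phi_o2, phi_g)
```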