Deep learning enabled inverse design in nanophotonics

S So, T Badloe, J Noh, J Bravo-Abad, J Rho - Nanophotonics, 2020 - degruyter.com
Deep learning has become the dominant approach in artificial intelligence to solve complex
data-driven problems. Originally applied almost exclusively in computer-science areas such …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

A simple baseline for Bayesian uncertainty in deep learning

WJ Maddox, P Izmailov, T Garipov… - Advances in neural …, 2019 - proceedings.neurips.cc
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose
approach for uncertainty representation and calibration in deep learning. Stochastic Weight …
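A minimal sketch of the SWAG approximation, assuming the paper's standard formulation (this notation is not taken from the snippet above): the posterior over weights is a Gaussian centred at the stochastic-weight-averaging (SWA) mean of SGD iterates θ_1, …, θ_T, with a covariance combining a diagonal and a low-rank term,
\[
\theta \sim \mathcal{N}\!\left(\theta_{\mathrm{SWA}},\; \tfrac{1}{2}\bigl(\Sigma_{\mathrm{diag}} + \Sigma_{\mathrm{low\text{-}rank}}\bigr)\right),
\qquad
\theta_{\mathrm{SWA}} = \frac{1}{T}\sum_{t=1}^{T}\theta_t .
\]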

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

Deterministic policy gradient algorithms

D Silver, G Lever, N Heess, T Degris… - International …, 2014 - proceedings.mlr.press
In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …
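A minimal sketch of that form, assuming the paper's standard notation (deterministic policy μ_θ, its action-value function Q^μ, discounted state distribution ρ^μ): the policy gradient is an expectation over states only, with no integral over actions,
\[
\nabla_\theta J(\mu_\theta) \;=\; \mathbb{E}_{s \sim \rho^{\mu}}\!\left[\, \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)} \right].
\]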

When do flat minima optimizers work?

J Kaddour, L Liu, R Silva… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …

Erdős goes neural: an unsupervised learning framework for combinatorial optimization on graphs

N Karalias, A Loukas - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Combinatorial optimization (CO) problems are notoriously challenging for neural networks,
especially in the absence of labeled instances. This work proposes an unsupervised …

A deep reinforcement-learning approach for inverse kinematics solution of a high degree of freedom robotic manipulator

A Malik, Y Lischuk, T Henderson, R Prazenica - Robotics, 2022 - mdpi.com
Inverse kinematics (IK) is the foundation of robotic manipulator control. Due
to the complexity of derivation, difficulty of computation, and redundancy, traditional IK …

WARP: On the benefits of weight averaged rewarded policies

A Ramé, J Ferret, N Vieillard, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs)
by encouraging their generations to have high rewards, using a reward model trained on …

Policy optimization in a noisy neighborhood: On return landscapes in continuous control

N Rahn, P D'Oro, H Wiltzer… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …