Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity

Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream
tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting …
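
The snippet above names token-level cross-entropy as the default SFT objective. A minimal PyTorch sketch of that baseline loss is given below; the function and variable names (`sft_cross_entropy`, `logits`, `labels`) are illustrative and not taken from the paper.

```python
# Minimal sketch of the standard token-level cross-entropy loss used in SFT,
# assuming a causal LM that returns logits of shape (batch, seq_len, vocab).
import torch
import torch.nn.functional as F

def sft_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Shift so each position predicts the next token, as in causal LM training.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # Positions labelled -100 (e.g. the prompt) are ignored in the loss.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )

# Toy usage: random logits for 2 sequences of length 8 over a 100-token vocabulary.
logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
print(sft_cross_entropy(logits, labels))
```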

Stabilizing Q-learning with linear architectures for provable efficient learning

A Zanette, M Wainwright - International Conference on …, 2022 - proceedings.mlr.press
The Q-learning algorithm is a simple, fundamental and practically very effective
reinforcement learning algorithm. However, the basic protocol can exhibit an unstable …
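
For reference, a minimal sketch of Q-learning with a linear architecture, Q(s, a) = w · φ(s, a), appears below. The feature map, dimensions, and toy transition are illustrative assumptions, not the paper's construction or its stabilized protocol.

```python
import numpy as np

def q_learning_step(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    q_sa = w @ phi(s, a)
    # Bootstrapped target uses the greedy value at the next state; with function
    # approximation this bootstrapping is a known source of instability.
    q_next = max(w @ phi(s_next, b) for b in actions)
    td_error = r + gamma * q_next - q_sa
    return w + alpha * td_error * phi(s, a)

# Toy example: 4 states, 2 actions, 3-dimensional random features.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 2, 3))       # (state, action, feature_dim)
phi = lambda s, a: feat[s, a]
w = np.zeros(3)
w = q_learning_step(w, phi, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
print(w)
```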

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

T Xu, Z Zhang, R Chen, Y Sun, Y Yu - arXiv preprint arXiv:2411.00610, 2024 - arxiv.org
As a prominent category of imitation learning methods, adversarial imitation learning (AIL)
has garnered significant practical success powered by neural network approximation …
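
A minimal GAIL-style sketch of the adversarial imitation learning loop is shown below: a discriminator is trained to separate expert from policy transitions, and its output is turned into a reward for the policy's RL step. The network size, reward form, and random data are illustrative assumptions, not this paper's algorithm.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(32, 4)   # expert (state, action) pairs
policy_sa = torch.randn(32, 4)   # pairs sampled from the current policy

# Discriminator step: label expert data 1, policy data 0.
logits = torch.cat([disc(expert_sa), disc(policy_sa)])
labels = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])
opt.zero_grad()
bce(logits, labels).backward()
opt.step()

# Reward for the policy: higher when the discriminator mistakes its data for expert data.
with torch.no_grad():
    reward = torch.nn.functional.logsigmoid(disc(policy_sa))  # -log(1 - D) is another common choice
print(reward.mean())
```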

Event tables for efficient experience replay

V Kompella, TJ Walsh, S Barrett, P Wurman… - arXiv preprint arXiv …, 2022 - arxiv.org
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL)
systems. However, uniform sampling from an ER buffer can lead to slow convergence and …
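
Below is a minimal sketch of the uniform-sampling replay buffer that this abstract contrasts with its event tables; the capacity, transition format, and toy data are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # transition is e.g. (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely,
        # regardless of how rare or informative it is.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.add((t, 0, 0.0, t + 1, False))
print(buf.sample(4))
```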

Efficient and scalable reinforcement learning via Hypermodel

Y Li, J Xu, ZQ Luo - … on Adaptive Experimental Design and Active …, 2023 - openreview.net
Data-efficient reinforcement learning (RL) requires deep exploration. Thompson sampling is
a principled method for deep exploration in reinforcement learning. However, Thompson …
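
The posterior-sampling idea that this paper scales up with hypermodels can be illustrated by Thompson sampling on a simple Bernoulli bandit, as in the sketch below; the arm probabilities, priors, and horizon are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.3, 0.5, 0.7])      # unknown success probabilities
alpha = np.ones(3)                      # Beta(1, 1) priors per arm
beta = np.ones(3)

for _ in range(1000):
    # Sample one plausible model from the posterior and act greedily under it.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))           # posterior means concentrate on the best arm
```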