Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity
Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream
tasks. Cross-entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting …
Stabilizing Q-learning with linear architectures for provable efficient learning
A Zanette, M Wainwright - International Conference on …, 2022 - proceedings.mlr.press
The Q-learning algorithm is a simple, fundamental, and practically very effective
reinforcement learning algorithm. However, the basic protocol can exhibit an unstable …
Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
As a prominent category of imitation learning methods, adversarial imitation learning (AIL)
has garnered significant practical success powered by neural network approximation …
Event tables for efficient experience replay
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL)
systems. However, uniform sampling from an ER buffer can lead to slow convergence and …
Efficient and scalable reinforcement learning via Hypermodel
Data-efficient reinforcement learning (RL) requires deep exploration. Thompson sampling is
a principled method for deep exploration in reinforcement learning. However, Thompson …