Adam-mini: Use fewer learning rates to gain more
We propose Adam-mini, an optimizer that achieves on par or better performance than
AdamW with 50% less memory footprint. Adam-mini reduces memory by cutting down the …
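The mechanism behind the title, per the snippet, is replacing Adam's per-coordinate second moment with a single shared value per parameter block. A minimal single-tensor sketch of that idea (my own simplification: one block per whole tensor, no bias correction; the paper partitions blocks by Hessian structure, e.g. per attention head):

```python
import torch

def adam_mini_step(p, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Sketch of the Adam-mini idea: momentum stays per-coordinate, but the
    # second moment v collapses to ONE scalar per parameter block (here, the
    # whole tensor), which is where the ~50% memory saving comes from.
    # Bias correction and the paper's Hessian-aware block partition are omitted.
    m = state.setdefault("m", torch.zeros_like(p))
    v = state.setdefault("v", torch.zeros((), device=p.device))
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).add_((grad * grad).mean(), alpha=1 - beta2)
    p.add_(m / (v.sqrt() + eps), alpha=-lr)
```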
SOAP: Improving and stabilizing Shampoo using Adam
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning
method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks …
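SOAP's central move, as the title suggests, is to run an Adam-style update inside the eigenbasis of Shampoo's Kronecker preconditioner factors. A hedged single-matrix sketch with my own names (eigendecompositions are recomputed every step for clarity, and basis drift of the stored moments between steps is ignored; the paper amortizes and handles both):

```python
import torch

def soap_like_step(W, G, state, lr=3e-4, b1=0.9, b2=0.95, eps=1e-8):
    # Maintain Shampoo's two Kronecker factors, rotate the gradient into
    # their eigenbasis, do an Adam-style update there, and rotate back.
    m, n = W.shape
    L = state.setdefault("L", torch.zeros(m, m, device=W.device))  # EMA of G G^T
    R = state.setdefault("R", torch.zeros(n, n, device=W.device))  # EMA of G^T G
    M = state.setdefault("M", torch.zeros_like(W))  # first moment (rotated basis)
    V = state.setdefault("V", torch.zeros_like(W))  # second moment (rotated basis)
    L.mul_(b2).add_(G @ G.T, alpha=1 - b2)
    R.mul_(b2).add_(G.T @ G, alpha=1 - b2)
    QL = torch.linalg.eigh(L).eigenvectors
    QR = torch.linalg.eigh(R).eigenvectors
    Gr = QL.T @ G @ QR                          # gradient in the eigenbasis
    M.mul_(b1).add_(Gr, alpha=1 - b1)
    V.mul_(b2).add_(Gr * Gr, alpha=1 - b2)
    W.add_(QL @ (M / (V.sqrt() + eps)) @ QR.T, alpha=-lr)  # rotate update back
```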
Cautious optimizers: Improving training with one line of code
AdamW has been the default optimizer for transformer pretraining. For many years, our
community has searched for faster and more stable optimizers, with only constrained positive …
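The "one line" in question masks out update components whose sign disagrees with the current gradient. A sketch of that masking with illustrative names (the rescaling by the mask mean follows the paper's normalization, as best I recall it):

```python
import torch

def cautious_mask(update, grad):
    # Zero out update components that point against the current gradient,
    # then rescale so the surviving components keep the overall magnitude.
    mask = (update * grad > 0).to(update.dtype)
    return update * mask / mask.mean().clamp(min=1e-3)
```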
Rethinking conventional wisdom in machine learning: From generalization to scaling
L Xiao - arXiv preprint arXiv:2409.15156, 2024 - arxiv.org
The remarkable success of large language pretraining and the discovery of scaling laws
signify a paradigm shift in machine learning. Notably, the primary objective has evolved from …
JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources
B Clavié - arXiv preprint arXiv:2407.20750, 2024 - arxiv.org
Neural Information Retrieval has advanced rapidly in high-resource languages, but progress
in lower-resource ones such as Japanese has been hindered by data scarcity, among other …
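For context on "multi-vector retrievers": ColBERT-style models score with late interaction (MaxSim), summing each query token's best match over document tokens. A minimal sketch, assuming L2-normalized token embeddings:

```python
import torch

def maxsim_score(Q, D):
    # Late-interaction (MaxSim) scoring used by ColBERT-style retrievers such
    # as JaColBERT: match every query token embedding to its most similar
    # document token embedding, then sum the maxima.
    # Q: [num_query_tokens, dim], D: [num_doc_tokens, dim], both L2-normalized.
    return (Q @ D.T).max(dim=1).values.sum()
```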
4-bit Shampoo for Memory-Efficient Network Training
S Wang, P Zhou, J Li, H Huang - Advances in Neural …, 2025 - proceedings.neurips.cc
Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to first-
order optimizers in both theory and practice. The states forming the preconditioner and its …
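A generic illustration of the storage trick the title names: blockwise linear quantization of an optimizer-state matrix to 4 bits. This is a sketch of the general technique, not the paper's exact scheme (which targets the preconditioner and its eigenvector factors specifically); codes are kept one per int8 here rather than packed two per byte.

```python
import math
import torch

def quantize_4bit(mat, block=64):
    # Split the flattened matrix into blocks, store one fp scale per block,
    # and round each entry to one of 16 signed levels in [-8, 7].
    flat = mat.flatten()
    pad = (-flat.numel()) % block
    blocks = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 7.0
    q = (blocks / scale).round().clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    # Reconstruct an approximation of the original matrix from codes + scales.
    flat = (q.float() * scale).flatten()
    return flat[: math.prod(shape)].view(shape)
```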
How Does Critical Batch Size Scale in Pre-training?
Training large-scale models under given resources requires careful design of parallelism
strategies. In particular, the efficiency notion of critical batch size (CBS), concerning the …
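For intuition about what CBS concerns: in the noise-scale model of McCandlish et al. (2018), which this line of work builds on, larger batches cut the number of steps but cost more total examples, and the tradeoff turns near the critical batch size. A small illustration (this paper's own pre-training scaling fits are not reproduced):

```python
def steps_and_examples(batch_size, s_min, b_noise):
    # Noise-scale model of the batch-size tradeoff:
    #   steps    ~ s_min * (1 + b_noise / B)  -- fewer steps as B grows
    #   examples = steps * B                  -- but more total data consumed
    # Near B ~ b_noise (the critical batch size), both costs are moderate.
    steps = s_min * (1 + b_noise / batch_size)
    return steps, steps * batch_size
```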
An adaptive stochastic gradient method with non-negative Gauss-Newton stepsizes
We consider the problem of minimizing the average of a large number of smooth but
possibly non-convex functions. In the context of most machine learning applications, each …
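A hedged sketch of what a non-negative Gauss-Newton stepsize can look like: assuming each per-example loss is non-negative, a regularized Polyak-style rule stays non-negative by construction and shrinks automatically when the gradient is large relative to the loss. The exact formula below is my reconstruction and may differ from the paper's:

```python
import numpy as np

def ngn_style_step(x, f_i, g_i, sigma=1.0):
    # ASSUMED form of a non-negative Gauss-Newton-flavored stepsize:
    #   gamma = sigma / (1 + sigma * ||g||^2 / (2 * f)),  with f_i >= 0.
    # Unlike a raw Polyak stepsize, gamma can never go negative.
    gamma = sigma / (1.0 + sigma * np.dot(g_i, g_i) / max(2.0 * f_i, 1e-12))
    return x - gamma * g_i
```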
General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
This work investigates the effectiveness of schedule-free methods, developed by A. Defazio
et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable …
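The schedule-free recipe of Defazio et al. that this work builds on takes gradients at an interpolation point y while returning a running average x of the base iterates, removing the need for a decaying stepsize schedule. A minimal sketch with my own variable names:

```python
import numpy as np

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    # x0: numpy array; grad_fn(y) returns the (stochastic) gradient at y.
    z = x0.copy()   # fast "base" iterate
    x = x0.copy()   # averaged iterate (what you evaluate and return)
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient-evaluation point
        z = z - lr * grad_fn(y)
        x = x + (z - x) / t             # online average: x_t = mean of z_1..z_t
    return x
```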
AI-driven skin cancer diagnosis: Grad-CAM and expert annotations for enhanced interpretability
An AI tool has been developed to provide interpretable support for the diagnosis of basal cell carcinoma (BCC) via
teledermatology, thus speeding up referrals and optimizing resource utilization. The …
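Grad-CAM itself (Selvaraju et al.) is standard: weight each convolutional feature map by its spatially averaged gradient, sum, and apply ReLU. A generic sketch (the BCC model is not public, so the inputs here are assumptions):

```python
import torch
import torch.nn.functional as F

def grad_cam(features, class_score):
    # features: conv activation map [1, C, H, W] kept in the autograd graph;
    # class_score: scalar logit of the predicted class. Both are assumptions
    # about the host model, not details from this entry.
    grads, = torch.autograd.grad(class_score, features)
    weights = grads.mean(dim=(2, 3), keepdim=True)   # GAP over spatial dims
    cam = F.relu((weights * features).sum(dim=1))    # weighted sum + ReLU
    return cam / cam.max().clamp(min=1e-8)           # normalize to [0, 1]
```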