A review of sparse expert models in deep learning
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …
Branch-train-merge: Embarrassingly parallel training of expert language models
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for
embarrassingly parallel training of large language models (LLMs). We show it is possible to …
Continual pre-training of large language models: How to (re) warm your model?
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much cheaper and more …
Unified scaling laws for routed language models
A Clark, D de Las Casas, A Guy… - International …, 2022 - proceedings.mlr.press
The performance of a language model has been shown to be effectively modeled as a
power-law in its parameter count. Here we study the scaling behaviors of Routing Networks …
Dynamically expandable graph convolution for streaming recommendation
Personalized recommender systems have been widely studied and deployed to reduce
information overload and satisfy users' diverse needs. However, conventional …
ProgFed: Effective, communication, and computation efficient federated learning by progressive training
Federated learning is a powerful distributed learning scheme that allows numerous edge
devices to collaboratively train a model without sharing their data. However, training is …
Learning equi-angular representations for online continual learning
Online continual learning suffers from an underfitted solution due to insufficient training for
prompt model updates (e.g., single-epoch training). To address the challenge, we propose an …
NEVIS'22: A stream of 100 tasks sampled from 30 years of computer vision research
A shared goal of several machine learning communities like continual learning, meta-
learning and transfer learning, is to design algorithms and models that efficiently and …
Just say the name: Online continual learning with category names only via data generation
Requiring extensive human supervision is often impractical for continual learning due to its
cost, leading to the emergence of 'name-only continual learning' that only provides the name …
When does re-initialization work?
Re-initializing a neural network during training has been observed to improve generalization
in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used …