Orion: Interference-aware, fine-grained GPU sharing for ML applications
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …
Training and serving system of foundation models: A comprehensive survey
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-Σ) have demonstrated
extraordinary performance in key technological areas, such as natural language processing …
Decentralized bilevel optimization
Bilevel optimization has been successfully applied to many important machine learning
problems. Algorithms for solving bilevel optimization have been studied under various …
Merak: An efficient distributed DNN training framework with automated 3D parallelism for giant foundation models
Foundation models are in the process of becoming the dominant deep learning technology.
Pretraining a foundation model is always time-consuming due to the large scale of both the …
DAPHNE: An open and extensible system infrastructure for integrated data analysis pipelines
Integrated data analysis (IDA) pipelines---that combine data management (DM) and query
processing, high-performance computing (HPC), and machine learning (ML) training and …
Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters
Recent years have witnessed an exponential growth of model scale in deep learning-based
recommender systems---from Google's 2016 model with 1 billion parameters to the latest …
Bluefog: Make decentralized algorithms practical for optimization and deep learning
A decentralized algorithm is a form of computation that achieves a global goal through local
dynamics that rely on low-cost communication between directly-connected agents. On …
Fine-tuning language models over slow networks using activation quantization with guarantees
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
Prophet: Fine-grained load balancing for parallel training of large-scale MoE models
Mixture of Experts (MoE) has received increasing attention for scaling DNN models to extra-
large size with negligible increases in computation. The MoE model has achieved the …
A multidimensional communication scheduling method for hybrid parallel DNN training
The transformer-based deep neural network (DNN) models have shown considerable
success across diverse tasks, prompting widespread adoption of distributed training …