FMI: Fast and cheap message passing for serverless functions
Serverless functions provide elastic scaling and a fine-grained billing model, making
Function-as-a-Service (FaaS) an attractive programming model. However, for distributed …
Massively parallel first-principles simulation of electron dynamics in materials
We present a highly scalable, parallel implementation of first-principles electron dynamics
coupled with molecular dynamics (MD). By using optimized kernels, network topology aware …
Combing the communication hairball: Visualizing parallel execution traces using logical time
With the continuous rise in complexity of modern supercomputers, optimizing the
performance of large-scale parallel programs is becoming increasingly challenging …
Identifying the culprits behind network congestion
Network congestion is one of the primary causes of performance degradation, performance
variability and poor scaling in communication-heavy parallel applications. However, the …
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions
of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In …
Predicting application performance using supervised learning on communication features
Task mapping on torus networks has traditionally focused on either reducing the maximum
dilation or average number of hops per byte for messages in an application. These metrics …
Reducing communication in algebraic multigrid with multi-step node aware communication
Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear
systems. Yet, AMG lacks parallel scalability due to increasingly large costs associated with …
Quantum dynamics simulation of electrons in materials on high-performance computers
Advancement in high-performance computing allows us to calculate properties of
increasingly complex materials with unprecedented accuracy. At the same time, to take full …
Performance optimality or reproducibility: that is the question
The era of extremely heterogeneous supercomputing brings with itself the devil of increased
performance variation and reduced reproducibility. There is a lack of understanding in the …
Evaluation of an interference-free node allocation policy on fat-tree clusters
Interference between jobs competing for network bandwidth on a fat-tree cluster can cause
significant variability and degradation in performance. These performance issues can be …