A survey on automated log analysis for reliability engineering
Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …
The landscape of exascale research: A data-driven literature analysis
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …
Why does the cloud stop computing? lessons from hundreds of service outages
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed
1247 headline news and public post-mortem reports that detail 597 unplanned outages that …
Failures in large scale systems: long-term measurement, analysis, and implications
Resilience is one of the key challenges in maintaining high efficiency of future extreme scale
supercomputers. Researchers and system practitioners rely on field-data studies to …
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
D Tiwari, S Gupta, J Rogers, D Maxwell… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Increase in graphics hardware performance and improvements in programmability has
enabled GPUs to evolve from a graphics-specific accelerator to a general-purpose …
What can we learn from four years of data center hardware failures?
Hardware failures have a big impact on the dependability of large-scale data centers. We
present studies on over 290,000 hardware failure reports collected over the past four years …
An analysis of network-partitioning failures in cloud systems
We present a comprehensive study of 136 system failures attributed to network-partitioning
faults from 25 widely used distributed systems. We found that the majority of the failures led …
Failure analysis of virtual and physical machines: Patterns, causes and characteristics
In today's commercial data centers, the computation density grows continuously as the
number of hardware components and workloads in units of virtual machines increase. The …
Failure analysis of jobs in compute clouds: A Google cluster case study
X Chen, CD Lu, K Pattabiraman - 2014 IEEE 25th International …, 2014 - ieeexplore.ieee.org
In this paper, we analyze a workload trace from the Google cloud cluster and characterize
the observed failures. The goal of our work is to improve the understanding of failures in …
A large-scale study of soft-errors on GPUs in the field
Parallelism provided by the GPU architecture has enabled domain scientists to simulate
physical phenomena at a much faster rate and finer granularity than what was previously …