Google Scholar

A survey on automated log analysis for reliability engineering

S He, P He, Z Chen, T Yang, Y Su, MR Lyu - ACM computing surveys …, 2021 - dl.acm.org

Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …

Cited by 265; all 9 versions

The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org

The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Cited by 69; all 7 versions

Why does the cloud stop computing? lessons from hundreds of service outages

HS Gunawi, M Hao, RO Suminto, A Laksono… - Proceedings of the …, 2016 - dl.acm.org

We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed
1247 headline news and public post-mortem reports that detail 597 unplanned outages that …

Cited by 299; all 6 versions

Failures in large scale systems: long-term measurement, analysis, and implications

S Gupta, T Patel, C Engelmann, D Tiwari - Proceedings of the …, 2017 - dl.acm.org

Resilience is one of the key challenges in maintaining high efficiency of future extreme scale
supercomputers. Researchers and system practitioners rely on field-data studies to …

Cited by 183; all 10 versions

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation

D Tiwari, S Gupta, J Rogers, D Maxwell… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

Increase in graphics hardware performance and improvements in programmability has
enabled GPUs to evolve from a graphics-specific accelerator to a general-purpose …

Cited by 205; all 10 versions

What can we learn from four years of data center hardware failures?

G Wang, L Zhang, W Xu - 2017 47th Annual IEEE/IFIP …, 2017 - ieeexplore.ieee.org

Hardware failures have a big impact on the dependability of large-scale data centers. We
present studies on over 290,000 hardware failure reports collected over the past four years …

Cited by 151; all 9 versions

An analysis of network-partitioning failures in cloud systems

A Alquraan, H Takruri, M Alfatafta… - 13th USENIX Symposium …, 2018 - usenix.org

We present a comprehensive study of 136 system failures attributed to network-partitioning
faults from 25 widely used distributed systems. We found that the majority of the failures led …

Cited by 102; all 16 versions

Failure analysis of virtual and physical machines: Patterns, causes and characteristics

R Birke, I Giurgiu, LY Chen… - 2014 44th Annual …, 2014 - ieeexplore.ieee.org

In today's commercial data centers, the computation density grows continuously as the
number of hardware components and workloads in units of virtual machines increase. The …

Cited by 130; all 6 versions

Failure analysis of jobs in compute clouds: A google cluster case study

X Chen, CD Lu, K Pattabiraman - 2014 IEEE 25th International …, 2014 - ieeexplore.ieee.org

In this paper, we analyze a workload trace from the Google cloud cluster and characterize
the observed failures. The goal of our work is to improve the understanding of failures in …

Cited by 128; all 5 versions

A large-scale study of soft-errors on GPUs in the field

B Nie, D Tiwari, S Gupta, E Smirni… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

Parallelism provided by the GPU architecture has enabled domain scientists to simulate
physical phenomena at a much faster rate and finer granularity than what was previously …

Cited by 109; all 9 versions

Reading between the lines of failure logs: Understanding how HPC systems fail

A survey on automated log analysis for reliability engineering
