An evaluation study on log parsing and its use in log mining
Logs, which record runtime information of modern systems, are widely utilized by developers
(and operators) in system development and maintenance. Due to the ever-increasing size of …
Towards automated log parsing for large-scale log data analysis
Logs are widely used in system management for dependability assurance because they are
often the only data available that record detailed system runtime behaviors in production …
Landscape of automated log analysis: A systematic literature review and mapping study
Ł Korzeniowski, K Goczyła - IEEE Access, 2022 - ieeexplore.ieee.org
Logging is a common practice in software engineering to provide insights into working
systems. The main uses of log files have always been failure identification and root cause …
Log-based software monitoring: a systematic mapping study
Modern software development and operations rely on monitoring to understand how
systems behave in production. The data provided by application logs and runtime …
Lessons learned from the analysis of system failures at petascale: The case of Blue Waters
This paper provides an analysis of failures and their impact for Blue Waters, the Cray hybrid
(CPU/GPU) supercomputer at the University of Illinois at Urbana-Champaign. The analysis …
Desh: deep learning for system health prediction of lead times to failure in HPC
Today's large-scale supercomputers encounter faults on a daily basis. Exascale systems are
likely to experience even higher fault rates due to increased component count and density …
Measuring and understanding extreme-scale application resilience: A field study of 5,000,000 HPC application runs
This paper presents an in-depth characterization of the resiliency of more than 5 million HPC
application runs completed during the first 518 production days of Blue Waters, a 13.1 …
Failure prediction for HPC systems and applications: Current situation and open issues
As large-scale systems evolve towards post-petascale computing, it is crucial to focus on
providing fault-tolerance strategies that aim to minimize the effects of faults on applications. By far …
Big data meets HPC log analytics: Scalable approach to understanding systems at extreme scale
Today's high-performance computing (HPC) systems are heavily instrumented, generating
logs containing information about abnormal events, such as critical conditions, faults, errors …
LogDiver: A tool for measuring resilience of extreme-scale systems and applications
This paper presents LogDiver, a tool for the analysis of application-level resiliency in
extreme-scale computing systems. The tool has been implemented to handle data …