Failure prediction for HPC systems and applications: Current situation and open issues

A Gainaru, F Cappello, M Snir… - … International journal of …, 2013 - journals.sagepub.com

As large-scale systems evolve towards post-petascale computing, it is crucial to focus on
providing fault-tolerance strategies that aim to minimize fault's effects on applications. By far …

Cited by 65

[PDF] mst.edu

Environmental performance analysis of solid freedom fabrication processes

Y Luo, Z Ji, MC Leu, R Caudill - Proceedings of the 1999 IEEE …, 1999 - ieeexplore.ieee.org

This paper presents a method for analyzing the environmental performance of solid freeform
fabrication (SFF) processes. In this method, each process is divided into life phases …

Cited by 214

[PDF] nsf.gov

Failure prediction by utilizing log analysis: A systematic mapping study

D Das, M Schiewe, E Brighton, M Fuller… - Proceedings of the …, 2020 - dl.acm.org

In modern computing, log files provide a wealth of information regarding the past of a
system, including the system failures and security breaches that cost companies and …

Cited by 21

[PDF] illinois.edu

Fault prediction under the microscope: A closer look into HPC systems

A Gainaru, F Cappello, M Snir… - SC'12: Proceedings of …, 2012 - ieeexplore.ieee.org

A large percentage of computing capacity in today's large high-performance computing
systems is wasted because of failures. Consequently current research is focusing on …

Cited by 179

[PDF] academia.edu

Toward automated anomaly identification in large-scale systems

Z Lan, Z Zheng, Y Li - IEEE Transactions on Parallel and …, 2009 - ieeexplore.ieee.org

When a system fails to function properly, health-related data are collected for
troubleshooting. However, it is challenging to effectively identify anomalies from the …

Cited by 148

[PDF] researchgate.net

[PDF] Ensemble of Bayesian predictors and decision trees for proactive failure management in cloud computing systems.

Q Guan, Z Zhang, S Fu - J. Commun., 2012 - researchgate.net

In modern cloud computing systems, hundreds and even thousands of cloud servers are
interconnected by multi-layer networks. In such large-scale and complex systems, failures …

Cited by 123

Logmaster: Mining event correlations in logs of large-scale cluster systems

X Fu, R Ren, J Zhan, W Zhou, Z Jia… - 2012 IEEE 31st …, 2012 - ieeexplore.ieee.org

This paper presents a set of innovative algorithms and a system, named Log Master, for
mining correlations of events that have multiple attributions, i.e., node ID, application ID, event …

Cited by 121

[PDF] iit.edu

Fault-aware, utility-based job scheduling on Blue Gene/P systems

W Tang, Z Lan, N Desai… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org

Job scheduling on large-scale systems is an increasingly complicated affair, with numerous
factors influencing scheduling policy. Addressing these concerns results in sophisticated …

Cited by 124

[PDF] academia.edu

Mining frequent itemsets in a stream

T Calders, N Dexters, JJM Gillis, B Goethals - Information Systems, 2014 - Elsevier

Mining frequent itemsets in a datastream proves to be a difficult problem, as itemsets arrive
in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it …

Cited by 156

Taming of the shrew: Modeling the normal and faulty behaviour of large-scale HPC systems

A Gainaru, F Cappello, W Kramer - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

HPC systems are complex machines that generate a huge volume of system state data
called "events". Events are generated without following a general consistent rule and …

Cited by 112

Dynamic meta-learning for failure prediction in large-scale systems: A case study

Failure prediction for HPC systems and applications: Current situation and open issues