Xiaoyu Chu
Verified email at vu.nl
Title
Cited by
Year
How Do ML Jobs Fail in Datacenters? Analysis of a Long-Term Dataset from an HPC Cluster
X Chu, S Talluri, L Versluis, A Iosup
Companion of the 2023 ACM/SPEC International Conference on Performance …, 2023
Cited by 5
2023
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
X Chu, D Hofstätter, S Ilager, S Talluri, D Kampert, D Podareanu, ...
2024 IEEE 30th International Conference on Parallel and Distributed Systems …, 2024
Cited by 1
2024
An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models
X Chu, S Talluri, Q Lu, A Iosup
arXiv preprint arXiv:2501.12469, 2025
2025
Enabling Operational Data Analytics for Datacenters through Ontologies, Monitoring, and Simulation-based Prediction
S Suman, X Chu, D Niewenhuis, S Talluri, T De Matteis, A Iosup
Companion of the 15th ACM/SPEC International Conference on Performance …, 2024
2024