| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| The lightweight distributed metric service: a scalable infrastructure for continuous monitoring of large scale computing systems and applications | A Agelastos, B Allan, J Brandt, P Cassella, J Enos, J Fullop, A Gentile, ... | SC'14: Proceedings of the International Conference for High Performance … | 305 | 2014 |
| Reaction chemistry and optimization of plasma remediation of NxOy from gas streams | AC Gentile, MJ Kushner | Journal of Applied Physics 78 (3), 2074-2085 | 277 | 1995 |
| Resource monitoring and management with OVIS to enable HPC in cloud computing environments | J Brandt, A Gentile, J Mayo, P Pebay, D Roe, D Thompson, M Wong | 2009 IEEE International Symposium on Parallel & Distributed Processing, 1-8 | 100 | 2009 |
| Microstreamer dynamics during plasma remediation of NO using atmospheric pressure dielectric barrier discharges | AC Gentile, MJ Kushner | Journal of Applied Physics 79 (8), 3877-3885 | 70 | 1996 |
| OVIS-2: A robust distributed architecture for scalable RAS | JM Brandt, BJ Debusschere, AC Gentile, JR Mayo, PP Pébay, ... | 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-8 | 51 | 2008 |
| Baler: deterministic, lossless log message clustering tool | N Taerat, J Brandt, A Gentile, M Wong, C Leangsuksun | Computer Science - Research and Development 26 (3), 285-295 | 44 | 2011 |
| Toward Rapid Understanding of Production HPC Applications and Systems | A Agelastos, B Allan, J Brandt, A Gentile, S Lefantzi, S Monk, J Ogden, ... | Cluster Computing (CLUSTER), 2015 IEEE International Conference on, 464-473 | 40 | 2015 |
| Ovis: a tool for intelligent, real-time monitoring of computational clusters | JM Brandt, AC Gentile, DJ Hale, PP Pébay | Proceedings 20th IEEE International Parallel & Distributed Processing … | 35 | 2006 |
| Integrating low-latency analysis into HPC system monitoring | R Izadpanah, N Naksinehaboon, J Brandt, A Gentile, D Dechev | Proceedings of the 47th International Conference on Parallel Processing, 1-10 | 34 | 2018 |
| Measuring Congestion in High-Performance Datacenter Interconnects | S Jha, A Patke, J Brandt, A Gentile, B Lim, M Showerman, G Bauer, ... | 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI … | 29 | 2020 |
| Demonstrating improved application performance using dynamic monitoring and task mapping | J Brandt, K Devine, A Gentile, K Pedretti | 2014 IEEE International Conference on Cluster Computing (CLUSTER), 408-415 | 28 | 2014 |
| Methodologies for advance warning of compute cluster problems via statistical analysis: A case study | J Brandt, A Gentile, J Mayo, P Pébay, D Roe, D Thompson, M Wong | Proceedings of the 2009 workshop on Resiliency in high performance, 7-14 | 28 | 2009 |
| Using probabilistic characterization to reduce runtime faults in HPC systems | J Brandt, B Debusschere, A Gentile, J Mayo, P Pébay, D Thompson, ... | 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid … | 24 | 2008 |
| Overtime: A tool for analyzing performance variation due to network interference | RE Grant, KT Pedretti, A Gentile | Proceedings of the 3rd Workshop on Exascale MPI, 1-10 | 23 | 2015 |
| Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity | JM Brandt, D DeBonis, AC Gentile, J Lujan, C Martin, DJ Martinez, ... | Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National … | 23 | 2015 |
| Filtering log data: Finding the needles in the haystack | L Yu, Z Zheng, Z Lan, T Jones, JM Brandt, AC Gentile | IEEE/IFIP International Conference on Dependable Systems and Networks (DSN … | 23 | 2012 |
| Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example | J Brandt, F Chen, V De Sapio, A Gentile, J Mayo, P Pébay, D Roe, ... | 2010 International Conference on Dependable Systems and Networks Workshops … | 21 | 2010 |
| Design and Implementation of a Scalable Monitoring System for Trinity | A DeConinck, A Bonnie, K Kelly, S Sanchez, C Martin, M Mason, ... | Sandia National Lab. (SNL-NM), Albuquerque, NM (United States) | 20 | 2016 |
| Lilith: Scalable execution of user code for distributed computing | DA Evensky, AC Gentile, LJ Camp, RC Armstrong | Proceedings. The Sixth IEEE International Symposium on High Performance … | 20 | 1997 |
| Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems | A Agelastos, B Allan, J Brandt, A Gentile, S Lefantzi, S Monk, J Ogden, ... | Parallel Computing 58, 90-106 | 19 | 2016 |