Software fault tolerance in real-time systems: Identifying the future research questions
Tolerating hardware faults in modern architectures is becoming a prominent problem due to
the miniaturization of the hardware components, their increasing complexity, and the …
Serving DNNs like Clockwork: Performance predictability from the bottom up
Machine learning inference is becoming a core building block for interactive web
applications. As a result, the underlying model serving systems on which these applications …
Cores that don't count
PH Hochschild, P Turner, JC Mogul… - Proceedings of the …, 2021 - dl.acm.org
We are accustomed to thinking of computers as fail-stop, especially the cores that execute
instructions, and most system software implicitly relies on that assumption. During most of …
Taming performance variability
The performance of compute hardware varies: software run repeatedly on the same server
(or a different server with supposedly identical parts) can produce performance results that …
Understanding silent data corruptions in a large production CPU population
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …
Don't be a blockhead: zoned namespaces make work on conventional SSDs obsolete
T Stavrinos, DS Berger, E Katz-Bassett… - Proceedings of the …, 2021 - dl.acm.org
Research on flash devices almost exclusively focuses on conventional SSDs, which expose
a block interface. Industry, however, has standardized and is adopting Zoned Namespaces …
Analog-to-digital conversion of information archived in display holograms: I. discussion
This discussion paper highlights the potential of display holograms in the storage of
information about objects' shape. The images recorded and reconstructed from holograms …
Aggregathor: Byzantine machine learning via robust gradient aggregation
G Damaskinos, EM El-Mhamdi… - Proceedings of …, 2019 - proceedings.mlsys.org
We present AGGREGATHOR, a framework that implements state-of-the-art robust
(Byzantine-resilient) distributed stochastic gradient descent. Following the standard …
Perseus: A Fail-Slow detection framework for cloud storage systems
The newly emerging "fail-slow" failures plague both software and hardware where the victim
components are still functioning yet with degraded performance. To address this problem …
Unicorn: Reasoning about configurable system performance through the lens of causality
Modern computer systems are highly configurable, with the total variability space sometimes
larger than the number of atoms in the universe. Understanding and reasoning about the …