Characterizing and mitigating soft errors in GPU DRAM

MB Sullivan, N Saxena, M O'Connor, D Lee… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
GPUs are used in high-reliability systems, including high-performance computers and
autonomous vehicles. Because GPUs employ a high-bandwidth, wide-interface to DRAM …

Matrix codes for reliable and cost efficient memory chips

C Argyrides, DK Pradhan… - IEEE Transactions on Very …, 2009 - ieeexplore.ieee.org
This paper presents a method to protect memories against multiple bit upsets and to improve
manufacturing yield. The proposed method, called a Matrix code, combines Hamming and …

Hamming SEC-DAED and extended Hamming SEC-DED-TAED codes through selective shortening and bit placement

A Sanchez-Macian, P Reviriego… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
Radiation particles can impact registers or memories creating soft errors. These errors can
modify more than one bit causing a multiple cell upset (MCU) which consists of errors in …

A new SEC-DED error correction code subclass for adjacent MBU tolerance in embedded memory

A Neale, M Sachdev - IEEE Transactions on Device and …, 2012 - ieeexplore.ieee.org
The reliability concern associated with radiation-induced soft errors in embedded memories
increases as semiconductor technology scales deep into the sub-40-nm regime. As the …

Characterizing SRAM and FF soft error rates with measurement and simulation

M Hashimoto, K Kobayashi, J Furuta, SI Abe… - Integration, 2019 - Elsevier
Soft error originating from cosmic ray is a serious concern for reliability demanding
applications, such as autonomous driving, supercomputer, and public transportation system …

MCU tolerance in SRAMs through low-redundancy triple adjacent error correction

LJ Saiz-Adalid, P Reviriego, P Gil… - … Transactions on Very …, 2014 - ieeexplore.ieee.org
Static random access memories (SRAMs) are key in electronic systems. They are used not
only as standalone devices, but also embedded in application specific integrated circuits …

Unity ECC: Unified Memory Protection Against Bit and Chip Errors

D Kim, J Lee, W Jung, M Sullivan, J Kim - Proceedings of the …, 2023 - dl.acm.org
DRAM vendors utilize On-Die Error Correction Codes (OD-ECC) to correct random bit errors
internally. Meanwhile, system companies utilize Rank-Level ECC (RL-ECC) to protect data …

Architectural enhancements in Stratix V™

D Lewis, D Cashman, M Chan, J Chromczak… - Proceedings of the …, 2013 - dl.acm.org
This paper describes architectural enhancements in the Altera Stratix-V™ FPGA architecture,
built on a 28nm TSMC process, together with the data supporting those choices. Among the …

Extending 3-bit burst error-correction codes with quadruple adjacent error correction

J Li, P Reviriego, L Xiao… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
The use of error-correction codes (ECCs) with advanced correction capability is a common
system-level strategy to harden the memory against multiple bit upsets (MBUs). Therefore …

Adjacent-MBU-tolerant SEC-DED-TAEC-yAED codes for embedded SRAMs

A Neale, M Jonkman, M Sachdev - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
As technology scaling increases embedded static random access memory bit-cell density,
the number of soft errors due to radiation-induced multiple-bit upsets (MBUs) also increases …