Pioneering chiplet technology and design for the AMD EPYC™ and Ryzen™ processor families: Industrial product

S Naffziger, N Beck, T Burd, K Lepak… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
For decades, Moore's Law has delivered the ability to integrate an exponentially increasing
number of devices in the same silicon area at a roughly constant cost. This has enabled …

Data reorganization in memory using 3D-stacked DRAM

B Akin, F Franchetti, JC Hoe - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
In this paper we focus on common data reorganization operations such as shuffle,
pack/unpack, swap, transpose, and layout transformations. Although these operations …

Integrated thermal analysis for processing in die-stacking memory

Y Zhu, B Wang, D Li, J Zhao - … of the Second International Symposium on …, 2016 - dl.acm.org
Recent application and technology trends bring a renaissance of the processing-in-memory
(PIM), which was envisioned decades ago. In particular, die-stacking and silicon interposer …

Bravo: Balanced reliability-aware voltage optimization

K Swaminathan, N Chandramoorthy… - … Symposium on High …, 2017 - ieeexplore.ieee.org
Defining a processor micro-architecture for a targeted product space involves multi-dimensional
optimization across performance, power and reliability axes. A key decision in …

Exploring time and energy for complex accesses to a hybrid memory cube

J Schmidt, H Fröning, U Brüning - Proceedings of the Second …, 2016 - dl.acm.org
Through-Silicon Vias (TSVs) and three-dimensional die stacking technologies are enabling
a combination of DRAM and CMOS die layer within a single stack, leading to stacked …

MultiPULPly: A multiplication engine for accelerating neural networks on ultra-low-power architectures

A Eliahu, R Ronen, PE Gaillardon… - ACM Journal on Emerging …, 2021 - dl.acm.org
Computationally intensive neural network applications often need to run on resource-limited
low-power devices. Numerous hardware accelerators have been developed to speed up the …

Thermal-throttling server: A thermal-aware real-time task scheduling framework for three-dimensional multicore chips

TH Tsai, YS Chen - Journal of Systems and Software, 2016 - Elsevier
Three-dimensional (3D) multicore chips have been recently developed to deal with
the power consumption and interconnection delay problems of embedded systems; …

Beyond the Lambertian assumption: A generative model for apparent BRDF fields of faces using anti-symmetric tensor splines

A Barmpoutis, R Kumar, BC Vemuri… - 2008 IEEE Conference …, 2008 - ieeexplore.ieee.org
Human faces are neither exactly Lambertian nor entirely convex and hence most models in
literature which make the Lambertian assumption, fall short when dealing with specularities …

A Formal Approach to Memory Access Optimization: Data Layout, Reorganization, and Near-Data Processing

B Akin - Data Processing, 2015 - kilthub.cmu.edu
The memory system is a major bottleneck in achieving high performance and energy
efficiency for various processing platforms. This thesis aims to improve memory performance …

The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective

Y Tang, L Wang, Y Deng, X Ni, Q Dou - … , GPC 2016, Xi'an, China, May 6-8 …, 2016 - Springer
Because of the popularity of big data and cloud computing, the evolution of
microarchitecture has to concentrate on raw computing ability, throughput, low power and …