Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …
Dalorex: A data-local program execution and architecture for memory-bound applications
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …
Muchisim: A simulation framework for design exploration of multi-chip manycore systems
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …
Massive data-centric parallelism in the chiplet era
Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …
DECADES: A 67mm², 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including …
As Moore's Law is coming to an end, heterogeneous SoCs have become ubiquitous,
improving performance and efficiency with specialized hardware. However, the addition of …
The implications of page size management on graph analytics
Graph representations of data are ubiquitous in analytic applications. However, graph
workloads are notorious for having irregular memory access patterns with variable access …
DCRA: A distributed chiplet-based reconfigurable architecture for irregular applications
In recent years, the growing demand to process large graphs and sparse datasets has led to
increased research efforts to develop hardware- and software-based architectural solutions …
Tascade: Hardware support for atomic-free, asynchronous and efficient reduction trees
As system parallelism at chip- and server-level increases, challenges that arose with
network-level systems a decade ago are now being encountered with these massively parallel …
In-Memory Compute with Off-the-Shelf DRAMs and Efficient On-Chip Data Supply for Heterogeneous SoCs
F Gao - 2024 - search.proquest.com
In-memory computing has long been promised as a solution to the “Memory Wall” problem.
Unfortunately, performing computations with memory resources either has relied on …
Navigating Heterogeneity and Scalability in Modern Chip Design
M Orenes-Vera - 2024 - search.proquest.com
Computing systems have become ubiquitous in the modern world but their design is far from
one-size-fits-all. From battery-powered devices to supercomputers, deployment …