Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs

M Orenes-Vera, A Manocha, J Balkind, F Gao… - Proceedings of the 49th …, 2022 - dl.acm.org
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …

Dalorex: A data-local program execution and architecture for memory-bound applications

M Orenes-Vera, E Tureci, D Wentzlaff… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …

MuchiSim: A simulation framework for design exploration of multi-chip manycore systems

M Orenes-Vera, E Tureci, M Martonosi… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …

Massive data-centric parallelism in the chiplet era

M Orenes-Vera, E Tureci, D Wentzlaff… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …

DECADES: A 67mm², 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including …

F Gao, TJ Chang, A Li, M Orenes-Vera… - 2023 IEEE Custom …, 2023 - ieeexplore.ieee.org
As Moore's Law is coming to an end, heterogeneous SoCs have become ubiquitous,
improving performance and efficiency with specialized hardware. However, the addition of …

The implications of page size management on graph analytics

A Manocha, Z Yan, E Tureci, JL Aragón… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Graph representations of data are ubiquitous in analytic applications. However, graph
workloads are notorious for having irregular memory access patterns with variable access …

DCRA: A distributed chiplet-based reconfigurable architecture for irregular applications

M Orenes-Vera, E Tureci, M Martonosi… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, the growing demand to process large graphs and sparse datasets has led to
increased research efforts to develop hardware- and software-based architectural solutions …

Tascade: Hardware support for atomic-free, asynchronous and efficient reduction trees

M Orenes-Vera, E Tureci, D Wentzlaff… - arXiv preprint arXiv …, 2023 - arxiv.org
As system parallelism at chip- and server-level increases, challenges that arose with
network-level systems a decade ago are now being encountered with these massively parallel …

In-Memory Compute with Off-the-Shelf DRAMs and Efficient On-Chip Data Supply for Heterogeneous SoCs

F Gao - 2024 - search.proquest.com
In-memory computing has long been promised as a solution to the “Memory Wall” problem.
Unfortunately, performing computations with memory resources either has relied on …

Navigating Heterogeneity and Scalability in Modern Chip Design

M Orenes-Vera - 2024 - search.proquest.com
Computing systems have become ubiquitous in the modern world but their design is far from
one-size-fits-all. From battery-powered devices to supercomputers, deployment …