ARQUIN: architectures for multinode superconducting quantum computers
Many proposals to scale quantum technology rely on modular or distributed designs
wherein individual quantum processors, called nodes, are linked together to form one large …
Dalorex: A data-local program execution and architecture for memory-bound applications
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …
Muchisim: A simulation framework for design exploration of multi-chip manycore systems
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …
AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware
Covert channels enable information leakage between security domains that should be
isolated by observing execution differences in shared hardware. These channels can …
HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application
across a wide range of domains, including machine learning and linear algebra solvers. In …
Cohort: Software-oriented acceleration for heterogeneous SoCs
Philosophically, our approaches to acceleration focus on the extreme. We must optimise
accelerators to the maximum, leaving software to fix any hardware-software mismatches …
SMAPPIC: Scalable multi-FPGA architecture prototype platform in the cloud
Traditionally, architecture prototypes are built on top of FPGA infrastructure, with two
associated problems. First, very large FPGAs are prohibitively expensive for most people …
Seizing the bandwidth scaling of on-package interconnect in a post-Moore's law world
The slowing and forecasted end of Moore's Law have forced designers to look beyond
simply adding transistors, encouraging them to employ other unused resources as a manner …
Massive data-centric parallelism in the chiplet era
Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …
An architecture interface and offload model for low-overhead, near-data, distributed accelerators
The performance and energy costs of coordinating and performing data movement have led
to proposals adding compute units and/or specialized access units to the memory hierarchy …