Organizing the last line of defense before hitting the memory wall for CMPs

C Liu, A Sivasubramaniam… - … Symposium on High …, 2004 - ieeexplore.ieee.org
The last line of defense in the cache hierarchy before going to off-chip memory is very critical
in chip multiprocessors (CMPs) from both the performance and power perspectives. We …

Analysis, transformation and optimization for high performance parallel computing

AA Prihozhy - 2019 - rep.bntu.by
This book studies hardware and software specifications at algorithmic level from the point of
measuring and extracting the potential parallelism hidden in them. It investigates the …

Auto-partitioning heterogeneous task-parallel programs with streamblocks

M Emami, E Bezati, JW Janneck, JR Larus - Proceedings of the …, 2022 - dl.acm.org
FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge
in designing FPGA-based systems is partitioning computation between processor cores and …

Pipeline synthesis and optimization from branched feedback dataflow programs

A Prihozhy, S Casale-Brunet, E Bezati… - Journal of Signal …, 2020 - Springer
Large dataflow designs are a result of behavioral specification of modern complex digital
systems and/or a result of unfolding and transforming looped and branched programs. Since …

Turnus: A design exploration framework for dataflow system design

SC Brunet, M Mattavelli… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
While research on the design of heterogeneous concurrent systems has a long and rich
history, a unified design methodology and tool support has not emerged so far, and thus the …

Turnus: a unified dataflow design space exploration framework for heterogeneous parallel systems

S Casale-Brunet, C Alberti, M Mattavelli… - 2013 Conference on …, 2013 - ieeexplore.ieee.org
This paper presents the main features of the TURNUS co-exploration environment, a
unified design space exploration framework suitable for heterogeneous parallel systems …

Buffer optimization based on critical path analysis of a dataflow program design

SC Brunet, M Mattavelli… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
The trade-off between throughput and memory constraints is a common design problem in
embedded systems, and especially for streaming applications, where the memory in …

Streamblocks: A compiler for heterogeneous dataflow computing (technical report)

E Bezati, M Emami, J Janneck, J Larus - arXiv preprint arXiv:2107.09333, 2021 - arxiv.org
To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators.
A key challenge in designing these systems is partitioning computation between processors …

Synthesis and optimization of pipelines for HW implementations of dataflow programs

A Prihozhy, E Bezati, AAH Ab Rahman… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
This paper introduces a new methodology for pipeline synthesis with applications to dataflow
high-level system design. The pipeline synthesis is applied to dataflow programs whose …

Partitioning and optimization of high level stream applications for multi clock domain architectures

SC Brunet, E Bezati, C Alberti, M Mattavelli… - SiPS 2013 …, 2013 - ieeexplore.ieee.org
In this paper we propose a design methodology to partition dataflow applications on a multi
clock domain architecture. This work shows how starting from a high level dataflow …