Embedded system design: embedded systems foundations of cyber-physical systems, and the internet of things

P Marwedel - 2021 - library.oapen.org
A unique feature of this open access textbook is to provide a comprehensive introduction to
the fundamental knowledge in embedded systems, with applications in cyber-physical …

Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks

C Mendis, A Renda, S Amarasinghe… - … on machine learning, 2019 - proceedings.mlr.press
Predicting the number of clock cycles a processor takes to execute a block of assembly
instructions in steady state (the throughput) is important for both compiler designers and …

Rethinking SIMD vectorization for in-memory databases

O Polychroniou, A Raghavan, KA Ross - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
Analytical databases are continuously adapting to the underlying hardware in order to
saturate all sources of parallelism. At the same time, hardware evolves in multiple directions …

NeuroVectorizer: End-to-end vectorization with deep reinforcement learning

A Haj-Ali, NK Ahmed, T Willke, YS Shao… - Proceedings of the 18th …, 2020 - dl.acm.org
One of the key challenges arising when compilers vectorize loops for today's SIMD-
compatible architectures is to decide if vectorization or interleaving is beneficial. Then, the …

Cg: A system for programming graphics hardware in a C-like language

WR Mark, RS Glanville, K Akeley… - ACM SIGGRAPH 2003 …, 2003 - dl.acm.org
The latest real-time graphics architectures include programmable floating-point vertex and
fragment processors, with support for data-dependent control flow in the vertex processor …

The architecture of the DIVA processing-in-memory chip

J Draper, J Chame, M Hall, C Steele, T Barrett… - Proceedings of the 16th …, 2002 - dl.acm.org
The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-
Memory (PIM) chips as smart-memory co-processors to a conventional microprocessor. We …

Synergistic processing in Cell's multicore architecture

M Gschwind, HP Hofstee, B Flachs, M Hopkins… - IEEE Micro, 2006 - ieeexplore.ieee.org
Eight synergistic processor units enable the Cell Broadband Engine's breakthrough
performance. The SPU architecture implements a novel, pervasively data-parallel …

Auto-vectorization of interleaved data for SIMD

D Nuzman, I Rosen, A Zaks - ACM SIGPLAN Notices, 2006 - dl.acm.org
Most implementations of the Single Instruction Multiple Data (SIMD) model available today
require that data elements be packed in vector registers. Operations on disjoint vector …

Vectorization for SIMD architectures with alignment constraints

AE Eichenberger, P Wu, K O'Brien - ACM SIGPLAN Notices, 2004 - dl.acm.org
When vectorizing for SIMD architectures that are commonly employed by today's multimedia
extensions, one of the new challenges that arise is the handling of memory alignment. Prior …

When polyhedral transformations meet SIMD code generation

M Kong, R Veras, K Stock, F Franchetti… - Proceedings of the 34th …, 2013 - dl.acm.org
Data locality and parallelism are critical optimization objectives for performance on modern
multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain …