CERE: LLVM-based codelet extractor and replayer for piecewise benchmarking and optimization

PDO Castro, C Akel, E Petit, M Popov… - ACM Transactions on …, 2015 - dl.acm.org
This article presents Codelet Extractor and REplayer (CERE), an open-source framework for
code isolation. CERE finds and extracts the hotspots of an application as isolated fragments …

Type-based gradual typing performance optimization

JP Campora, MW Khan, S Chen - Proceedings of the ACM on …, 2024 - dl.acm.org
Gradual typing has emerged as a popular design point in programming languages,
attracting significant interest from both academia and industry. Programmers in gradually …

A DSL-based runtime adaptivity framework for Java

T Carvalho, J Bispo, P Pinto, JMP Cardoso - SoftwareX, 2023 - Elsevier
This article presents Kadabra, a Java source-to-source compiler that allows users to make
code queries, code analysis and code transformations, all user-programmable using the …

I/O Optimisation and elimination via partial evaluation

CSF Smowton - 2014 - cl.cam.ac.uk
Computer programs commonly repeat work. Short programs go through the same
initialisation sequence each time they are run, and long-running servers may be given a …

Fast Template-Based Code Generation for MLIR

F Drescher, A Engelke - Proceedings of the 33rd ACM SIGPLAN …, 2024 - dl.acm.org
Fast compilation is essential for JIT-compilation use cases like dynamic languages or
databases as well as development productivity when compiling static languages. Template …

Efficient and scalable bit-matrix multiplication in bit-slice format

D Van Amstel - ACM SAC, 2012 - helcaraxan.eu
The bit-matrix multiplication (BMM) has until now only been implemented on the Cray
supercomputers. Since then multiple publications have proved the usefulness of this …

Microtools: Automating program generation and performance measurement

JC Beyler, N Triquenaux, V Palomares… - 2012 41st …, 2012 - ieeexplore.ieee.org
Tuning an application to a given architecture has become a complex procedure.
Sophisticated hardware obfuscates the path to easily writing peak-performance applications …

Improving performance through deep value profiling and specialization with code transformation

MA Khan - Computer Languages, Systems & Structures, 2011 - Elsevier
Specialization of code is used to improve the performance of the applications. However,
specialization based on ineffective profiles deteriorates the performance. Existing value …

Improving performance of optimized kernels through fast instantiations of templates

MA Khan, HP Charles, D Barthou - … and Computation: Practice …, 2009 - Wiley Online Library
To fully exploit the instruction‐level parallelism offered by modern processors, compilers
need the necessary information available during the execution of the program. This …

Feedback-directed specialization of code

MA Khan - Computer Languages, Systems & Structures, 2010 - Elsevier
Based on feedback information, a large number of optimizations can be performed by the
compiler. This information actually indicates the changing behavior of the applications and …