- Academic Search

Unleashing fine-grained parallelism on embedded many-core accelerators with lightweight OpenMP tasking

G Tagliavini, D Cesarini… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

In recent years, programmable many-core accelerators (PMCAs) have been introduced in
embedded systems to satisfy stringent performance/Watt requirements. This has increased …

Cited by 37

[PDF] nsf.gov

Comparison of threading programming models

S Salehian, J Liu, Y Yan - 2017 IEEE International Parallel and …, 2017 - ieeexplore.ieee.org

In this paper, we provide comparison of language features and runtime systems of
commonly used threading parallel programming models for high performance computing …

Cited by 23

[HTML] diva-portal.org

Grain graphs: OpenMP performance analysis made easy

A Muddukrishna, PA Jonsson, A Podobas… - Proceedings of the 21st …, 2016 - dl.acm.org

Average programmers struggle to solve performance problems in OpenMP programs with
tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task …

Cited by 33

[PDF] usenix.org

Callisto-RTS: Fine-Grain Parallel Loops

T Harris, S Kaestle - … Annual Technical Conference (USENIX ATC 15), 2015 - usenix.org

We introduce Callisto-RTS, a parallel runtime system designed for multi-socket shared-
memory machines. It supports very fine-grained scheduling of parallel loops—down to …

Cited by 24

[PDF] arxiv.org

Parallel Cholesky Factorization for Banded Matrices Using OpenMP Tasks

F Liu, A Fredriksson, S Markidis - European Conference on Parallel …, 2023 - Springer

Cholesky factorization is a method for solving linear systems involving symmetric, positive-
definite matrices, and can be an attractive choice in applications where a high degree of …

Cited by 2

[PDF] ieee.org

The tiny-tasks granularity trade-off: Balancing overhead versus performance in parallel systems

S Bora, B Walker, M Fidler - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org

Models of parallel processing systems typically assume that one has workers and jobs are
split into an equal number of tasks. Splitting jobs into smaller tasks, i.e., using “tiny tasks”, can …

Cited by 5

[PDF] osti.gov

Analyzing the performance trade-off in implementing user-level threads

S Iwasaki, A Amer, K Taura… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

User-level threads have been widely adopted as a means of achieving lightweight
concurrent execution without the costs of OS-level threads. Nevertheless, the costs of …

Cited by 7

[PDF] arxiv.org

Solvers for Electronic Structure in the Strong Scaling Limit

N Bock, M Challacombe, LV Kalé - SIAM Journal on Scientific Computing, 2016 - SIAM

We present a hybrid OpenMP/Charm++ framework for solving the O(N) self-consistent-field
eigenvalue problem with parallelism in the strong scaling regime, P≫N, where P is the …

Cited by 12

[PDF] github.io

Lessons learned from analyzing dynamic promotion for user-level threading

S Iwasaki, A Amer, K Taura… - … Conference for High …, 2018 - ieeexplore.ieee.org

A performance vs. practicality trade-off exists between user-level threading techniques. The
community has settled mostly on a black-and-white perspective; fully fledged threads …

Cited by 8

[HTML] cyberleninka.ru

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

D Los, I Petushkov - International Journal of Open Information …, 2024 - cyberleninka.ru

Nowadays, latency-critical, high-performance applications are parallelized even on power-
constrained client systems to improve performance. However, an important scenario of fine …


A comparative performance study of common and popular task‐centric programming frameworks
