- Academic Search

Complexity-effective multicore coherence

A Ros, S Kaxiras - Proceedings of the 21st international conference on …, 2012 - dl.acm.org

Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …

Cited by 167 · All 19 versions

[PDF] usenix.org

A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

L Chen, S Liu, C Wang, H Ma, Y Qiao, Z Wang… - … USENIX Symposium on …, 2024 - usenix.org

With rapid advances in network hardware, far memory has gained a great deal of traction
due to its ability to break the memory capacity wall. Existing far memory systems fall into one …

Cited by 2 · All 4 versions

[PDF] googleapis.com

System and method for simplifying cache coherence using multiple write policies

S Kaxiras, A Ros - US Patent 9,274,960, 2016 - Google Patents

Abstract System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …

Cited by 107 · All 4 versions

[PDF] nsf.gov

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org

Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

Cited by 35 · All 8 versions

[PDF] psu.edu

Compiler support for selective page migration in NUMA architectures

G Piccoli, HN Santos, RE Rodrigues, C Pousa… - Proceedings of the 23rd …, 2014 - dl.acm.org

Current high-performance multicore processors provide users with a non-uniform memory
access model (NUMA). These systems perform better when threads access data on memory …

Cited by 53 · All 8 versions

[PDF] psu.edu

A software approach for combating asymmetries of non-volatile memories

Y Li, Y Chen, AK Jones - Proceedings of the 2012 ACM/IEEE …, 2012 - dl.acm.org

The recent advances in non-volatile memory technologies promise the delivery of future
high performance and low power computing systems. While these technologies provide …

Cited by 58 · All 3 versions

[PDF] wiley.com Full View

Locality‐Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

A Muddukrishna, PA Jonsson… - Scientific …, 2015 - Wiley Online Library

Performance degradation due to nonuniform data access latencies has worsened on NUMA
systems and can now be felt on‐chip in manycore processors. Distributing data across …

Cited by 39 · All 9 versions

[PDF] archive.org

Practically private: Enabling high performance CMPs through compiler-assisted data classification

Y Li, R Melhem, AK Jones - … of the 21st international conference on …, 2012 - dl.acm.org

State-of-the-art chip multiprocessor (CMP) proposals emphasize optimization to deliver
computing power across many types of applications. Potentially significant performance …

Cited by 46 · All 4 versions

[PDF] um.es

Racer: TSO consistency via race detection

A Ros, S Kaxiras - 2016 49th Annual IEEE/ACM International …, 2016 - ieeexplore.ieee.org

Several recent efforts aim to simplify coherence and its associate costs (e.g., directory size,
complexity) in multicores. The bulk of these efforts rely on program data-race-free (DRF) …

Cited by 24 · All 7 versions

[PDF] upv.es

Temporal-aware mechanism to detect private data in chip multiprocessors

A Ros, B Cuesta, ME Gómez… - … on Parallel Processing, 2013 - ieeexplore.ieee.org

Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by only one thread and can be considered as private data. A …

Cited by 35 · All 11 versions

Compiler-assisted data distribution for chip multiprocessors

Complexity-effective multicore coherence
