FaRM: Fast remote memory

A Dragojević, D Narayanan, M Castro… - 11th USENIX Symposium …, 2014 - usenix.org
We describe the design and implementation of FaRM, a new main memory distributed
computing platform that exploits RDMA to improve both latency and throughput by an order …

[BOOK] Parallel computer architecture: a hardware/software approach

D Culler, JP Singh, A Gupta - 1999 - books.google.com
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …

Memory coherence in shared virtual memory systems

K Li, P Hudak - ACM Transactions on Computer Systems (TOCS), 1989 - dl.acm.org
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …

Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference

G Altekar, S Dwarkadas, JP Huelsenbeck… - …, 2004 - academic.oup.com
Motivation: Bayesian estimation of phylogeny is based on the posterior probability
distribution of trees. Currently, the only numerical method that can effectively approximate …

Remote regions: a simple abstraction for remote memory

MK Aguilera, N Amit, I Calciu, X Deguillard… - 2018 USENIX Annual …, 2018 - usenix.org
We propose an intuitive abstraction for a process to export its memory to remote hosts, and
to access the memory exported by others. This abstraction provides a simpler interface to …

Efficient distributed memory management with RDMA and caching

Q Cai, W Guo, H Zhang, D Agrawal, G Chen… - Proceedings of the …, 2018 - dl.acm.org
Recent advancements in high-performance networking interconnect significantly narrow the
performance gap between intra-node and inter-node communications, and open up …

MagPIe: MPI's collective communication operations for clustered wide area systems

T Kielmann, RFH Hofman, HE Bal, A Plaat… - Proceedings of the …, 1999 - dl.acm.org
Writing parallel applications for computational grids is a challenging task. To achieve good
performance, algorithms designed for local area networks must be adapted to the …

Scale-out NUMA

S Novakovic, A Daglis, E Bugnion, B Falsafi… - ACM SIGPLAN …, 2014 - dl.acm.org
Emerging datacenter applications operate on vast datasets that are kept in DRAM to
minimize latency. The large number of servers needed to accommodate this massive …

MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks

F Cappello, D Etiemble - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
The hybrid memory model of clusters of multiprocessors raises two issues: programming
model and performance. Many parallel programs have been written by using the MPI …

Exploiting distributed version concurrency in a transactional memory cluster

K Manassiev, M Mihailescu, C Amza - Proceedings of the eleventh ACM …, 2006 - dl.acm.org
We investigate a transactional memory runtime system providing scaling and strong
consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific …