FaRM: Fast remote memory
We describe the design and implementation of FaRM, a new main memory distributed
computing platform that exploits RDMA to improve both latency and throughput by an order …
[BOOK] Parallel computer architecture: a hardware/software approach
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …
Memory coherence in shared virtual memory systems
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …
Parallel Metropolis-coupled Markov chain Monte Carlo for Bayesian phylogenetic inference
Motivation: Bayesian estimation of phylogeny is based on the posterior probability
distribution of trees. Currently, the only numerical method that can effectively approximate …
Remote regions: a simple abstraction for remote memory
We propose an intuitive abstraction for a process to export its memory to remote hosts, and
to access the memory exported by others. This abstraction provides a simpler interface to …
Efficient distributed memory management with RDMA and caching
Recent advancements in high-performance networking interconnect significantly narrow the
performance gap between intra-node and inter-node communications, and open up …
MagPIe: MPI's collective communication operations for clustered wide area systems
Writing parallel applications for computational grids is a challenging task. To achieve good
performance, algorithms designed for local area networks must be adapted to the …
Scale-out NUMA
Emerging datacenter applications operate on vast datasets that are kept in DRAM to
minimize latency. The large number of servers needed to accommodate this massive …
MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks
The hybrid memory model of clusters of multiprocessors raises two issues: programming
model and performance. Many parallel programs have been written by using the MPI …
Exploiting distributed version concurrency in a transactional memory cluster
K Manassiev, M Mihailescu, C Amza - Proceedings of the eleventh ACM …, 2006 - dl.acm.org
We investigate a transactional memory runtime system providing scaling and strong
consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific …