Cornflakes: Zero-copy serialization for microsecond-scale networking

D Raghavan, S Ravi, G Yuan, P Thaker… - Proceedings of the 29th …, 2023 - dl.acm.org
Data serialization is critical for many datacenter applications, but the memory copies
required to move application data into packets are costly. Recent zero-copy APIs expose …

Breakfast of champions: towards zero-copy serialization with NIC scatter-gather

D Raghavan, P Levis, M Zaharia, I Zhang - Proceedings of the Workshop …, 2021 - dl.acm.org
Microsecond I/O will make data serialization a major bottleneck for datacenter applications.
Serialization is fundamentally about data movement: serialization libraries coalesce and …

PetPS: supporting huge embedding models with persistent memory

M Xie, Y Lu, Q Wang, Y Feng, J Liu, K Ren… - Proceedings of the VLDB …, 2023 - dl.acm.org
Embedding models are effective for learning high-dimensional sparse data. Traditionally,
they are deployed in DRAM parameter servers (PS) for online inference access. However …

Configurable algorithms for all-to-all collectives

K Fan, S Petruzza, T Gilray… - ISC High Performance …, 2024 - ieeexplore.ieee.org
MPI_Alltoall is a commonly used collective that allows a fixed-size data block to be
exchanged between every pair of processes. The function can be implemented through a …

HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions

B Ramesh, N Contini, N Alnaasan… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Modern multi/many-core processors in HPC systems have hundreds of cores with deep
memory hierarchies. HPC applications run at high core counts often experience contention …

Using Arm Scalable Vector Extension to optimize Open MPI

D Zhong, P Shamis, Q Cao, G Bosilca… - 2020 20th IEEE/ACM …, 2020 - ieeexplore.ieee.org
As the scale of high-performance computing (HPC) systems continues to grow, increasing
levels of parallelism must be employed to achieve optimal performance. Recently, the …

Configurable Non-uniform All-to-all Algorithms

K Fan, J Domke, S Ba, S Kumar - arXiv preprint arXiv:2411.02581, 2024 - arxiv.org
MPI_Alltoallv generalizes the uniform all-to-all communication (MPI_Alltoall) by enabling the
exchange of data blocks of varied sizes among processes. This function plays a crucial role …

Improving MPI Language Support Through Custom Datatype Serialization

J Tronge, J Schuchart, L Dalcin… - SC24-W: Workshops of …, 2024 - ieeexplore.ieee.org
Exascale applications are being increasingly written in modern languages such as Python,
Julia, C++, and Rust. The Message-Passing Interface (MPI), the de facto standard for …

Collective communication system and methods

R Graham, L Levi, G Bloch, D Marcovitch… - US Patent …, 2024 - Google Patents
2021-10-07: Assigned to Mellanox Technologies TLV Ltd. Assignment of assignors
interest (see document …

[BOOK][B] Efficient Serialization for Datacenter Applications

D Raghavan - 2024 - search.proquest.com
Software serialization is critical for many datacenter applications, but serialization is costly in
today's datacenters. Datacenter networks have become at least 20x faster in the last 15 …