A survey of end-system optimizations for high-speed networks

N Hanford, V Ahuja, MK Farrens, B Tierney… - ACM Computing …, 2018 - dl.acm.org
The gap is widening between the processor clock speed of end-system architectures and
network throughput capabilities. It is now physically possible to provide single-flow …

Pktgen: Measuring performance on high speed networks

D Turull, P Sjödin, R Olsson - Computer communications, 2016 - Elsevier
Pktgen is a tool for high-speed packet generation and testing. It runs in the Linux kernel, and
is designed to accommodate a wide range of network performance tests. Pktgen consists of …

Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)

B Goglin - 2014 International Conference on High Performance …, 2014 - ieeexplore.ieee.org
Modern computing platforms are increasingly complex, with multiple cores, shared caches,
and NUMA architectures. Parallel applications developers have to take locality into account …

mdtmFTP and its evaluation on ESNET SDN testbed

L Zhang, W Wu, P DeMar, E Pouyoul - Future Generation Computer …, 2018 - Elsevier
To address the high-performance challenges of data transfer in the big data era, we are
developing and implementing mdtmFTP: a high-performance data transfer tool for big data …

pioman: a pthread-based Multithreaded Communication Engine

A Denis - 2015 23rd Euromicro International Conference on …, 2015 - ieeexplore.ieee.org
Recent cluster architectures include dozens of cores per node, with all cores sharing the
network resources. To program such architectures, hybrid models mixing MPI+threads, and …

Characterization of input/output bandwidth performance models in NUMA architecture for data intensive applications

T Li, Y Ren, D Yu, S Jin… - 2013 42nd International …, 2013 - ieeexplore.ieee.org
Data-intensive applications frequently rely on multicore computer systems, in which Non-
Uniform Memory Access (NUMA) is a dominant architecture. To transfer data into and out …

Dodging non-uniform I/O access in hierarchical collective operations for multicore clusters

B Goglin, S Moreaud - 2011 IEEE International Symposium on …, 2011 - ieeexplore.ieee.org
The increasing number of cores led to scalability issues in modern servers that were
addressed by using non-uniform memory interconnects such as HyperTransport and QPI …

A scalable and generic task scheduling system for communication libraries

F Trahay, A Denis - 2009 IEEE International Conference on …, 2009 - ieeexplore.ieee.org
Since the advent of multi-core processors, the physiognomy of typical clusters has
dramatically evolved. This new massively multi-core era is a major change in architecture …

Tyche: An efficient Ethernet-based protocol for converged networked storage

P González-Férez, A Bilas - Big Data Management and …, 2017 - taylorfrancis.com
Tyche is a network storage protocol directly on top of raw Ethernet, which does not require
any hardware support from the network interface. It provides high I/O throughput and low I/O …

Towards the Structural Modeling of the Topology of next-generation heterogeneous cluster Nodes with hwloc

B Goglin - 2016 - inria.hal.science
Parallel computing platforms are increasingly complex, with multiple cores, shared caches,
and NUMA memory interconnects, as well as asymmetric I/O access. Upcoming …