Towards an Efficient Communication Overlap through Hardware Offloading

Date and Time: 
Wednesday, April 4, 2018
Overlapping Communication with Computation Symposium
Julien Jaeger

With machines now gathering millions of cores, moving data efficiently enough to keep the computing units fed is becoming the central challenge. One effective way of hiding communication cost is to overlap it with computation. However, existing MPI runtimes and networks often provide only limited overlap. In this presentation, we introduce our overlapping implementation, integrated in the MPC runtime. We show how this approach achieves more than 50% fully offloaded, direct buffer-to-buffer (zero-copy) messaging over whole runs. To do so, it relies heavily on data copy offloading and low-level matching through a dedicated network. Next, we discuss how collectives can benefit MPI programs, illustrated by low-level triggered operations in the hardware that provide improved collective overlap. Finally, we conclude with a discussion of ongoing work on how such HPC-specific designs may bring benefits outside the HPC field. In particular, we consider how these technologies could benefit network backbones by allowing fully offloaded requests (RPCs, Active Messages) to common services such as web servers and databases, and we emphasize the key role of hardware matching in these matters.

Speaker Description: 

Julien Jaeger is a Research Engineer at CEA (Commissariat à l'énergie atomique et aux énergies alternatives, the French Alternative Energies and Atomic Energy Commission). He works on the MPC (Multi-Processor Computing) framework, which provides a unified parallel runtime designed to improve the scalability and performance of applications running on clusters of (very) large multiprocessor/multicore NUMA nodes.

MPC_seaconf18.pdf (3.18 MB)
