With the increasing core count on current and future processors and the growing adoption of multi-level parallelism in HPC applications, realizing the full performance of RDMA-capable networks presents a significant challenge to MPI implementations and the associated lower-level network software components. Optimizing MPI one-sided operations is an obvious path to obtaining maximum network performance in such environments: a put/get style communication paradigm removes the need for tag matching in either software or hardware and imposes fewer ordering constraints on the network. Furthermore, a recent survey led by the ECP OMPI-X project revealed that a significant number of ECP applications plan to use MPI one-sided communication in the exascale versions of their applications. In this talk, we cover recent work on Open MPI’s RMA implementation, focusing in particular on efforts to improve the performance of one-sided MPI operations within multi-threaded regions of applications, both by removing serialization points in Open MPI’s RMA path and by taking better advantage of available network resources to increase small-transfer throughput.
Howard Pritchard Jr. is a Research Scientist at Los Alamos National Laboratory.
Attachment | Size |
---|---|
ompi_rma-seaconf2018.pdf | 800.59 KB |