Fault tolerance in Fortran 2015

Date and Time: 
2017 January 26th at 3pm
ML main seminar
Alessandro Fanfarillo

For High Performance Computing applications, scaling across multiple processors is the most viable way to reach petascale and exascale performance.
For an application that scales to hundreds of thousands of cores, node failures represent a major threat.
In fact, on a machine equipped with 1,000,000 nodes, each failing on average once every 100 years, a failure will occur somewhere in the system about every 53 minutes.
The coarray definition provided by the Fortran 2008 standard allows one to write a fully functional parallel program without directly invoking external communication libraries (e.g. MPI, OpenMP), but it does not provide any functionality to deal with failures.
The Fortran 2015 standard introduces several new features to meet challenges that are expected to dominate massively parallel programming in the coming exascale era, including fault tolerance capabilities.
This seminar gives an overview of the fault tolerance features of Fortran 2015, with particular focus on the changes needed to make a regular application resilient.
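
As a rough check of the failure interval quoted in the abstract, the following minimal sketch reproduces the arithmetic. It assumes independent node failures at a constant rate and a 365-day year; the function and constant names are illustrative and are not taken from the talk.

```python
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def mean_minutes_between_failures(nodes, node_mtbf_years):
    """Mean time between failures anywhere in the system, in minutes.

    Assumes node failures are independent and occur at a constant rate,
    so the system failure rate is the sum of the per-node rates.
    """
    return node_mtbf_years * MINUTES_PER_YEAR / nodes


if __name__ == "__main__":
    interval = mean_minutes_between_failures(1_000_000, 100)
    print(f"{interval:.2f} minutes between failures")
```

Under these assumptions the interval comes out slightly below 53 minutes, consistent with the figure in the abstract.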

Speaker Description: 

Dr. Alessandro Fanfarillo is a postdoctoral researcher at NCAR. His research focuses on how to exploit heterogeneous architectures (CPU + accelerators) and Partitioned Global Address Space (PGAS) languages (in particular coarray Fortran) for scientific purposes.
He is also the lead developer of OpenCoarrays, the open-source library that implements the coarray support in the GNU Fortran compiler.
