Python Tools for Parallel Analysis of Extremely Large GCM Output

Date and Time: 
2014 April 11th @ 1pm
Location: 
CG - room TBD
Speaker: 
Ryan Abernathey

Ocean General Circulation Models (GCMs) are resolving finer and finer scales, meaning that the size of the computational domain is growing rapidly. While the parallel architecture of GCM codes themselves scales well, the typical tools we use to analyze the output (where the actual science happens) do not. I will report on a toolkit I am developing for the parallel analysis of extremely large global GCM simulations (https://github.com/rabernat/MITgcm_parallel_analysis). The simulations themselves use the MITgcm in a lat/lon/polar cap configuration with an effective horizontal resolution of less than 2 km. A single scalar variable requires approximately 18 GB of memory. The toolkit uses the NumPy / SciPy Python scientific computing stack and takes advantage of the IPython parallel framework to split tasks into many tiles. This allows us to efficiently process and visualize the GCM output. The tutorial will cover:

  • A review of the basics of NumPy array manipulation
  • The numpy.memmap class for working with huge files
  • IPython's parallel functionality
  • Using the IPython parallel environment in MPI mode on a cluster
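
As a rough illustration of the first three topics, the sketch below memory-maps a raw binary field with numpy.memmap, cuts the grid into tiles, and reduces each tile independently. It is not taken from the toolkit: the grid size, the file name, the big-endian float32 layout, and the helper names (write_demo_file, tile_slices, tile_sum) are all assumptions made for the example, and the built-in map stands in for a parallel map.

```python
import os
import tempfile

import numpy as np

# Hypothetical small grid; the fields discussed in the talk are far larger.
NY, NX = 8, 12
DTYPE = np.dtype(">f4")  # assumption: raw big-endian 32-bit floats


def write_demo_file(path):
    """Write a small raw binary field so the example is self-contained."""
    data = np.arange(NY * NX, dtype=DTYPE).reshape(NY, NX)
    data.tofile(path)
    return data


def tile_slices(ny, nx, tile_ny, tile_nx):
    """Return (row_slice, col_slice) pairs that together cover the grid."""
    return [
        (slice(j, min(j + tile_ny, ny)), slice(i, min(i + tile_nx, nx)))
        for j in range(0, ny, tile_ny)
        for i in range(0, nx, tile_nx)
    ]


def tile_sum(args):
    """Map the file read-only and reduce one tile; only that tile is read."""
    path, (rows, cols) = args
    field = np.memmap(path, dtype=DTYPE, mode="r", shape=(NY, NX))
    return float(field[rows, cols].sum(dtype=np.float64))


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_field.data")  # hypothetical file name
reference = write_demo_file(path)

tiles = tile_slices(NY, NX, tile_ny=4, tile_nx=5)

# Each task carries only a path and two slices, so it is cheap to ship to a worker.
partial_sums = list(map(tile_sum, [(path, t) for t in tiles]))
total = sum(partial_sums)
```

Because every task opens the file itself and touches only its own tile, the full array never has to fit in memory on any single process, and the serial map could in principle be swapped for a parallel one on a cluster.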

While the talk focuses on an ocean modelling application, I hope that these approaches will be useful to anyone working with "big" gridded datasets.

Speaker Description: 

Assistant Professor of Earth and Environmental Sciences @ Columbia University / Lamont-Doherty Earth Observatory

previously: Postdoc at Scripps Institution of Oceanography; Ph.D. in Climate Physics and Chemistry at MIT

interests: Global ocean circulation, mesoscale eddy dynamics, transport and mixing in turbulent flows
