Rapid, Flexible, and Open Source Big Data Technologies for the U.S. National Climate Assessment

Date and Time: Monday, April 1, 2013
Location: CG1 Auditorium
Chris Mattmann

Authors: Chris A. Mattmann, Tom Painter, Duane Waliser, Cameron Goodale, Paul Ramirez, Andrew Hart, Paul Zimdars, Dan Crichton

The National Aeronautics and Space Administration's (NASA) response to the 2013 U.S. National Climate Assessment involved funding a variety of projects, including the development of data products for climatologies, tool support for model evaluation and other activities, and the application of those data products to transition science research into decision making for policy makers who want to make use of the data.

Our team was funded on two complementary projects: (1) the Regional Climate Model Evaluation System (RCMES), a toolkit for easily comparing remote sensing observations from NASA satellites to model outputs at regional spatial and temporal scales (Waliser/Kim/Mattmann/Mearns); and (2) a Snow and Ice Climatology for the Western US and Alaska (Painter/Mattmann). Though complementary, the efforts brought with them different challenges that had to be overcome through the development of innovative Big Data technologies.

The challenges for RCMES revolved around technology integration, platform as a service, and infrastructure as a service -- bringing together several modern climate toolkits and data format handling technologies (NetCDF, HDF-5, CF metadata, HDF-EOS metadata, NCAR NCL, PyNGL, PyNIO, Matplotlib, SciPy, NumPy, etc.) -- and then delivering those capabilities as a scalable cloud database technology based on Apache Hadoop/Hive, MongoDB, and MySQL, together with an associated analysis toolkit. We have spent a great deal of time exploring delivery mechanisms for RCMES, integrating it into the Coordinated Regional Downscaling Experiment (CORDEX), including the Africa, North America (NARCCAP), Arctic, East Asia, South Asia, and Australia regions.
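To give a flavor of the kind of model-to-observation comparison RCMES supports, a minimal NumPy sketch of computing bias and RMSE between a regridded model field and a satellite observation field follows. The arrays, function name, and values here are hypothetical stand-ins for illustration, not the actual RCMES API.

```python
import numpy as np

def evaluate_model(model, obs):
    """Compare a model field to observations on the same grid.

    model, obs: 2-D arrays (lat x lon), hypothetical stand-ins for
    regridded model output and remote sensing observations.
    Returns (bias, rmse), skipping grid cells where obs is missing (NaN).
    """
    valid = ~np.isnan(obs)            # mask out missing observations
    diff = model[valid] - obs[valid]  # model-minus-obs differences
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    return bias, rmse

# Toy example: the model runs 1 K warm wherever observations exist.
obs = np.array([[280.0, 281.0],
                [np.nan, 283.0]])
model = obs + 1.0
bias, rmse = evaluate_model(model, obs)
```

In practice the two fields would first be read from NetCDF/HDF files and regridded to a common spatial and temporal resolution before any such statistics are computed.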

The challenges for the Snow and Ice Climatology project involved the rapid development of a Science Computing Facility based on Apache OODT; the processing of a decade's worth of snow covered area/grain size products and of dust radiative forcing products for the entire EOS era; and the integration of the complex algorithms required to generate those products. The algorithms, written in IDL, were made to interact with various cloud and cluster computing topologies; data products were cataloged and made available for search in Apache Solr and other technologies; and provenance and pedigree were tracked, allowing the investigators to explore changes in algorithm versions, scientific assumptions, etc. In addition, all of the data products were transformed into a GIS-compatible format (GeoTIFF) and made available to decision makers using GeoServer, Leaflet, and other open source GIS technologies.
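As an illustration of the kind of provenance record tracked per product, the sketch below builds a simple catalog entry carrying the algorithm version and scientific assumptions alongside the file path. The field names, paths, and helper function here are hypothetical; in the real system such metadata was captured by Apache OODT and indexed for search in Apache Solr.

```python
def make_product_record(product_path, algorithm_version, assumptions):
    """Build a catalog record for a generated data product.

    A minimal, hypothetical stand-in for the metadata the project
    tracked: carrying algorithm version and assumptions with each
    product is what lets investigators later compare outputs across
    algorithm revisions.
    """
    return {
        "id": product_path.rsplit("/", 1)[-1],  # file name as product id
        "path": product_path,
        "algorithm_version": algorithm_version,
        "assumptions": assumptions,
        "format": "GeoTIFF",
    }

# Hypothetical snow covered area product from algorithm version 2.1.
record = make_product_record(
    "/data/products/snow_cover_2010_123.tif",
    "2.1",
    ["clear-sky retrieval", "fixed grain-size lookup table"],
)
```

Querying such records by algorithm version is then a straightforward catalog search, which is what enables the version-to-version comparisons described above.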

This talk will discuss the open source technologies used in these projects, the lessons we learned, and provide a roadmap for future data systems activities in the U.S. National Climate Assessment.

Speaker Description: 

Chris Mattmann has a wealth of experience in software design and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of Earth science satellites, to assisting graduate students at the University of Southern California (his alma mater) in the study of software architecture, all the way to helping industry and open source as a member of the Apache Software Foundation. When he's not busy being busy, he's spending time with his lovely wife and son, braving the mean streets of Southern California.

File: NCAR-SEA-Mattmann-2013.pptx (22.59 MB)