Bursts and Cascades: Scaling Up Scientific Data Analysis

Date and Time: 
Tuesday, April 3, 2018
CG Auditorium
Seth McGinnis

When scientists are working on an analysis or a data transformation, they often work with a single test case or dataset at a time, and scale up to multiple cases by simply running their code multiple times. This approach works well up to a point, but new approaches are necessary in the realm of Big Data. Many data analysis tasks are embarrassingly parallel and well suited to scaling up via parallelism. I will describe the process of shifting from serial to parallel analysis on NCAR cyberinfrastructure from the user perspective, outline one approach to "bursting" computation, and discuss how it relates to HPC and cloud computing.

Some analyses are too complex and hands-on to be easily repeated many times. In these cases, scaling up requires a method of targeting the analyses to maximize the information gained for the effort invested. FACETS is a DOE-sponsored program to develop a framework for evaluating different climate models and downscaling methodologies and their added value for decision-making related to climate impacts, adaptation, and mitigation. Central to FACETS is the concept of a hierarchical cascade of metrics, where simpler analyses are distilled into quantitative metrics that are then used to identify regions of interest for the application of more complex analyses. I will discuss how we have begun to apply this hierarchical cascade approach to regional climate simulations from the NA-CORDEX and NARCCAP programs, working our way up from standard descriptive statistics to process- and phenomena-level evaluation of the credibility of different simulations.

Speaker Description: 

Seth McGinnis is an Associate Scientist IV with joint appointments in CISL and RAL. As the Data Manager for the NARCCAP and NA-CORDEX data collections, he makes the output from regional climate models usable by and available to people who need information about climate change in North America. His research focuses on bias correction, interpolation, data access, and other issues affecting the practical use of model output by non-specialists.

McGinnis_seaconf18.pdf (46.03 MB)