What Climate Science Knows About Big Data

Date and Time: Monday, April 4th, 2016
Location: Center Green
Speaker: Seth McGinnis

Big Data refers to any data set that's too big to be handled by conventional tools and techniques. This criterion is relative, context-dependent, and changes over time. It is often taken to mean data with high storage volume, currently on the terabyte-to-petabyte scale, but there are other ways for data to be Big, and an array of different approaches is needed to wrestle such data sets into tractability.

Climate scientists have been struggling with Big Data for many years: global and regional climate models run for weeks at a time on state-of-the-art supercomputers and generate huge data archives that require custom storage solutions, while observational data sets for climate impacts span a multitude of instruments, data types, and scientific disciplines.

This talk will present the "Big V" framework for thinking about Big Data sets in terms of Volume, Velocity, Variety, Value, and other important characteristics. It will describe the methods and mechanisms that the climate science community has developed to deal with Big-Volume data through standardization for parallel and automated analysis, and with Big-Variety data through agile metadata curation to maximize discoverability. Finally, it will discuss future directions and new approaches in development.

Speaker Description: 

Seth McGinnis is an Associate Scientist IV in the Institute for Mathematics Applied to Geosciences (IMAGe) at NCAR. As the Data Manager and User Community Manager for NARCCAP, the North American Regional Climate Change Assessment Program, he makes the output from climate models usable by and available to people who need information about climate change. His research focuses on bias correction, interpolation, and other issues affecting the practical use of model output by non-specialists.

File: SEA16-McGinnis-BigData.pptx (8.09 MB)
