Making earth science data more accessible: experience with chunking

Date and Time: 
Tuesday, April 2, 2013
CG1 Auditoriums
Russ Rew

Authors: Russ Rew

Chunking reorders gridded data to support efficient access along multiple axes. For example, rechunking a dataset that stores data with time as the most slowly varying dimension can keep access to two-dimensional spatial slices relatively efficient while significantly speeding up time series retrieval. Experience with rechunking large datasets has led to some insights for use in tools such as the netCDF nccopy utility for compressing and rechunking data. Benchmarks show the effect of rechunking real data.
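The trade-off described in the abstract can be illustrated by counting how many chunks each access pattern has to read. The sketch below is a minimal illustration, not taken from the talk: the grid size, the chunk shapes, and the helper name `chunks_touched` are all assumptions chosen for the example, and real read cost also depends on chunk size, caching, and compression.

```python
from math import prod


def chunks_touched(chunk_shape, selection):
    """Count the chunks that intersect a hyperslab.

    chunk_shape: chunk length along each dimension.
    selection:   one (start, stop) index range per dimension, stop exclusive.
    """
    counts = []
    for size, (start, stop) in zip(chunk_shape, selection):
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        counts.append(last - first + 1)
    return prod(counts)


# Hypothetical grid of 1000 time steps by 200 x 300 spatial points,
# with dimensions ordered (time, y, x).
time_series = [(0, 1000), (50, 51), (70, 71)]    # all times at one point
spatial_slice = [(10, 11), (0, 200), (0, 300)]   # one time, whole grid

# Assumed layouts: one chunk per time step approximates storage with time
# as the most slowly varying dimension; the second is a rechunked compromise.
layouts = {
    "time slowest": (1, 200, 300),
    "rechunked": (100, 20, 30),
}

for name, chunk_shape in layouts.items():
    print(
        name,
        "time series:", chunks_touched(chunk_shape, time_series),
        "spatial slice:", chunks_touched(chunk_shape, spatial_slice),
    )
```

Under these assumed shapes, the rechunked layout reads far fewer chunks for a time series at the cost of reading more chunks for a spatial slice, which is the balance the abstract describes.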

Speaker Description: 

Russ Rew has been engineering software since software engineers were called computer programmers, beginning at Ball Brothers as a student assistant, moving to NCAR (while obtaining a Ph.D. in computer science), and finally ending up at Unidata. At NCAR, he helped procure the first Unix systems, gained a lot of experience with reusable software, and struggled with pretty big data. At Unidata, he led development of event-oriented client-server software for Unidata's Internet Data Distribution system; the netCDF data model, format, and libraries; and netCDF-4 on an HDF5 storage layer. He continues to develop and support netCDF, and to display it (in all caps, unfortunately) on his license plate.

chunking.pdf (639.09 KB)