Data Rods: High Speed, Time-Series Analysis of Massive Data Sets Using Pure Object Database Methods

Date and Time: 
Thursday, February 23rd, 2012
Location: 
ML-132 Main Seminar
Speaker: 
David Gallaher

Abstract:

Change over time is the central driver of climate change detection. The goal is to diagnose the underlying causes and make projections into the future. In an effort to optimize this process, we have developed the Data Rod model, an object-oriented approach that provides the ability to query grid cell changes and their relationships to neighboring grid cells through time. The time-series data are organized in time-centric structures called “data rods.” A single data rod can be pictured as the multi-spectral data history at one grid cell: a vertical column of data through time. This resolves the long-standing problem of managing time-series data and opens new possibilities for temporal data analysis. The structure enables rapid time-centric analysis at any grid cell across multiple sensors and satellite platforms. Collections of data rods can be spatially and temporally filtered, statistically analyzed, and aggregated for use with pattern-matching algorithms. Likewise, individual image pixels can be extracted to generate multi-spectral imagery at any spatial and temporal location. The Data Rods project has created a series of prototype databases to store and analyze massive datasets containing multi-modality remote sensing data. Using object-oriented technology, this method overcomes the operational limitations of traditional relational databases.
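As a rough illustration of the "column of data through time" idea described above, the sketch below models a data rod as a plain in-memory Python object. It is not the project's implementation, which is built on an object database; the class name DataRod, its fields, the band names, and the helper image_at are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[date, Dict[str, float]]


@dataclass
class DataRod:
    """Hypothetical data rod: the observation history of one grid cell."""
    row: int
    col: int
    # Each sample is (observation date, {band name: value}), kept sorted by date.
    samples: List[Sample] = field(default_factory=list)

    def add(self, when: date, bands: Dict[str, float]) -> None:
        """Insert one observation and keep the rod in time order."""
        self.samples.append((when, bands))
        self.samples.sort(key=lambda s: s[0])

    def between(self, start: date, end: date) -> List[Sample]:
        """Temporal filter: samples with start <= date <= end."""
        return [s for s in self.samples if start <= s[0] <= end]

    def series(self, band: str) -> List[float]:
        """Time series of one band, skipping dates where the band is missing."""
        return [bands[band] for _, bands in self.samples if band in bands]


def image_at(rods: Iterable[DataRod], when: date, band: str) -> Dict[Tuple[int, int], float]:
    """Pull one pixel from each rod to rebuild an image slice for a given date."""
    pixels = {}
    for rod in rods:
        for d, bands in rod.samples:
            if d == when and band in bands:
                pixels[(rod.row, rod.col)] = bands[band]
    return pixels
```

In this sketch a time-centric query touches only the rod for the cell of interest, while an image is reassembled by reading one sample from each rod. That is the reverse of the usual image-per-timestep layout.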

To demonstrate the speed and efficiency of time-centric analysis using the Data Rods model, we have developed a sea ice detection algorithm. This application determines the concentration of sea ice in a small spatial region across a long temporal window. If performed using traditional analytical techniques, this task would typically require extensive data downloads and spatial filtering. Using Data Rods databases, the exact spatio-temporal data set is immediately available. No extraneous data is downloaded, and all data selection and querying occurs transparently on the server side. Moreover, fundamental statistical calculations such as running averages are easily implemented against the time-centric columns of data.
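The kind of calculation mentioned above can be sketched in a few lines once each grid cell's history is available as a time-ordered column. The example below is a minimal illustration only, not the sea ice detection algorithm presented in the talk. The function names, the dictionary-of-lists layout, and the assumption that all cells share the same time axis are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def running_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing running mean over a time-ordered column of values.

    The first window-1 outputs average only the values seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just left the trailing window.
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


def regional_mean(
    rods: Dict[Tuple[int, int], Sequence[float]],
    cells: Iterable[Tuple[int, int]],
) -> List[float]:
    """Mean value per time step over a small set of grid cells.

    Assumes every selected column covers the same time steps in the same order.
    """
    selected = [rods[c] for c in cells]
    return [sum(step) / len(step) for step in zip(*selected)]
```

Applying regional_mean to the cells of a small region and then running_average to the result gives a smoothed concentration history for that region without reading any data outside the selected columns.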

Speaker Description: 

David Gallaher is a Geoscientist and Manager of Information Technology Services at the National Snow and Ice Data Center (NSIDC), Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, CO.

He is leading the technical evolution of NSIDC systems and architecture to meet the needs of our scientific communities and stakeholders. At the same time, he is focusing on evolving our internal and external systems integration, and on refining technologies and infrastructures to be more user-friendly, efficient, cost-effective, and scalable, while continuing to support our core data ingest and distribution functions.

He is currently developing a “green data center” at NSIDC with the goal of reducing the power consumption for cooling by 95%. Dave is the lead investigator on an NSF grant for a project to design and prototype a process (through creation of “Data Rods”) for addressing time-series data as pure objects that will enable time-centric change analysis of massive multi-modality cryospheric data. He is also the Project Manager on a project to build web services and analysis applications for determining changes to the Greenland ice sheet. His latest project is to recover 1960s Nimbus satellite data to determine the sea ice extent during that decade.

Attachment: 
DataRods_SEA_2012.ppt (7.02 MB)
