Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System

Date and Time: 
Thursday, February 23, 2012
Location: 
ML-132 Main Seminar
Speaker: 
Zaihua Ji

The growing complexity and volume of operationally and routinely created datasets, and the increasing reliance on them, pose challenges for centers tasked with archiving and curating this information. Past tools focused on data delivered via physical media, such as tape, or data downloaded by FTP scripts customized for single datasets. Presently, most data are acquired using network transfers that can happen many times per day. Past archive management technologies do not scale to this new paradigm. The Research Data Archive Management System (RDAMS) meets this new challenge and supports archive management across the entire Research Data Archive (RDA) at the National Center for Atmospheric Research (NCAR). RDAMS uses open source databases and locally written utilities to manage the complete data archive cycle by fetching, interrogating, archiving, and providing long-term research data stewardship. A good example of RDAMS functionality is the Dataset Update (DSUPDT) utility.


DSUPDT accomplishes four major tasks: 1) contact and query a remote server to determine if new or modified data are available, 2) transfer data from a remote host to the local system and verify the data integrity, 3) execute any dataset-specific processing steps, including standard metadata harvesting, and 4) archive data to local disk and permanent archive systems. The utility provides the flexibility needed to handle the highly diverse datasets in the RDA. It can be configured to handle multiple web protocols, automatically recover from system outages at the remote or local sites, execute third-party data manipulation software, and accommodate irregular data delivery schedules.
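The four tasks above can be illustrated with a minimal Python sketch. This is not DSUPDT code: every function name here is hypothetical, the "remote server" is stood in for by a local directory so the example runs without a network, SHA-256 comparison is an assumed integrity check, and the metadata record is a placeholder for whatever dataset-specific harvesting a real system would do.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_new_files(remote_dir: Path, archive_dir: Path) -> list[Path]:
    """Task 1: list remote files that are missing from, or differ from, the archive."""
    candidates = []
    for remote in sorted(remote_dir.iterdir()):
        if not remote.is_file():
            continue
        local = archive_dir / remote.name
        if not local.exists() or sha256_of(local) != sha256_of(remote):
            candidates.append(remote)
    return candidates


def fetch_and_verify(remote: Path, staging_dir: Path) -> Path:
    """Task 2: copy a file into staging and verify the copy by checksum."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / remote.name
    shutil.copy2(remote, staged)
    if sha256_of(staged) != sha256_of(remote):
        staged.unlink()
        raise IOError(f"checksum mismatch for {remote.name}")
    return staged


def harvest_metadata(staged: Path) -> dict:
    """Task 3: placeholder for dataset-specific processing and metadata harvesting."""
    return {
        "name": staged.name,
        "bytes": staged.stat().st_size,
        "sha256": sha256_of(staged),
    }


def archive(staged: Path, archive_dir: Path) -> Path:
    """Task 4: place the verified file in the archive and clear it from staging."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / staged.name
    shutil.copy2(staged, dest)
    staged.unlink()
    return dest


def update_dataset(remote_dir: Path, staging_dir: Path, archive_dir: Path) -> list[dict]:
    """Run one update cycle and return a metadata record per archived file."""
    records = []
    for remote in find_new_files(remote_dir, archive_dir):
        staged = fetch_and_verify(remote, staging_dir)
        records.append(harvest_metadata(staged))
        archive(staged, archive_dir)
    return records
```

Calling `update_dataset` repeatedly on an unchanged source directory archives nothing after the first pass, which is the behavior a scheduled update job would rely on. Recovery from outages and protocol handling, which the abstract also mentions, are left out of this sketch.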

Over 150 RDA dataset products are managed under DSUPDT control, with update schedules running at selectable hourly, daily, weekly, monthly, and yearly intervals. DSUPDT is designed to be fully scalable and continues to support the addition of new data streams. This poster will introduce the powerful functionality

Authors: Zaihua Ji, Doug Schuster & Steven Worley

Speaker Description: 

Zaihua Ji, known as Hua
Software Engineer III in the Data Support Section, CISL, NCAR, since February 2004.
Graduated from the University of South Florida in 1997 with an M.S. in Computer Science and a Ph.D. in Physical Oceanography.

Attachment: dsupdt_zaihua_ji.pptx (853.73 KB)
