User Driven Automatic Data Request Service - Providing User Access to TB-sized Datasets

Date and Time: 
Monday, April 1, 2013
Location: 
CG1 Auditoriums
Speaker: 
Zaihua Ji

Authors: Zaihua Ji

For data service centers, providing user access to TB-sized datasets is a challenging task. Traditional data services let users access the data either through online web services or on computer nodes directly connected to the data storage devices. Data files served this way are formatted and built in advance, and are usually large. In practice, users often need data files built quite differently, such as a subset or a different data format. It is therefore essential for data service centers to provide dynamic data services based on user requests.

In NCAR's Research Data Archive (RDA), we maintain many historical and ongoing data products, both observational and model-derived, atmospheric and oceanographic. The data files are archived for the long term on the High Performance Storage System (HPSS), a tape-based archiving system, while the most active data files are staged on disk-based file systems for easy access. A stable, scalable, and distributed controller, DataSet ReQueST (DSRQST), has been designed and implemented to process user requests automatically, including data subsetting, format conversion, and staging data for individual users on disk-based systems that are convenient to access. Multiple services can be combined, such as subsetting followed by format conversion. The control is flexible and can be easily configured to distribute work across the nodes of a computer cluster. New data services can also be easily added to and controlled within the DSRQST workflow, for example regridding model data or applying algorithms to native parameter fields to produce additional products.
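The request flow described above (chaining services such as subsetting and format conversion, and spreading work over cluster nodes) can be illustrated with a minimal, hypothetical sketch. Everything here is an assumption for illustration: the names `Request`, `SERVICES`, `run_request`, and `assign_nodes`, the toy record layout, and the round-robin node assignment are not DSRQST's actual interface or scheduling policy.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List

# Hypothetical record layout: one value per variable (not RDA's real format).
Record = Dict[str, object]
# A service takes the previous step's output plus request parameters.
Service = Callable[[object, Dict[str, object]], object]


@dataclass
class Request:
    """A hypothetical user request: an ordered list of service names."""
    request_id: str
    steps: List[str]
    params: Dict[str, object] = field(default_factory=dict)


def subset(records: List[Record], params: Dict[str, object]) -> List[Record]:
    """Subsetting: keep only records for the requested variables."""
    wanted = set(params["variables"])
    return [r for r in records if r["var"] in wanted]


def convert_csv(records: List[Record], params: Dict[str, object]) -> str:
    """Format conversion: render records as CSV text."""
    lines = ["var,value"]
    lines += [f"{r['var']},{r['value']}" for r in records]
    return "\n".join(lines)


# Registry of available services; adding a new service (e.g. a regridder)
# only requires registering another function here.
SERVICES: Dict[str, Service] = {"subset": subset, "convert_csv": convert_csv}


def run_request(req: Request, data: List[Record]) -> object:
    """Chain the requested services, feeding each output into the next."""
    result: object = data
    for step in req.steps:
        result = SERVICES[step](result, req.params)
    return result


def assign_nodes(requests: List[Request], nodes: List[str]) -> Dict[str, str]:
    """Spread requests over cluster nodes in round-robin order."""
    return {req.request_id: node for req, node in zip(requests, cycle(nodes))}


if __name__ == "__main__":
    data = [
        {"var": "TMP", "value": 280.5},
        {"var": "UGRD", "value": 3.2},
        {"var": "TMP", "value": 281.0},
    ]
    req = Request("r1", ["subset", "convert_csv"], {"variables": ["TMP"]})
    print(run_request(req, data))
    reqs = [req, Request("r2", ["subset"]), Request("r3", ["convert_csv"])]
    print(assign_nodes(reqs, ["node01", "node02"]))
```

Keeping each service as an independent function behind a registry mirrors the idea that services can be combined per request and new ones added without touching the controller. A production controller would also need queuing, retries, and staging of results to disk, which this sketch leaves out.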

Currently, about 160 data products are configured to be served dynamically and automatically under DSRQST control. In this presentation we briefly introduce our data repositories, describe in detail the integrated strategy for automatic data services, and show with examples how we implemented it.

Speaker Description: 

Zaihua Ji, known as Hua
Software Engineer III in the Data Support Section, CISL, NCAR, since February 2004.
Graduated from the University of South Florida in 1997 with an M.S. in Computer Science and a Ph.D. in Physical Oceanography.

Attachment: dsrqst04.sea2013.pdf (6.93 MB)