User Driven Automatic Data Request Service - Providing User Access to TB-sized Datasets

Monday, April 1, 2013
Zaihua Ji

Providing user access to TB-sized datasets is a challenging task for data service centers. Traditional data services let users access the data either through online web services or on computer nodes with direct connections to the data storage devices. Data files served this way are formatted and built in advance, and they are typically large. In practice, users often need files built differently, for example as a subset or in a different data format. Data service centers therefore need to provide dynamic data services driven by user requests.

In the Research Data Archive (RDA) at NCAR, we maintain many historical and ongoing, observational and model-derived, atmospheric and oceanographic data products. The data files are stored for long-term archiving on a tape-based archiving system, the High Performance Storage System (HPSS), while the most active data files are staged on disk-based file systems for easy access. A stable, scalable, and distributed controller, DataSet ReQueST (DSRQST), has been designed and implemented to process user requests automatically. The requests include data subsetting, format conversion, and staging of data for individual users on disk-based systems that are convenient to access. Multiple services can be combined in one request, such as subsetting followed by format conversion. The controller is flexible and can be configured simply to distribute work across the nodes of a computer cluster. New data services can also be added easily and controlled within the DSRQST workflow, for example regridding of model data or applying algorithms to native parameter fields to produce additional products.
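
The idea of combining services and spreading the work across nodes can be illustrated with a minimal Python sketch. It is not the DSRQST implementation: the `Request` class, the `SERVICES` registry, the `subset` and `convert` stand-ins, and the round-robin assignment in `process` are all hypothetical names and choices made for illustration only, and the services merely rename files instead of touching real data.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List

# Hypothetical service signature: takes file names and request parameters,
# returns the names of the files it would produce.
Service = Callable[[List[str], dict], List[str]]


def subset(files: List[str], params: dict) -> List[str]:
    """Stand-in for subsetting: tag each file with the requested parameter."""
    return [f"{name}.{params['parameter']}" for name in files]


def convert(files: List[str], params: dict) -> List[str]:
    """Stand-in for format conversion: append the target format as an extension."""
    return [f"{name}.{params['format']}" for name in files]


# Hypothetical registry; a new service is added by registering one more function.
SERVICES: Dict[str, Service] = {"subset": subset, "convert": convert}


@dataclass
class Request:
    user: str
    files: List[str]
    steps: List[str]  # service names, applied in order
    params: dict = field(default_factory=dict)


def process(request: Request, nodes: List[str]) -> Dict[str, List[str]]:
    """Assign input files to nodes round-robin and apply the services in order."""
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for name, node in zip(request.files, cycle(nodes)):
        outputs = [name]
        for step in request.steps:
            outputs = SERVICES[step](outputs, request.params)
        plan[node].extend(outputs)
    return plan


if __name__ == "__main__":
    req = Request(
        user="example_user",
        files=["a.grb", "b.grb", "c.grb"],
        steps=["subset", "convert"],
        params={"parameter": "tmp", "format": "nc"},
    )
    for node, outputs in process(req, ["node1", "node2"]).items():
        print(node, outputs)
```

In this sketch a further service, such as regridding, would be one more function registered in `SERVICES` and named in a request's `steps`; the controller logic itself would not change.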

Currently, about 160 data products are configured to serve data dynamically and automatically under the control of DSRQST. In this presentation we briefly introduce our data repositories, describe in detail the integrated strategy behind the automatic data services, and show with examples how we implemented that strategy.

Zaihua Ji (short name: Hua)
Software Engineer III in the Data Support Section, CISL, NCAR, since February 2004.
Graduated from the University of South Florida in 1997 with an M.S. in Computer Science and a Ph.D. in Physical Oceanography.

