Processing NASA Earth Science Data on Nebula Cloud

Date and Time: 
Thursday, February 23, 2012
Location: 
ML-132 Main Seminar
Speaker: 
Aijun Chen

Keywords:

Earth Science Data Processing, Cloud Computing, Nebula, AIRS

Abstract:

NASA Goddard Earth Science Data and Information Service Center (GES DISC) offers a large volume of atmospheric satellite data, along with tools for accessing, managing, and analyzing (online) those data. Producing huge volumes of data products at various processing levels requires substantial high-performance computing resources, at considerable cost. Cloud Computing, which has grown popular since the appearance of Eucalyptus in 2008, has been used to offer high-performance, low-cost computing and storage resources for both scientific research and business services. Several Cloud computing platforms and services have been implemented in the commercial arena, e.g. Amazon’s EC2 & S3, Microsoft’s Azure, and Google App Engine. Academia and government agencies have also launched research and application programs that use Cloud Computing. In 2008 NASA launched Nebula, an Infrastructure as a Service (IaaS) Cloud Computing platform that delivers on-demand distributed virtual computers. Nebula users can receive the computing resources they need as a fully outsourced service.

As a proof of concept, NASA GES DISC migrated several of its applications to Nebula, with the goal of evaluating Nebula's practicality and adaptability. We migrated S4PM (the Simple, Scalable, Script-based Science Processor for Measurements) to Nebula as a scientific data processing infrastructure, together with the AIRS (Atmospheric Infrared Sounder) processing workflow, which turns raw AIRS data into AIRS Level 3 data products. After we installed several supporting libraries and fixed some bugs in the processing code caused by differences in the runtime environment, the workflow was able to process AIRS data in much the same way as its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS Level 0 data on a Nebula virtual computer and on a local Linux computer. Nebula performed significantly better than the local machine. Much of the difference was due to Nebula's newer hardware compared with the legacy local computer, which suggests a potential economic advantage beyond elastic computing power. We also compared processing costs, based on these performance results and on the performance of the operational GES DISC processing system. Finally, we developed a tutorial for migrating GES DISC applications to Nebula and summarized the advantages and challenges of porting complex processing systems to the Cloud.

Speaker Description: 

Dr. Aijun Chen received his Ph.D. in Remote Sensing & GIS from Peking University, China, in 2000. He joined George Mason University as a postdoctoral research associate in 2002.
Currently, Dr. Chen is a research associate professor at the Center for Spatial Information Science and Systems, George Mason University. He works under contract at the NASA Goddard Earth Science Data and Information Service Center. Dr. Chen has published more than 60 academic papers in journals and at international conferences.
He led several NASA projects, including:
1. Utilizing Nebula Cloud Computing to promote NASA Earth Science data processing (main investigator)
2. Using Google Earth to enhance and promote the use, usefulness and usability of NASA GES DISC atmospheric data for scientific research and public use
3. The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA HDF-EOS Geospatial Data
