Brown Dog: An Elastic Data Cyberinfrastructure for Autocuration and Digital Preservation

Date and Time: 
Tuesday, April 5th, 2016
Location: 
Center Green
Speaker: 
Jay Alameda

Smruti Padhy, Jay Alameda, Rui Liu, Edgar Black, Liana Diesendruck, Mike Dietze, Greg Jansen, Praveen Kumar, Rob Kooper, Jong Lee, Richard Marciano, Luigi Marini, Dave Mattson, Barbara Minsker, Chris Navarro, Marcus Slavenas, William Sullivan, Jason Votava, Inna Zharnitsky, Kenton McHenry
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign


Abstract
In modern-day “Big Data” science, the diversity of data (unstructured, uncurated, and in different formats) and of software poses major challenges for scientific research, especially for the reproducibility of results. The NSF DIBBs Brown Dog Project [1] aims to build cyberinfrastructure to aid the autocuration, indexing, and search of unstructured and uncurated digital data. It focuses on an initial set of science use cases (green infrastructure, critical zone studies, ecology, social science) to guide the overarching design, with user-accessible extensibility as an important requirement driving project development. Brown Dog is composed of two highly extensible services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS). These services aim to leverage and reuse existing code, libraries, services, and standalone software (past or present), made accessible through an easy-to-use, programmable interface. DAP focuses on file format conversions; DTS performs content-based analysis of, and extraction from, a file. These services wrap the relevant conversion and extraction operations of arbitrary software for reuse, manage their deployment elastically, and manage job execution behind a deliberately compact REST API.
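To make the division of labor between DAP and DTS concrete, here is a minimal Python sketch of a registry that wraps existing tools as converters and extractors behind a two-call interface. It is not Brown Dog's implementation or its REST API: the `ServiceRegistry` class, its methods, and the toy format names are hypothetical, and plain Python functions stand in for wrapped third-party software.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A wrapped tool is modeled as a plain function over the file's bytes (assumption).
Converter = Callable[[bytes], bytes]  # DAP-style: bytes in one format -> bytes in another
Extractor = Callable[[bytes], dict]   # DTS-style: bytes -> derived metadata


@dataclass
class ServiceRegistry:
    """Hypothetical stand-in for the DAP (conversions) and DTS (extractions) services."""
    converters: Dict[Tuple[str, str], Converter] = field(default_factory=dict)
    extractors: Dict[str, List[Extractor]] = field(default_factory=dict)

    def register_converter(self, src: str, dst: str, fn: Converter) -> None:
        self.converters[(src, dst)] = fn

    def register_extractor(self, fmt: str, fn: Extractor) -> None:
        self.extractors.setdefault(fmt, []).append(fn)

    def convert(self, data: bytes, src: str, dst: str) -> bytes:
        """Convert data from format src to format dst using a registered tool."""
        fn = self.converters.get((src, dst))
        if fn is None:
            raise ValueError(f"no converter registered for {src} -> {dst}")
        return fn(data)

    def extract(self, data: bytes, fmt: str) -> dict:
        """Run every extractor registered for fmt and merge their metadata."""
        metadata: dict = {}
        for fn in self.extractors.get(fmt, []):
            metadata.update(fn(data))
        return metadata


# Toy wrappers standing in for existing software.
def txt_to_csv(data: bytes) -> bytes:
    return b",".join(data.split())


def word_count(data: bytes) -> dict:
    return {"word_count": len(data.split())}


def line_count(data: bytes) -> dict:
    return {"line_count": len(data.splitlines())}


registry = ServiceRegistry()
registry.register_converter("txt", "csv", txt_to_csv)
registry.register_extractor("txt", word_count)
registry.register_extractor("txt", line_count)

print(registry.convert(b"a b c", "txt", "csv"))  # b'a,b,c'
print(registry.extract(b"a b c\nd", "txt"))       # {'word_count': 4, 'line_count': 2}
```

Adding a new tool is a single registration call, which reflects the emphasis on user-accessible extensibility; in Brown Dog itself, these operations are exposed through a REST API rather than called in-process.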

The DAP and DTS service components require both support for heterogeneous OSes and applications and auto-scaling based on user demand. We use an elasticity approach with two deployment methods, VMs and Docker containers, in a cloud infrastructure. Each deployment method emphasizes: i) fast response, by scaling at two granularity levels: the VM level (starting, suspending, or resuming a VM or Docker VM) and the application instance/container level (automatically logging into a VM or Docker VM to add or remove application instances or containers); ii) support for heterogeneous OSes and applications; and iii) scaling multiple applications simultaneously when required. The elasticity module is currently in limited production use within Brown Dog. We present the Brown Dog services and the elasticity architecture, design considerations, algorithm, and initial evaluation results for a real-world use case. We also show several libraries and client applications that have been or are being developed to reduce the overhead of using these functionalities, making them accessible to more users.
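The two-granularity scaling idea can be illustrated with a small, hypothetical Python sketch; it is not the Brown Dog elasticity algorithm. It assumes a per-application job queue length as the demand signal and made-up thresholds (`JOBS_PER_INSTANCE`, `INSTANCES_PER_VM`), none of which come from the talk. When scaling up, it first adds instances inside running VMs (the fine-grained, fast path), then resumes suspended VMs, and only then starts new ones; when scaling down, it removes instances and suspends a VM once its last instance would go. Real VM operations and automated logins are replaced by recorded action strings.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tuning parameters: the talk does not specify thresholds.
JOBS_PER_INSTANCE = 5  # queued jobs one application instance is expected to absorb
INSTANCES_PER_VM = 4   # max application instances (or containers) per VM


@dataclass
class VM:
    name: str
    app: str                # application hosted by this VM image (OS/app pairing)
    state: str = "running"  # "running" or "suspended"
    instances: int = 1      # application instances/containers inside the VM


def scale(app: str, queue_len: int, vms: List[VM], actions: List[str]) -> None:
    """Adjust one application's capacity to its queue length, at two granularities."""
    running = [v for v in vms if v.app == app and v.state == "running"]
    suspended = [v for v in vms if v.app == app and v.state == "suspended"]
    # Ceiling division; keep at least one instance warm for fast response (assumption).
    needed = max(1, -(-queue_len // JOBS_PER_INSTANCE))
    current = sum(v.instances for v in running)

    # Scale up: fill running VMs first, then resume suspended VMs, then start new VMs.
    while current < needed:
        vm = next((v for v in running if v.instances < INSTANCES_PER_VM), None)
        if vm is not None:
            vm.instances += 1
            actions.append(f"add instance on {vm.name}")
        elif suspended:
            vm = suspended.pop()
            vm.state, vm.instances = "running", 1
            running.append(vm)
            actions.append(f"resume {vm.name}")
        else:
            count = sum(1 for v in vms if v.app == app)
            vm = VM(name=f"{app}-vm{count}", app=app)
            vms.append(vm)
            running.append(vm)
            actions.append(f"start {vm.name}")
        current += 1

    # Scale down: drain the least-loaded VM, suspending it instead of removing its last instance.
    while current > needed:
        vm = min(running, key=lambda v: v.instances)
        if vm.instances > 1:
            vm.instances -= 1
            actions.append(f"remove instance from {vm.name}")
        else:
            vm.state, vm.instances = "suspended", 0
            running.remove(vm)
            actions.append(f"suspend {vm.name}")
        current -= 1


def scale_all(queues: Dict[str, int], vms: List[VM]) -> List[str]:
    """Scale every application in one pass, so several can grow or shrink together."""
    actions: List[str] = []
    for app, queue_len in queues.items():
        scale(app, queue_len, vms, actions)
    return actions


vms = [VM("ocr-vm0", "ocr"), VM("conv-vm0", "conv", state="suspended", instances=0)]
print(scale_all({"ocr": 23, "conv": 3}, vms))  # burst of demand
print(scale_all({"ocr": 0, "conv": 0}, vms))   # demand drops back
```

Preferring in-VM instances and resuming suspended VMs over cold starts is an assumed policy that favors fast response, in line with the stated goal; a production policy might also weigh VM start-up latency, cost, and the OS image each application needs.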

Reference:
[1] S. Padhy, G. Jansen, J. Alameda, E. Black, et al., “Brown Dog: Leveraging Everything Towards Autocuration,” IEEE Big Data 2015, Santa Clara, CA, USA, October 29-November 1, 2015.

Speaker Description: 

Jay Alameda is the lead for Advanced Application Support at the National Center for Supercomputing Applications. In this role, he works with the Extreme Science and Engineering Discovery Environment (XSEDE), a collaboration of NSF-funded high performance computing (HPC) resource providers working to provide a common set of services, including advanced user support, to the science and engineering community. In particular, Jay leads XSEDE's Extended Support for Training, Education, and Outreach Service, which provides the technical expertise to support the Training, Education, and Outreach activities organized by XSEDE. He was also the lead of the recently completed NSF-funded SI2 project “A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform”, which improved the Eclipse Parallel Tools Platform (PTP) to serve as a platform for developing HPC applications.

Attachment: BrownDog_SEA_Alameda.pdf (1.78 MB)
