Jupyter Ascending: a practical hand guide to galactic scale, reproducible data science

Date and Time: 
Tuesday, April 5th, 2016
Center Green
John Fonner

Scientific reproducibility must be as much about accessibility and clearly communicating ideas as it is about making calculations consistent. As computation plays an ever-increasing role in research, packaging computations in a way that supports simple, transparent recreation of the results is critical for many scientific domains. A number of software tools and container technologies, such as Jupyter notebooks and Docker containers, provide key elements toward this end, but they also have limitations in both capability and ease of use. For example, these tools lend themselves to execution on a single system; embedding distributed computations on cloud or supercomputing resources is not natively available. Additionally, there are common, practical misconceptions within many scientific communities about what constitutes responsible practices for data accessibility and reproducibility.

In this talk, we will present our approach to using open-source tools and freely available commercial tools, including Jupyter, GitHub, and the Agave API, to comprehensively capture data exploration, analysis, and visualization that may span data storage and compute systems. A key feature of the approach is the ability to move the same analyses between desktop, cloud, and supercomputing resources. We will discuss the approach within the context of reproducibility best practices and also provide runnable demonstrations of the concepts we cover.

Speaker Description: 

John Fonner is a research associate in Life Sciences Computing at the Texas Advanced Computing Center (TACC). He earned a Ph.D. in Biomedical Engineering at the University of Texas at Austin, where he used a blend of experimental and computational techniques to study binding interactions between peptides and conducting polymers for implant applications in the nervous system. Since joining TACC in 2011, John has served on a number of projects that help life sciences researchers leverage advanced computing resources, both through training and through the development of better tools and cyberinfrastructure.

jfonner-JupyterAscending-sea2016.pdf (2.27 MB)
