Data Analysis with Python

Date and Time: 
2014 April 11th - AM and PM
Location: 
CG - room TBD
Speaker: 
Monte Lunacek

Over the past several years, Python has become an increasingly popular language for data analysis. Part of the reason is the remarkable growth of the Python ecosystem, which now covers almost every stage of the data analysis process. The ways you can interact with Python are changing too. The IPython Notebook is an open-source, web-based interactive development environment that facilitates documentation and sharing. In addition to executable code, you can include text and LaTeX math formulas, and embed graphics and other HTML5 dynamic elements. The result is a single executable document that is easily converted to a number of different formats for sharing. The IPython Notebook, combined with modules such as numpy, pandas, statsmodels, scikit-learn, and matplotlib, creates a powerful new way to approach data analysis in Python.

This is a hands-on, two-part tutorial.

Session One: The first session will introduce the IPython Notebook, cover the basic concepts of the Python language, and introduce the pandas library, which provides a data structure that resembles a spreadsheet. The pandas library also provides efficient methods to load, clean, slice, reshape, query, summarize, and visualize your data. We will additionally cover the basics of array computing with numpy and visualization with matplotlib. Finally, we conclude with a discussion of the many ways you can share your Notebook results.
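
As a rough illustration of the spreadsheet-like data structure mentioned above, the following minimal sketch builds a small pandas DataFrame and applies a few of the listed operations (clean, query, summarize). The column names and values are hypothetical, invented only for this example, and are not taken from the tutorial materials.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "temp_c": [20.5, None, 18.0, 19.0],
})

# Clean: drop rows that have a missing measurement.
clean = df.dropna()

# Query: keep only rows warmer than 18.5 degrees.
warm = clean[clean["temp_c"] > 18.5]

# Summarize: mean temperature per site.
mean_by_site = clean.groupby("site")["temp_c"].mean()

print(clean)
print(warm)
print(mean_by_site)
```

Each step returns a new object rather than modifying `df` in place, which makes it easy to inspect intermediate results in a notebook cell.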

Session Two: The second session continues with a deeper look at the pandas library. Then we introduce one of Python’s most popular machine learning libraries, scikit-learn. We provide tips for running a notebook on a remote cluster and discuss different ways to parallelize your workflow. We conclude this session with some advanced tips for creating interactive visualizations.

Speaker Description: 

Monte Lunacek is an HPC Application Specialist in the Research Computing group at the University of Colorado. Prior to joining CU, Monte was a postdoc in the Computational Science group at the National Renewable Energy Lab. His expertise is in high-performance computing and parameter optimization. Monte received his PhD in Computer Science from Colorado State University.

Slides, git repository, and more:

http://researchcomputing.github.io/ucar_sea_2014/
