conference-talk

Using open source tools to reduce large data sets in a distributed environment.

Date and Time: 
Monday, April 4th, 2016
Location: 
Center Green
Speaker: 
Scott Collis

Radar data is complex, and with the archive of NOAA NEXRAD data approaching the petabyte range, managing this data and extracting geophysical insight from it is truly challenging. This presentation will discuss an approach using Python, specifically resources available through Project Jupyter, to solve pleasantly parallel problems. This is especially pertinent given the increasing prevalence of distributed computing resources.
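
A pleasantly parallel problem is one in which each input (here, one radar file) can be processed independently of the others. The following is a minimal sketch of that pattern, not the speaker's code. The abstract refers to resources available through Project Jupyter; this sketch uses Python's standard-library `concurrent.futures` as a stand-in, and the file names and the `reduce_volume` function are hypothetical placeholders rather than real NEXRAD processing.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_volume(filename):
    """Hypothetical per-file reduction.

    Stands in for reading one radar volume and returning a small
    summary; here the "summary" is just the length of the file name.
    """
    return filename, len(filename)


def reduce_archive(filenames, max_workers=4):
    """Apply reduce_volume to every file independently and collect results.

    No task depends on another, which is what makes the problem
    pleasantly parallel. Executor.map preserves the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(reduce_volume, filenames))


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    files = ["KLOT_0001.nc", "KLOT_0002.nc", "KTLX_0001.nc"]
    print(reduce_archive(files))
```

Because the tasks share no state, the same function could in principle be handed to a process pool or a cluster scheduler without changing `reduce_volume` itself.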

Speaker Description: 

Scott Collis is a radar meteorologist at Argonne National Lab.

On the potential of Big Data capabilities for the validation of a Weather Forecasting System

Date and Time: 
Monday, April 4th, 2016
Location: 
Center Green
Speaker: 
Giuseppe Iannitto
A key component of Earth Observation (EO) projects is the validation of EO data products against ground truth. In the validation process, data can be collected from various ground-based sources and sensors (in situ measurements, instruments, crowdsourcing, open source platforms), then quality-controlled, and finally compared with the satellite products to obtain validated retrievals.
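
The collect, quality-control, and compare sequence described above can be sketched in a few lines of Python. This is an illustrative assumption, not the speaker's method: quality control is reduced to a simple range check on the ground value, and the comparison is reduced to bias and root-mean-square error. The function name `validate` and its parameters are hypothetical.

```python
import math


def validate(pairs, lower, upper):
    """Compare ground measurements with satellite retrievals.

    pairs: iterable of (ground_value, satellite_value) tuples.
    Pairs whose ground value falls outside [lower, upper] are dropped,
    as a simple stand-in for quality control.
    Returns (bias, rmse, n_used), where bias is mean(satellite - ground).
    """
    diffs = [sat - ground for ground, sat in pairs if lower <= ground <= upper]
    if not diffs:
        raise ValueError("no pairs passed quality control")
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse, len(diffs)


if __name__ == "__main__":
    # Made-up values; -999.0 mimics a missing-data flag that QC rejects.
    sample = [(10.0, 11.0), (20.0, 19.0), (-999.0, 15.0)]
    print(validate(sample, lower=0.0, upper=50.0))
```
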
Speaker Description: 
Giuseppe Iannitto is currently a third-year PhD student in the geoinformation department at Tor Vergata University (Rome), studying big data (“A Map-Reduce approach for management and exploitation of Earth Observation Big Data”).
A senior IT consultant and manager, Mr. Iannitto worked as Technical Director for an IT consulting organization and has over 10 years of hands-on experience in system integration, management, and software development in the information and communication technology industry.
He has ten years of experience as a system integrator, including responsibility for system integration and acceptance testing on a Europe-wide Combat Management System project for the Italian and French navies.
His project management expertise spans every project phase, from pre-project sales, strategic planning, project planning, and project control to post-project evaluation with the client sponsor to assure satisfaction and identify process improvements for the client.
He is also proficient in most client-server platforms and numerous programming languages, with very good analytical problem-solving ability and excellent verbal and written communication skills.

What Climate Science Knows About Big Data

Date and Time: 
Monday, April 4th, 2016
Location: 
Center Green
Speaker: 
Seth McGinnis

Big Data refers to any data set that's too big to be handled by conventional tools and techniques. This criterion is relative, context-dependent, and changes over time. It is often taken to mean data with high storage volume (currently on the terabyte to petabyte scale), but there are other ways for data to be Big, and an array of different approaches is needed to wrestle such data sets into tractability.

Speaker Description: 

Seth McGinnis is an Associate Scientist IV in the Institute for Mathematics Applied to Geosciences (IMAGe) at NCAR. As the Data Manager and User Community Manager for NARCCAP, the North American Regional Climate Change Assessment Program, he makes the output from climate models usable by and available to people who need information about climate change. His research focuses on bias correction, interpolation, and other issues affecting the practical use of model output by non-specialists.

Big Data from CESM - Past, Present and the near Future

Date and Time: 
Monday, April 4th, 2016
Location: 
Center Green
Speaker: 
Gary Strand

The Community Earth System Model (CESM) has been in the realm of "big data" since before it became a buzzword. I'll talk about past experiences with big data (primarily CMIP5), and the lessons learned from those experiences. Those lessons, which have informed and improved big data practices within CESM since that time, will be discussed as well as the preparations being made for the upcoming CMIP6. I'll also mention the various aspects of good data management and how the CESM project is addressing them.

Speaker Description: 

Gary Strand is a software engineer who has become the *de facto* data manager and sometime data scientist for the CESM project.

Welcoming remarks

Date and Time: 
2015 April 13 @ 9:15am
Location: 
FL2-1022 Large Auditorium
Speaker: 
Jim Hurrell
Speaker Description: 

Jim Hurrell is the Director of NCAR.

Project Jupyter: a language-independent architecture for open computing and data science

Date and Time: 
2015 April 13 @ 9:30am
Location: 
FL2-1022 Large Auditorium
Speaker: 
Fernando Perez

IPython began its life in Boulder in 2001, as an environment for interactive scientific computing and data analysis, motivated by my needs as a physics graduate student. Over the years, it evolved into one of the main elements of the collaboratively developed ecosystem of open source tools for science in Python. In recent years, IPython has evolved into Project Jupyter: an architecture that takes the foundations of IPython and extends them to any programming language. Jupyter offers interactive terminals as well as a popular web-based notebook environment.

Speaker Description: 

Fernando Pérez (@fperez_org) is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science, created in 2013. He received a PhD in particle physics, followed by postdoctoral research in applied mathematics, developing numerical algorithms. Today, his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, literate computing and reproducible research. He created IPython while a graduate student in 2001 and continues to lead it as it evolves into the Jupyter Project, now as a collaborative effort with a talented team that does all the hard work. He regularly lectures about scientific computing and data science, and is a member of the Python Software Foundation as well as a founding member of the Numfocus Foundation. He is the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation.

Open Space

Date and Time: 
2015 April 15 @ 3pm
Location: 
FL2 Cafeteria Atrium
Speaker: 
n/a

Open Space is an approach that enables groups of any size to address complex, important issues and achieve meaningful results quickly. In Open Space meetings and events, participants create and manage their own agenda of parallel working sessions around a central theme of strategic importance. Conference participants will have the opportunity to apply the Open Space technique to self-organize and discuss relevant topics with other meeting attendees. A space to write down ideas, questions, and discussion topics will be available throughout the first two days of the conference.

Speaker Description: 

Facilitator:
Nathan Wilhelmi is a software engineer at the National Center for Atmospheric Research where he leads the Software and Web Engineering Group. Prior to working at NCAR Wilhelmi worked in software development across a range of private sector domains. His primary focus is in the area of application development using Agile project management and engineering practices.

TACC Stats: A Comprehensive and Transparent Resource Usage Monitoring Tool for HPC Systems

Date and Time: 
2015 April 14 @ 1:30pm
Location: 
FL2-1022 Large Auditorium
Speaker: 
Todd Evans

We have developed and deployed the transparent and comprehensive resource usage monitoring and analysis tool TACC Stats at the Texas Advanced Computing Center (TACC). This tool is currently used to aid TACC’s system administrators and HPC consultants in the diagnosis and resolution of application and system issues and to identify jobs with poor performance characteristics or inefficient resource utilization. TACC Stats automatically collects resource usage data at regular time intervals and computes performance metrics for every job run on an HPC system.
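
The idea of sampling at regular intervals and reducing the samples to per-job metrics can be illustrated with a small Python sketch. This is not TACC Stats code and does not reflect its data format or its actual metrics: the sample layout `(job_id, cpu_fraction)`, the function names, and the 0.2 threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_cpu_by_job(samples):
    """Average the CPU utilization samples recorded for each job.

    samples: iterable of (job_id, cpu_fraction) tuples, one per
    collection interval, with cpu_fraction between 0.0 and 1.0.
    """
    totals = defaultdict(lambda: [0.0, 0])  # job_id -> [sum, count]
    for job_id, cpu in samples:
        totals[job_id][0] += cpu
        totals[job_id][1] += 1
    return {job: total / count for job, (total, count) in totals.items()}


def flag_inefficient(job_means, threshold=0.2):
    """Return the sorted ids of jobs whose mean utilization is below threshold."""
    return sorted(job for job, mean in job_means.items() if mean < threshold)


if __name__ == "__main__":
    # Made-up samples for two hypothetical jobs.
    samples = [("j1", 0.9), ("j1", 0.7), ("j2", 0.1), ("j2", 0.1)]
    means = mean_cpu_by_job(samples)
    print(means)
    print(flag_inefficient(means))
```
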

Speaker Description: 

Dr. Todd Evans is an HPC Research Associate at the Texas Advanced Computing Center and Research Scientist Lecturer in the Department of Statistics & Data Science at UT Austin. Dr. Evans received his Ph.D. in Physics from the University of Illinois at Urbana-Champaign in 2008 and has been on staff at UT Austin since 2013. His current research interests include the development of tools for transparent job-level monitoring and performance analysis of HPC systems.

Development of a Python GUI Interface to a YAML Configuration File for Propagation of Largely Identical Database Records between Field Project Entries

Date and Time: 
2015 April 14 @ 8:30am
Location: 
FL2-1022 Large Auditorium
Speaker: 
Soo Rin Park

The EOL Metadata Database and Cyberinfrastructure (EMDAC) is a comprehensive metadata database and integrated cyberinfrastructure which provides a public data portal to all of EOL’s field project data holdings. This paper demonstrates the use of the Python programming language to create a GUI (Graphical User Interface) tool that generates or edits a YAML (YAML Ain’t Markup Language) based metadata configuration file to automate data loading into the EOL metadata and (internal) data tracking system databases through user input and data serialization.
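
As a rough illustration of propagating largely identical records through a YAML file, the sketch below copies a set of shared fields into each record, applies per-record overrides, and writes the result as a YAML list. It is an assumption-laden sketch, not the EMDAC tool: the field names are invented, there is no GUI, and to stay dependency-free it writes YAML by hand for flat, non-empty records with simple keys and string values (a real tool would more likely use a YAML library).

```python
import json


def merge_records(shared, overrides):
    """Build one record per override by copying the shared fields first.

    Values in an override replace the shared value for the same key.
    """
    return [{**shared, **override} for override in overrides]


def to_yaml(records):
    """Serialize a list of flat, non-empty string mappings as a YAML sequence.

    Values are written with json.dumps, which yields double-quoted
    scalars that YAML also accepts. Keys are assumed to be simple
    identifiers that need no quoting.
    """
    lines = []
    for record in records:
        for index, (key, value) in enumerate(record.items()):
            prefix = "- " if index == 0 else "  "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    shared = {"project": "DEMO", "platform": "aircraft"}
    overrides = [{"dataset": "a"}, {"dataset": "b", "platform": "ship"}]
    print(to_yaml(merge_records(shared, overrides)), end="")
```

Keeping the shared fields in one place means a correction to a common value is made once rather than in every record.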

Speaker Description: 

Soo Park is currently pursuing an undergraduate degree in Electrical and Computer Engineering with a minor in Technology, Arts & Media at the University of Colorado Boulder, and is expected to graduate in May 2016. She has been working as a Software Assistant at NCAR/EOL since June 2014.

User Environment Tracking and Problem Detection with XALT

Date and Time: 
2015 April 14 @ 11:00am
Location: 
FL2-1022 Large Auditorium
Speaker: 
Robert McLay

XALT is a product that helps sites understand individual users’ software needs and then leverages that understanding to help stakeholders conduct business in a more efficient, effective, and systematic way. It builds on work that is already improving the user experience and enhancing support programs for thousands of users on twelve supercomputers across the United States and Europe. XALT instruments individual jobs on high-end computers to generate a picture of the compilers, libraries, and other software that users need to run their jobs successfully.

Speaker Description: 

Dr. Robert McLay received his bachelor's and master's degrees from the Massachusetts Institute of Technology and his Ph.D. in Engineering Mechanics from The University of Texas at Austin. His research includes C++ software development, regression testing, and software tools, all related to large parallel numerical simulation codes. In particular, he has worked on parallel finite-element programs solving incompressible fluid flow and heat transfer.

His interest in software tools and support of HPC programming environments has led to his development of Lmod, a modern replacement for the Environment Modules system. Lmod's major advantage is that it protects all users from loading incompatible software without hindering experts. This work has led to an interest in tracking software usage through the module system.
