Future Directions in Large Scale Systems Monitoring

Date and Time: 
2010 Sept 30th @ 3:15pm
Location: 
ML - Main Seminar Room
Speaker: 
Mike Lowe

As HPC systems have grown in size and complexity, monitoring of these systems hasn't kept pace. Current monitoring systems either don't scale or are the wrong fit; some consist of scripts that systems administrators have migrated from machine to machine. Attempts to select a monitoring solution are further complicated by requirements for sharing data across administrative boundaries and existing monitoring systems. The current state of monitoring HPC resources will be discussed along with the motivations for finding new solutions. Ongoing experiments involving message buses, column-store databases, microformats, Python, and failure prediction will also be discussed.

Website and Mailing List

The website and mailing list Mike mentions in his talk are:
https://sites.google.com/site/hpcmonitoring/
http://groups.google.com/group/hpc-monitoring

(Note that authorization is required to view both, but NCAR employees should be able to obtain permission easily.)

Speaker Description: 

Michael Lowe received a BS from Purdue's School of Electrical and Computer Engineering with a focus on chip and embedded design. He has been employed for the past five years as a systems administrator at Indiana University. His prior experience includes a four-year term as a network administrator at a Fortune 500 company.

Video recorded: 

Event Category: