Scalable realtime architectures using Python on Storm

Date and Time: 
2013 November 14th @ 3pm
Location: 
CG1-1214 North Auditorium
Speaker: 
Jim Baker
Description
 
Open-sourced by Twitter and starting to see wide adoption for big data applications, Storm supports scalable realtime processing of data streams. You can use Storm to build a variety of realtime architectures, whether that's an analytics pipeline or driving a realtime dashboard integrating a variety of data sources. You can also now use Python - through the Jython implementation - to develop any desired deep integration with Storm, whether that's a custom bolt gluing together PyPI libraries or guaranteed exactly-once processing. Along with these code-rich examples, this talk will also show how Java can now directly use classes generated by Jython.
 
Abstract
 
Storm supports distributed, scalable, and robust realtime processing of data streams. Open-sourced by Twitter, it is beginning to see widespread use, especially because it complements Hadoop's batch processing. In addition, Yahoo recently released support for running Storm on the YARN resource manager, which lets Storm and Hadoop share a single resource manager.
 
Using Python with Storm is a natural fit. Storm is a classic framework in that it does a lot of plumbing on your behalf, such as moving tuples around or managing retries to ensure all tuples are processed. This infrastructure, mostly written in Clojure but also with some Java, has very good performance. On the other hand, it is arguably much easier to write spouts and bolts - the components of a Storm topology - in Python. Python is an excellent language for writing "glue", and there are also many high-quality modules available in PyPI. The specific implementation of Python we will use is Jython, which is also used in such Hadoop projects as Pig and Cascading; Jython enables the necessary deep integration with Storm, going well beyond the capabilities of shell bolts and spouts.
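
To make the spout and bolt roles concrete, here is a minimal sketch in plain Python. It does not use the Storm or Jython APIs: the class names, the run_topology driver, and the flaky_ids parameter are all hypothetical, chosen only to illustrate the idea of a source that replays failed tuples so that every tuple is eventually processed.

```python
# Plain-Python sketch of the spout/bolt model. None of these classes are
# Storm or Jython APIs; every name here is hypothetical.
from collections import deque


class SentenceSpout:
    """Emits (tuple_id, sentence) pairs and replays any reported as failed."""

    def __init__(self, sentences):
        self.pending = deque(enumerate(sentences))

    def next_tuple(self):
        # Return the next pending tuple, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def fail(self, tuple_id, sentence):
        # Re-queue the tuple so it is processed again later.
        self.pending.append((tuple_id, sentence))


class WordCountBolt:
    """Counts words; fails the first attempt for ids listed in flaky_ids."""

    def __init__(self, flaky_ids=()):
        self.counts = {}
        self.flaky_ids = set(flaky_ids)

    def execute(self, tuple_id, sentence):
        if tuple_id in self.flaky_ids:
            # Simulate a transient error before any counting happens.
            self.flaky_ids.discard(tuple_id)
            raise RuntimeError("transient failure")
        for word in sentence.split():
            self.counts[word] = self.counts.get(word, 0) + 1


def run_topology(spout, bolt):
    """Drive tuples from the spout to the bolt until nothing is pending."""
    while True:
        item = spout.next_tuple()
        if item is None:
            break
        tuple_id, sentence = item
        try:
            bolt.execute(tuple_id, sentence)
        except RuntimeError:
            spout.fail(tuple_id, sentence)
    return bolt.counts


counts = run_topology(
    SentenceSpout(["storm on jython", "jython on storm"]),
    WordCountBolt(flaky_ids=[0]),
)
print(counts)
```

In this sketch the retry logic lives in the driver and the spout, not in the bolt, which mirrors the division of labor described above: the framework handles the plumbing, and the component code stays short.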
 
This talk will provide a code-rich presentation of how you can use such integration, by looking at specific examples. It will also be of interest to any Jython developer, because it will present new and widely applicable techniques for supporting Java integration, especially in large distributed systems.
Speaker Description: 
Jim is a committer on Jython, for which he has worked on nearly every aspect from compilation to Unicode, and a co-author of The Definitive Guide to Jython. Jim is a senior software developer at Rackspace, where he works at the intersection of big data and cloud computing. He is also a lecturer in computer science at the University of Colorado at Boulder, where he teaches Principles of Programming Languages. He is a graduate of Harvard College and Brown University and is a nominated member of the Python Software Foundation.