Open-sourced by Twitter and starting to see wide adoption for big data applications, Storm supports scalable realtime processing of data streams. You can use Storm to build a variety of realtime architectures, whether that's an analytics pipeline or driving a realtime dashboard integrating a variety of data sources. You can also now use Python - through the Jython implementation - to develop any desired deep integration with Storm, whether that's a custom bolt gluing together PyPI libraries or guaranteed exactly-once processing. Along with these code-rich examples, this talk will also show how Java can now directly use classes generated by Jython.
Storm supports distributed, scalable, and robust realtime processing of data streams. Open-sourced by Twitter, it's beginning to see widespread use, especially since its stream processing complements Hadoop's batch processing. In addition, Yahoo recently released support for running Storm on YARN, allowing Storm to share the same resource manager as Hadoop.
Using Python with Storm is a natural fit. Storm is a classic framework in that it does a lot of plumbing on your behalf, such as moving tuples around and managing retries to ensure all tuples are processed. This infrastructure, written mostly in Clojure with some Java, has very good performance. On the other hand, it is arguably much easier to write spouts and bolts - the components of a Storm topology - in Python. Python is an excellent language for writing "glue", and there are also many high-quality modules available on PyPI. The specific implementation of Python we will use is Jython, which is also used in such Hadoop projects as Pig and Cascading; Jython enables the necessary deep integration with Storm, going well beyond the capabilities of shell bolts and spouts.
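To make the spout and bolt vocabulary concrete, here is a minimal sketch in plain Python. It is a toy model only: it does not use Storm's API, it runs in a single process, and it does no retries or distribution. The class and function names (`SentenceSpout`, `SplitBolt`, `CountBolt`, `run_topology`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter, deque


class SentenceSpout(object):
    """Hypothetical spout: the source of the stream, one sentence per call."""

    def __init__(self, sentences):
        self._pending = deque(sentences)

    def next_tuple(self):
        # Return the next tuple, or None once the stream is drained.
        return self._pending.popleft() if self._pending else None


class SplitBolt(object):
    """Hypothetical bolt: splits a sentence into lowercase words."""

    def execute(self, sentence):
        return sentence.lower().split()


class CountBolt(object):
    """Hypothetical bolt: keeps a running count per word, emits nothing."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, word):
        self.counts[word] += 1
        return []


def run_topology(spout, bolts):
    """Toy stand-in for the framework: moves tuples from spout through bolts."""
    while True:
        tup = spout.next_tuple()
        if tup is None:
            break
        batch = [tup]
        for bolt in bolts:
            # Each bolt may emit zero or more tuples per input tuple.
            batch = [out for item in batch for out in bolt.execute(item)]


counter = CountBolt()
run_topology(SentenceSpout(["the cow jumped", "over the moon"]),
             [SplitBolt(), counter])
```

The point of the sketch is the division of labor: the component classes hold only application logic, while `run_topology` plays the role of the plumbing that a real framework such as Storm takes over.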
This talk will provide a code-rich presentation of how you can use such integration by looking at specific examples. It will also be of interest to any Jython developer, because it will present new and widely applicable techniques for supporting Java integration, especially in large distributed systems.