Apache Storm is an open-source, distributed real-time computation system for processing large, fast-moving streams of data. With Storm and MapReduce running side by side in Hadoop on YARN, a single Hadoop cluster can efficiently handle a full range of workloads, from real-time to batch.
Real-Time Analytics with Apache Storm – Topics covered in the Presentation:
- Introduction to Apache Storm & importance of Real-Time processing
- How Apache Storm overcomes Hadoop’s shortcomings
- Real-world applications of Apache Storm
- What makes Storm ideal for real-time processing?
- Architecture of a Storm cluster
- How Storm and Hadoop fit together
- Data ingestion techniques in Storm
- Managing Hadoop and Storm clusters with Apache Ambari
Characteristics of Storm that make it ideal for real-time data processing:
- Fast – Benchmarked at one million 100-byte messages per second per node.
- Scalable – Parallel calculations that run across a cluster of machines.
- Fault-tolerant – Workers are restarted automatically when a worker or node dies.
- Reliable – Guarantees that each unit of data (tuple) is processed at least once or exactly once.
- Easy to Operate – Standard configurations suitable for production from day one.
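To make the "at least once" guarantee above more concrete, here is a minimal Python sketch of the general idea: the source keeps every message pending until it is acknowledged, and replays it if processing fails. This is a toy simulation only, not Storm's API; the names `ReplayingSource`, `run`, and `crash_before_ack` are hypothetical and exist just for this illustration.

```python
from collections import deque


class ReplayingSource:
    """Toy stand-in for a message source (not Storm's spout API).

    Each emitted message stays in `pending` until it is acked;
    a failed message is put back on the queue to be replayed.
    """

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (msg_id, payload) pairs
        self.pending = {}

    def next_message(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Processing confirmed: the message can be forgotten.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing not confirmed: schedule the message for replay.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(messages, crash_before_ack):
    """Process all messages; payloads in `crash_before_ack` lose their
    first ack, which simulates a worker dying after doing the work."""
    source = ReplayingSource(messages)
    attempts = {}
    processed = []
    while (item := source.next_message()) is not None:
        msg_id, payload = item
        attempts[msg_id] = attempts.get(msg_id, 0) + 1
        processed.append(payload.upper())  # the visible side effect
        if payload in crash_before_ack and attempts[msg_id] == 1:
            source.fail(msg_id)  # ack never arrived, so replay later
        else:
            source.ack(msg_id)
    return processed, attempts


if __name__ == "__main__":
    processed, attempts = run(["a", "b", "c"], crash_before_ack={"b"})
    print(processed)
    print(attempts)
```

In this sketch no message is ever lost, but a message whose acknowledgement goes missing is processed twice. That is the trade-off of at-least-once delivery: downstream logic has to tolerate duplicates, or a stricter exactly-once mechanism is needed.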
Feel free to drop us a line for any clarifications.