The video above is the recorded webinar session on “Big Data Processing with Spark and Scala”, held on 27 July 2014.
Introduction to Spark & Scala:
Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark fits well into the Hadoop open-source ecosystem because it can read data from the Hadoop Distributed File System (HDFS) and run on Hadoop clusters. Unlike Hadoop MapReduce, however, Spark is not tied to the two-stage map-then-reduce paradigm: it generalizes the MapReduce computation model, addressing MapReduce’s limitations while improving both performance and ease of use. Spark also provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning algorithms.
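To make the "more than two stages, reuse data in memory" idea concrete, here is a minimal sketch in plain Scala collections (no Spark dependency, so it runs as-is). It tokenizes some text, does a map step and a reduce-by-key step, keeps the result in memory, and then runs two different queries against it. The object name `WordCountSketch` and the sample lines are hypothetical, chosen only for illustration; this is not code from the webinar.

```scala
// Plain Scala collections only: a sketch of a multi-step pipeline
// (tokenize -> map -> reduce by key -> several queries) on local data.
object WordCountSketch {
  // Split lines into lowercase words, dropping empty tokens.
  def tokenize(lines: Seq[String]): Seq[String] =
    lines.flatMap(_.toLowerCase.split("\\W+")).filter(_.nonEmpty)

  // Map each word to (word, 1), then reduce by key by summing the 1s.
  def countWords(words: Seq[String]): Map[String, Int] =
    words
      .map(w => (w, 1))
      .groupBy(_._1)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "Spark is fast",
      "Spark runs in memory and Spark is general"
    )

    // Computed once and kept in memory for the queries below.
    val counts = countWords(tokenize(lines))

    // Query 1: the most frequent word.
    val (topWord, topCount) = counts.maxBy(_._2)
    println(s"$topWord -> $topCount") // prints "spark -> 3"

    // Query 2: words that appear more than once, sorted.
    println(counts.filter(_._2 > 1).keys.toList.sorted) // prints "List(is, spark)"
  }
}
```

Spark's RDD API exposes operators with the same names (`flatMap`, `map`, `filter`, `reduceByKey`), so a distributed version has roughly the same shape. Calling `cache()` on the intermediate RDD is what asks Spark to keep it in cluster memory across repeated queries, the role played here by the local `counts` value.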
- What is Big Data?
- What is Spark?
- Why Spark?
- Spark Ecosystem
- A note about Scala
- Why Scala?
- Hello Spark
Spark Features:
- Fast Analytics
- Real-Time Stream Processing
- Fault Tolerant
- Powerful and Integrated Data Processing
- Easy to Use