27 Jul 2014

Big Data Processing with Spark and Scala

The above video is the recorded webinar session on the topic “Big Data Processing with Spark and Scala”, held on 27th July’14.

Introduction to Spark & Scala:

Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark is a good fit for the Hadoop open-source community because it can run on top of the Hadoop Distributed File System (HDFS) and read data already stored there. Unlike Hadoop MapReduce, Spark is not tied to the two-stage MapReduce paradigm: it generalizes the MapReduce computation model while dramatically improving performance and ease of use. Spark also provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning algorithms.
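To make the "load once, query repeatedly" idea concrete, here is a minimal sketch rather than code from the webinar. It assumes the spark-core library is on the classpath and runs Spark in local mode; the object name `CachedQueries`, the method `counts` and the sample log lines are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedQueries {
  // Hypothetical sample data standing in for a file stored on HDFS.
  val sampleLines: Seq[String] =
    Seq("INFO start", "ERROR disk full", "WARN slow", "ERROR timeout")

  // Returns (number of ERROR lines, total number of lines).
  def counts(sc: SparkContext): (Long, Long) = {
    // cache() asks Spark to keep the dataset in memory once it has been computed.
    val lines = sc.parallelize(sampleLines).cache()
    // First query: computes the dataset and populates the cache.
    val errors = lines.filter(_.startsWith("ERROR")).count()
    // Second query: can be served from the cached data.
    val total = lines.count()
    (errors, total)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; a real cluster would use another master URL.
    val conf = new SparkConf().setAppName("CachedQueries").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (errors, total) = counts(sc)
      println(s"errors=$errors total=$total")
    } finally sc.stop()
  }
}
```

With the sample data above, the program should print `errors=2 total=4`. On a dataset this small caching makes no visible difference; the benefit appears when the same large dataset is scanned many times, as in iterative machine learning algorithms.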

The name Scala is short for ‘Scalable Language’. Scala is an object-oriented language, and its scalability is the result of a careful integration of object-oriented and functional language concepts. The language supports advanced component architectures through classes and traits. Scala also includes first-class functions and a standard library with efficient immutable data structures.
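The Scala features mentioned above (traits, classes, first-class functions and immutable collections) can be seen together in a short sketch. It uses only the Scala standard library; the names `Greeter`, `Engine` and `ScalaFeatures` are invented for this illustration and do not come from the webinar.

```scala
// A trait bundles reusable behaviour that classes can mix in.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// A case class is an immutable data carrier; here it mixes in the trait.
final case class Engine(name: String) extends Greeter

object ScalaFeatures {
  // A first-class function: a value of type String => String.
  val shout: String => String = _.toUpperCase + "!"

  // An immutable List; map builds a new list and leaves this one unchanged.
  val engines: List[Engine] = List(Engine("Spark"), Engine("Hadoop"))

  // A higher-order method: it takes a function as its argument.
  def greetings(f: String => String): List[String] =
    engines.map(e => f(e.greet))

  def main(args: Array[String]): Unit =
    greetings(shout).foreach(println)
}
```

Running `ScalaFeatures` prints `HELLO, SPARK!` followed by `HELLO, HADOOP!`. The same style of passing small functions to collection methods such as `map` and `filter` is what Spark's Scala API builds on.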

Topics covered in the Video & Presentation:

What is Big Data?
What is Spark?
Why Spark?
Spark Ecosystem
A note about Scala
Why Scala?
Hello Spark

Spark Features:

Fast Analytics
Real-Time Stream Processing
Fault Tolerant
Powerful and Integrated Data Processing
Easy to use

Please visit this link for more details about our course ‘Big Data Processing with Scala and Spark.’
Feel free to drop us a line for any clarifications.
