Contents of the Webinar
1. Low Latency
2. Streaming support
3. Machine Learning and Graph
4. Data Frame API Introduction
5. Spark Integration with Hadoop
Spark Architecture
Similar to Hadoop, Spark is a framework as well. At the heart of the stack is Spark Core, the processing engine that exposes the core Spark API and is internally written in Scala.
Low Latency
Spark cuts down read/write I/O to disk.
Spark stores its data as RDDs (Resilient Distributed Datasets): in-memory collections of data distributed across the machines of a cluster. Memory is finite, however, so this approach has its limits. A distinctive feature of Spark is that it chooses how to store data, in memory, on disk, or both, depending on the kind of infrastructure available.
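Here is a minimal sketch of the idea; the local master, app name, and data are illustrative assumptions, not from the webinar. The RDD is kept in memory so repeated actions avoid disk I/O, and the MEMORY_AND_DISK storage level spills to disk when memory runs out:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddCacheSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setAppName("rdd-cache-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // An RDD: a partitioned collection distributed across the cluster's machines.
    val numbers = sc.parallelize(1 to 1000000, numSlices = 8)

    // Keep the RDD in memory; spill to disk only when memory runs out
    // (the "limits" mentioned above).
    numbers.persist(StorageLevel.MEMORY_AND_DISK)

    println(numbers.filter(_ % 2 == 0).count()) // first action materializes the cache
    println(numbers.sum())                      // reuses the cached partitions

    sc.stop()
  }
}
```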
Streaming support
Event Processing
Spark Streaming is used for processing real-time streaming data.
It uses the DStream (discretized stream), a continuous series of RDDs, to process real-time data.
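A minimal word-count sketch of how a DStream is built from micro-batches; the socket source on localhost:9999 and the 5-second batch interval are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dstream-sketch").setMaster("local[2]")
    // Each 5-second batch of incoming data becomes one RDD in the DStream.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Illustrative source: lines of text arriving on a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print() // prints the counts computed for each micro-batch RDD

    ssc.start()
    ssc.awaitTermination()
  }
}
```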
DAG data flows
1. Every job in Spark comprises a series of operators and runs on a set of data.
2. All the operators in a job are used to construct a DAG (directed acyclic graph).
3. The DAG is optimized by rearranging and combining operators wherever possible, as shown in the sketch below.
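The sketch below makes this concrete using Spark's toDebugString, which prints a job's lineage (the data and operator chain are illustrative). The narrow map and filter operators are pipelined into a single stage, while the shuffle required by reduceByKey starts a new one:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DagSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("dag-sketch").setMaster("local[*]"))

    // Two narrow operators (map, filter) followed by a wide one (reduceByKey).
    val result = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(word => (word, 1))
      .filter { case (word, _) => word != "c" }
      .reduceByKey(_ + _)

    // Print the lineage: map and filter are combined into one stage,
    // and the shuffle for reduceByKey begins a new one.
    println(result.toDebugString)

    sc.stop()
  }
}
```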
Support for DataFrames
DataFrame features
- Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster.
- Support for a wide array of data formats and storage systems.
- Seamless integration with all big data tooling and infrastructure via Spark.
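A minimal DataFrame sketch, assuming a hypothetical people.json input file with an age column (the file and column names are illustrative); the same reader API also covers Parquet, CSV, JDBC, Hive tables, and other sources:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input file; swap in parquet/csv/jdbc readers as needed.
    val people = spark.read.json("people.json")

    people.printSchema() // schema is inferred from the data

    people.filter(people("age") > 21) // column expression on the DataFrame
      .groupBy("age")
      .count()
      .show()

    spark.stop()
  }
}
```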
Questions asked during the webinar
Mesos vs. YARN
Mesos and YARN are both cluster resource managers with largely the same functionality. YARN is the more popular of the two because it ships with Hadoop; Mesos has seen less adoption.
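From an application's point of view the difference is mostly configuration: the code stays the same and only the master URL changes. A minimal sketch (the Mesos host and port are illustrative, and the plain "yarn" master value assumes Spark 2.x or later):

```scala
import org.apache.spark.SparkConf

object MasterUrlSketch {
  // Same application, different resource manager: only the master URL differs.
  val onYarn  = new SparkConf().setAppName("my-app").setMaster("yarn")
  val onMesos = new SparkConf().setAppName("my-app").setMaster("mesos://mesos-master:5050")
}
```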
Got a question for us? Please mention it in the comments section and we will get back to you.
Related Posts:
Get Started with Apache Spark and Scala
Apache Spark will replace Hadoop. Know why