First of all, let us clear up a few points.
- Hadoop and Spark were both designed to handle big data, but for different purposes and with different capabilities.
- Hadoop was built to store data in a distributed environment; Spark was built to process data in a distributed environment.
- Spark does not have its own storage system, so it relies on Hadoop components such as HDFS for storage (as sketched below).
- Hadoop, on the other hand, has its own data processing engine, MapReduce.
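For instance, a Spark job can process a dataset that HDFS stores. Below is a minimal PySpark sketch of that division of labor; the NameNode address and file path are placeholders I am assuming for illustration, not values from this post.

```python
from pyspark.sql import SparkSession

# Spark handles the processing side.
spark = SparkSession.builder.appName("hdfs-read-example").getOrCreate()

# HDFS (a Hadoop component) handles the storage side.
# hdfs://namenode:9000/data/events.txt is a placeholder location.
df = spark.read.text("hdfs://namenode:9000/data/events.txt")

# Triggers a distributed job over the blocks HDFS holds.
print(df.count())

spark.stop()
```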
Spark can run with or without the Hadoop components, and you can run it in three different modes (see the sketch after this list):
- Standalone
- Pseudo-Distributed
- Fully Distributed
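In practice, which mode you get largely comes down to the master URL you hand to Spark. The PySpark sketch below uses placeholder host names and assumes the common reading of these terms (standalone on a single machine, a cluster manager for the distributed modes); none of this is confirmed by the original post.

```python
from pyspark.sql import SparkSession

# Standalone: everything runs locally in a single JVM, no Hadoop needed.
local_spark = (SparkSession.builder
               .master("local[*]")        # use all local cores
               .appName("standalone-mode")
               .getOrCreate())
local_spark.stop()

# Pseudo- or fully distributed: point Spark at a cluster manager instead,
# e.g. Spark's own standalone manager or Hadoop's YARN.
# "spark://master-host:7077" is a placeholder endpoint.
# cluster_spark = (SparkSession.builder
#                  .master("spark://master-host:7077")
#                  .appName("distributed-mode")
#                  .getOrCreate())
```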