Some of the key differences between an RDD and Distributed Storage are as follows:
- A Resilient Distributed Dataset (RDD) is the primary abstraction of data for the Apache Spark framework.
- Distributed Storage is simply a file system that runs across multiple nodes.
- RDDs are computed lazily and processed in memory; a computed RDD stays in memory for reuse only if it is explicitly cached or persisted.
- Distributed Storage stores data in persistent storage.
- An RDD can recompute lost partitions after a failure or data loss, because it records the lineage of transformations used to build it.
- If data is lost from the Distributed Storage system, it cannot be recovered unless the system replicates it internally (as HDFS does, for example).
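
To make the recomputation point concrete, here is a minimal pure-Python sketch of the idea. It is not Spark's API: `ToyRDD`, `lose_cache`, and `compute_count` are hypothetical names invented for illustration, and the plain list passed as `source` stands in for a file in distributed storage. It only shows the principle that a dataset which remembers its lineage can be rebuilt after its in-memory copy is lost.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it remembers how to rebuild its data (its lineage)."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source      # base data, standing in for distributed storage
        self._parent = parent      # the ToyRDD this one was derived from, if any
        self._fn = fn              # transformation applied to each parent element
        self._cache = None         # in-memory copy, filled only if cache() was requested
        self._want_cache = False
        self.compute_count = 0     # how many times this dataset was (re)computed

    def map(self, fn):
        # Lazy: nothing is computed here, only the lineage is recorded.
        return ToyRDD(parent=self, fn=fn)

    def cache(self):
        # Ask for the result to be kept in memory after the first computation.
        self._want_cache = True
        return self

    def collect(self):
        # Serve from memory when a cached copy exists.
        if self._cache is not None:
            return list(self._cache)
        # Otherwise rebuild the data by walking the lineage.
        self.compute_count += 1
        if self._parent is None:
            data = list(self._source)
        else:
            data = [self._fn(x) for x in self._parent.collect()]
        if self._want_cache:
            self._cache = data
        return list(data)

    def lose_cache(self):
        # Simulate losing the in-memory copy, e.g. because a node failed.
        self._cache = None


base = ToyRDD(source=[1, 2, 3])
squares = base.map(lambda x: x * x).cache()

first = squares.collect()    # computed from the lineage, then cached
squares.lose_cache()         # simulated failure
second = squares.collect()   # rebuilt by re-running the lineage
```

After the simulated failure, `second` holds the same values as `first`, because the data was derived again from the source rather than restored from a backup. A plain storage system has no such recipe to replay, which is why it depends on replication instead.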
I hope this helps!