The full form of RDD is a resilient distributed dataset. It is a representation of data located on a network which is:
Immutable – You can operate on the RDD to produce another RDD but you can’t alter it.
Partitioned / Parallel – The data located on RDD is operated in parallel. Any operation on RDD is done using multiple nodes.
Resilience – If one of the nodes hosting the partition fails, other nodes takes its data.
You can always think of RDD as a big array which is under the hood spread over many computers which are completely abstracted. So, RDD is made up many partitions each partition on different computers.