Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.
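To make the split between transformations and actions concrete, here is a minimal sketch in plain Python. `ToyRDD` is a hypothetical stand-in written for this illustration and is not part of Spark's API. It runs on a single machine with no parallelism or fault tolerance, and only borrows the method names `map`, `filter`, `reduce`, and `collect` from real RDDs (in Spark, `parallelize` lives on the `SparkContext` rather than on the RDD class).

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT Spark's API."""

    def __init__(self, source):
        # `source` is a zero-argument function that yields the elements.
        self._source = source

    @classmethod
    def parallelize(cls, items):
        # Assumption: the data fits in a local list.
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: each returns a new ToyRDD built from this one.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: each runs the computation and returns a plain value.
    def collect(self):
        return list(self._source())

    def reduce(self, fn):
        return reduce(fn, self._source())


numbers = ToyRDD.parallelize([1, 2, 3, 4, 5])
squares = numbers.map(lambda x: x * x)                 # transformation
even_squares = squares.filter(lambda x: x % 2 == 0)    # transformation

total = squares.reduce(lambda a, b: a + b)             # action
evens = even_squares.collect()                         # action
print(total, evens)  # prints: 55 [4, 16]
```

Each transformation hands back another dataset object, while each action hands back an ordinary value (a number or a list) to the calling code, which plays the role of the driver program here.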
The image below illustrates how Spark works internally:
[Image: how Spark works internally]