HDFS (Hadoop Distributed File System) is a distributed file system that runs on commodity hardware and is designed to store very large datasets. It lets an Apache Hadoop cluster scale from a few nodes to hundreds or even thousands. HDFS is one of Apache Hadoop's primary components, alongside MapReduce and YARN.
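
To make this concrete, here is a minimal sketch of writing to and reading from HDFS through Hadoop's Java FileSystem API. The NameNode address `hdfs://namenode:8020` and the path `/user/demo/hello.txt` are placeholders for this example, not values from any particular cluster:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (address is a placeholder).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/hello.txt"); // hypothetical path

            // Write a small file. Behind this same API, HDFS splits large
            // files into blocks and replicates them across DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello, hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[(int) fs.getFileStatus(path).getLen()];
                in.readFully(buf);
                System.out.println(new String(buf, StandardCharsets.UTF_8));
            }
        }
    }
}
```

The same operations are also available from the command line, for example with `hdfs dfs -put` and `hdfs dfs -cat`.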