The function of a Distributed File System (DFS) is to partition the data and to store and manage it across different machines. A DFS can handle large volumes of data, but it is the Hadoop framework that helps process that data.
The large dataset is divided into several blocks and stored on different commodity machines (i.e., the data is stored in a distributed way).
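As a rough illustration (not HDFS itself), the block-splitting idea can be sketched in Python. The tiny block size and node names here are made up for the demo; real HDFS defaults to 128 MB blocks:

```python
# Minimal sketch: split data into fixed-size blocks and assign them
# round-robin to commodity machines (node names are hypothetical).
BLOCK_SIZE = 4  # bytes, tiny for the demo; HDFS defaults to 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the data into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def distribute(blocks, nodes):
    """Place blocks on nodes round-robin, returning node -> blocks."""
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement

data = b"abcdefghij"
blocks = split_into_blocks(data)
print(blocks)  # [b'abcd', b'efgh', b'ij']
placement = distribute(blocks, ["node1", "node2"])
print(placement)  # {'node1': [b'abcd', b'ij'], 'node2': [b'efgh']}
```

Note that real HDFS also replicates each block (three copies by default) for fault tolerance, which this sketch omits.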
Let's take a general example:
If we want to process that data, we first go to the commodity hardware, copy the data to the processing unit, and finally do the processing.
This process has some complications for large data:
- When transferring large data, some of it may be lost in transit, and moving it over the network is slow.
- But if we use Hadoop, we don't need to take the data to the processing unit.
- Instead, we take the processing unit to the commodity hardware where the data is stored.
- The processing is done there, and we only bring back the output.
This is how the Hadoop framework is more beneficial than a plain DFS.
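The contrast above can be sketched as two toy functions: "move the data" versus "move the computation". This is a simplified stand-in for what MapReduce does, not Hadoop's actual API, and all names are illustrative:

```python
# Toy model: each node holds a partition of the data. Counting a word by
# moving data copies everything to one place first; moving the computation
# ships only a small function out and small partial results back.
nodes = {
    "node1": ["hadoop", "dfs", "hadoop"],
    "node2": ["data", "hadoop"],
}

def count_by_moving_data(nodes):
    # Traditional approach: copy all data to the processing unit first.
    all_data = [word for part in nodes.values() for word in part]  # big transfer
    return all_data.count("hadoop")

def count_by_moving_computation(nodes):
    # Hadoop-style approach: run the count where each partition lives,
    # then combine the small partial results (like map + reduce).
    partials = [part.count("hadoop") for part in nodes.values()]
    return sum(partials)

print(count_by_moving_data(nodes))         # 3
print(count_by_moving_computation(nodes))  # 3
```

Both give the same answer, but the second transfers only one integer per node instead of every record, which is the whole point of data locality.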