Hi,
We know that in HDFS, data is divided into blocks whose size is controlled by the dfs.block.size parameter (dfs.blocksize in newer releases) in the config file named hdfs-site.xml. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later, so it depends on your Hadoop version. The default number of replicas is 3, set by the dfs.replication parameter.
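For example, a minimal hdfs-site.xml override might look like the snippet below (the 256 MB block size and replication factor of 2 are illustrative values, not recommendations):

    <configuration>
      <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
        <!-- 256 MB in bytes; newer releases also accept shorthand such as 256m -->
      </property>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
        <!-- replication factor applied to newly created files -->
      </property>
    </configuration>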
We also know that HDFS uses a rack-aware data placement strategy: if a block is placed on one rack, a copy of it is placed on another rack, so that the data stays available even when a node or an entire rack switch fails.
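HDFS only knows which rack a DataNode belongs to if you tell it, usually through a topology script configured in core-site.xml. A minimal sketch, assuming your script lives at /etc/hadoop/topology.sh (the path is just an example; the script receives node IPs/hostnames and prints a rack path such as /rack1 for each):

    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/topology.sh</value>
    </property>

Without such a script, every node is mapped to the single default rack /default-rack, and rack-aware placement effectively degrades to random placement.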
The default block placement policy in HDFS is given below:
- When a client writes data to HDFS, the first replica of a block is stored on the local node if the client itself is running on a DataNode in the cluster; otherwise it is stored on a random node.
- The second replica is stored on a node in a different rack from the first replica.
- The third replica is stored on a different node in the same rack as the second replica.
- If more replicas remain, they are distributed randomly across the racks in the network, with the restriction that no node holds more than one replica and no rack holds more than two. You can verify the resulting placement with the fsck command shown after this list.
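To check where the replicas of a particular file actually landed, you can run fsck with block locations enabled (the path /user/data/sample.txt is only an example):

    hdfs fsck /user/data/sample.txt -files -blocks -locations

and you can confirm the rack mapping HDFS is using for each DataNode with:

    hdfs dfsadmin -printTopology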
Hope this helps to clear your doubt.
Thank you