How does the NameNode handle DataNode failures?

I am working on a Hadoop cluster with 5 DataNodes, and one of them has gone down. How does the NameNode handle DataNode failures?

Can someone please explain?

Thanks in advance!
Jul 11, 2018
Let me explain the whole scenario.

The NameNode periodically receives a Heartbeat and a Blockreport from each DataNode in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode has not received a Heartbeat from a DataNode for a certain amount of time, it marks that DataNode as dead. The blocks that were stored on the dead DataNode are now under-replicated, so the system begins re-replicating them. The NameNode orchestrates the replication of data blocks from one DataNode to another, but the data transfer itself happens directly between DataNodes and never passes through the NameNode.
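To make the detection step concrete, here is a minimal Python sketch of the idea, not the actual NameNode code. It assumes the stock HDFS settings of a 3-second heartbeat interval and a 5-minute recheck interval, which as far as I know combine into a dead-node timeout of 2 * recheck + 10 * heartbeat (10 minutes 30 seconds); check hdfs-default.xml for your version. The function names, node names and block IDs are made up for illustration.

```python
from collections import defaultdict

# Assumed stock defaults, in seconds (verify against your Hadoop version).
HEARTBEAT_INTERVAL = 3
RECHECK_INTERVAL = 300
DEAD_TIMEOUT = 2 * RECHECK_INTERVAL + 10 * HEARTBEAT_INTERVAL  # 630 s


def dead_nodes(last_heartbeat, now, timeout=DEAD_TIMEOUT):
    """Return the nodes whose last heartbeat is older than the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def under_replicated(block_reports, dead, replication=3):
    """Return {block: live replica count} for blocks below the target."""
    live = defaultdict(int)
    for node, blocks in block_reports.items():
        for block in blocks:
            # Replicas on dead nodes are known from old reports but do not count.
            live[block] += 0 if node in dead else 1
    return {block: n for block, n in live.items() if n < replication}


# Hypothetical 5-node cluster: dn5 last reported 700 seconds ago.
last_heartbeat = {"dn1": 1000, "dn2": 1000, "dn3": 1000, "dn4": 1000, "dn5": 300}
block_reports = {
    "dn1": {"blk_1", "blk_2"},
    "dn2": {"blk_1", "blk_3"},
    "dn3": {"blk_2", "blk_3"},
    "dn4": {"blk_1", "blk_2"},
    "dn5": {"blk_3"},
}

dead = dead_nodes(last_heartbeat, now=1000)
print(dead)                                   # {'dn5'}
print(under_replicated(block_reports, dead))  # {'blk_3': 2}
```

Here only blk_3 lost a copy, so it is the only block that would be queued for re-replication.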
Note: If the NameNode stops receiving Heartbeats from a DataNode, it presumes the node to be dead and any data it held to be gone as well. Based on the Blockreports it had been receiving from the dead node, the NameNode knows which copies of blocks died along with the node and can decide to re-replicate those blocks to other DataNodes. It will also consult the Rack Awareness data in order to maintain the "two copies in one rack, one copy in another rack" replica rule when deciding which DataNode should receive a new copy of the blocks.
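The rack rule in the note can be illustrated with a small Python sketch. This is a simplification of my own, not the real HDFS block placement policy: `pick_target`, `rack_of` and the node and rack names are hypothetical, and it only tries to end up with two copies in one rack and one copy in another.

```python
def pick_target(live_replicas, candidates, rack_of):
    """Pick a DataNode to receive a new replica of one block.

    live_replicas: nodes that still hold the block
    candidates:    live nodes that do not hold the block
    rack_of:       node -> rack id (hypothetical topology mapping)
    """
    used_racks = {rack_of[n] for n in live_replicas}
    for node in sorted(candidates):
        copies_in_rack = sum(1 for n in live_replicas if rack_of[n] == rack_of[node])
        if len(used_racks) < 2:
            # All surviving copies share one rack: prefer a different rack.
            if copies_in_rack == 0:
                return node
        elif copies_in_rack == 1:
            # Copies already span two racks: add a second copy to one of them.
            return node
    # No ideal choice: fall back to any candidate, or None if there is none.
    return min(candidates) if candidates else None


rack_of = {"dn1": "r1", "dn2": "r1", "dn3": "r2", "dn4": "r2", "dn5": "r2"}

# Surviving copies on dn2 (r1) and dn3 (r2); the copy on dn5 was lost.
print(pick_target(["dn2", "dn3"], ["dn1", "dn4"], rack_of))         # dn1

# Both surviving copies in r2: the new copy goes to the other rack.
print(pick_target(["dn3", "dn4"], ["dn1", "dn2", "dn5"], rack_of))  # dn1
```

In the first case the block ends up on dn1 and dn2 in rack r1 and dn3 in rack r2, which matches the rule described above.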


answered Jul 11, 2018 by nitinrawat895
Hi nitinrawat895,

Good explanation, very informative!

I'm trying to understand how the NameNode works during an HA failover. I understand the mechanism for under-replicated blocks, but what is the process for handling missing blocks? Does the NameNode periodically store the block report information it receives from the DataNodes in a particular location? If so, then whenever the active NameNode shuts down for any reason, ZooKeeper makes the standby active, and vice versa. In that case, the standby NameNode should already have all the block information, right? In my case, the failover went smoothly, but the active NameNode could not leave safe mode: it was busy discovering millions of blocks for around 2-3 hours. So my question is: if it already has the block information from the DataNodes, it should have come up quickly, as expected. Why is it trying to discover those missing blocks?

