How to Set Up Hadoop Cluster with HDFS High Availability

SacTiw says:
Dec 12, 2017 at 1:05 pm GMT
Normally a client sends a get/put file request to a particular NameNode, right? Once a failover has happened, how does the client learn about it?
Assuming it is the client's responsibility to retry on failure, is there a way for the client to first query for the currently active NameNode and then send its request to that one?
Barış says:
Nov 29, 2017 at 7:59 am GMT
It would be really good to show how to restart this system.
Thank you for sharing this valuable information.
- EdurekaSupport says:
  Jan 5, 2018 at 11:38 am GMT
  Thank you @Baris for appreciating our work. We will look into your suggestions as well. Cheers :)
Hassan Asghar says:
Nov 4, 2017 at 1:41 pm GMT
My Hadoop cluster is set up and working fine, and I ran the word count example.
Can anybody provide the formulas to calculate the following parameters?
Response Time:
Throughput:
Average I/o Rate:
Execution Time:
Thanks in advance
Den Kushnerik says:
Jan 9, 2017 at 9:42 am GMT
Hello. These are very helpful instructions for me!
Do we need to format the ZKFC on the Standby NameNode too?
According to this page: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper we only need to do it once: “…next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.”
aagnasoft says:
Dec 15, 2016 at 5:20 am GMT
Wow, this is very helpful information. Thank you so much.
Sanjay says:
Nov 19, 2016 at 12:37 pm GMT
Normally, when we set up a Hadoop cluster (non-HA), we need to configure YARN by modifying its yarn-site.xml. For HA, don’t we need any HA-specific modification to yarn-site.xml?
- Ashish Bakshi says:
  Nov 29, 2016 at 8:11 am GMT
  Thanks Sanjay for going through the blog.
  In this blog, we modify hdfs-site.xml because we are enabling the HA feature only for the NameNode. And yes, you are absolutely correct: you can have HA for the ResourceManager as well, in which case you will have to modify yarn-site.xml similarly. You can follow the Hadoop documentation to set up HA for the ResourceManager, which is given below:
  https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
Rakibul hassan Rakib says:
Sep 5, 2016 at 7:17 am GMT
I am just correcting your HA Architecture image
Rakibul hassan Rakib says:
Sep 3, 2016 at 5:10 am GMT
After killing the active or standby NameNode, I cannot open the web UI of the killed NameNode. Is it possible to get the web UI after killing a NameNode? In the blog, both NameNode web UIs are visible after one NameNode was killed. How is that possible? I am facing some problems with my NameNode.
Thank you
Rakib
- Mani says:
  Sep 9, 2016 at 7:36 pm GMT
  Hey Rakib,
  If the NameNode is manually transitioned from active to standby, you should still be able to see its web UI, because the NameNode process is still running. But if the active NameNode fails and there is an automatic transition to the standby NameNode, you can’t open the failed node's web UI, for the obvious reason that the NameNode is down. Once you fix the dead NameNode, you can see its UI with STANDBY shown in it. Hope this helps.
  Thanks,
  MK
- EdurekaSupport says:
  Sep 15, 2016 at 6:55 am GMT
  Hey Rakibul, thanks for checking out the blog. Please follow the steps given below:
  -> Check your hdfs-site.xml configuration file and make sure that automatic failover is set up as described in the blog.
  -> If you still face the issue, change the directories for the NameNode, DataNode, JournalNode (JN), and ZooKeeper, and give these directories permission 755:
  chmod 755 directory_path
  -> Format the Active NameNode and start the services as described in the blog.
  Hope this helps.
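
The directory and permission step above can be sketched as a small shell helper. This is only an illustration: the function name `prepare_dirs` and the example paths are hypothetical and do not come from the blog.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): create each storage directory
# if it is missing and set its mode to 755, as suggested in the reply above.
prepare_dirs() {
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    chmod 755 "$dir" || return 1
  done
}

# Example call. The paths are placeholders; use the directories that your
# hdfs-site.xml and ZooKeeper configuration actually point to.
#   prepare_dirs /path/to/namenode /path/to/datanode /path/to/journalnode /path/to/zookeeper
```

Passing all directories in one call keeps the permissions consistent across the NameNode, DataNode, JournalNode, and ZooKeeper storage locations.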
anil kumar says:
Dec 10, 2015 at 5:50 am GMT
I am installing high availability with nn1, nn2, and dn1 …. Both nn1 and nn2 are in standby mode only. What should I do now?
- Mani says:
  Jun 9, 2016 at 10:39 am GMT
  Hope you have found the solution by now, Anil. The likely reason is that you did not enable the automatic failover property in hdfs-site.xml. From what you describe, your cluster is in manual failover mode. In this scenario, you have to designate individually which NameNode should be active or standby.
  hdfs haadmin -transitionToActive nn1
  (nn1 – Active , nn2 – Standby)
  hdfs haadmin -transitionToStandby nn1
  (nn1 – Standby , nn2 – Standby)
  hdfs haadmin -transitionToActive nn2
  (nn1 – Standby , nn2 – Active)
  hdfs haadmin -transitionToStandby nn2
  (nn1 – Standby , nn2 – Standby)
  Check each NameNode's service state using the command:
  hdfs haadmin -getServiceState nn1
  If you make both of them active by mistake, you might encounter a split-brain scenario, where edits are in progress on both nodes, resulting in corrupted metadata.
  Hope this helps!
  Thanks,
  MK
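
The manual transitions above can end in three situations: exactly one active NameNode, none, or (by mistake) two. A minimal shell sketch of that check follows. The function name `classify_ha_state` and its messages are hypothetical; the two states are assumed to come from `hdfs haadmin -getServiceState` for the service IDs nn1 and nn2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): classify the HA states of two
# NameNodes, given as the strings "active" or "standby".
classify_ha_state() {
  s1=$1
  s2=$2
  if [ "$s1" = "active" ] && [ "$s2" = "active" ]; then
    echo "split-brain risk: both NameNodes are active"
  elif [ "$s1" = "active" ] || [ "$s2" = "active" ]; then
    echo "ok: exactly one active NameNode"
  else
    echo "no active NameNode: run transitionToActive on one of them"
  fi
}

# Example use on a live cluster (assumes service IDs nn1 and nn2):
#   classify_ha_state "$(hdfs haadmin -getServiceState nn1)" \
#                     "$(hdfs haadmin -getServiceState nn2)"
```

Running the check after every manual transition makes the "both standby" case from the original question easy to spot.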
sureseh says:
Nov 8, 2015 at 9:00 am GMT
I get the error below when I follow the above configuration settings.
15/11/08 01:58:34 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
I can't find a solution for this on Google.
Can someone help?
regards
suresh bk
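
One way to start narrowing down an error like this is to check which HA-related property names are present in hdfs-site.xml, since the message says a shared edits directory is set while HA is not recognized as enabled. The sketch below is only a rough diagnostic under assumptions: the function name `check_ha_properties` is hypothetical, it greps for property-name prefixes without parsing XML or validating values, and a missing name is a hint rather than a confirmed cause.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Hadoop): report which HA-related
# property names appear in the given hdfs-site.xml file.
# It matches "<name>PREFIX" literally, so names written with extra
# whitespace inside the <name> tag will be reported as missing.
check_ha_properties() {
  conf=$1
  for prop in dfs.nameservices dfs.ha.namenodes. dfs.namenode.rpc-address. \
              dfs.namenode.shared.edits.dir; do
    if grep -F -q "<name>$prop" "$conf"; then
      echo "found: $prop"
    else
      echo "missing: $prop"
    fi
  done
}

# Example call (the path is a placeholder for your configuration directory):
#   check_ha_properties /path/to/hadoop/etc/hadoop/hdfs-site.xml
```

If `dfs.namenode.shared.edits.dir` is found while the nameservice or NameNode ID properties are missing, comparing the file against the configuration in the blog is a reasonable next step.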
- EdurekaSupport says:
  Nov 19, 2015 at 11:05 am GMT
  Hi Suresh bk
  Thank you for reaching out to us.
  You can connect with our 24/7 support team with all your queries and doubts regarding Hadoop once you enroll for the course.
  You can also get in touch with us by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can mail us on sales@edureka.co.


Virtual machine	IP address	Host name
Active NameNode	192.168.1.81	nn1.cluster.com or nn1
Standby NameNode	192.168.1.58	nn2.cluster.com or nn2
DataNode	192.168.1.82	dn1.cluster.com or dn1


HDFS 2.x High Availability Cluster Architecture

Introduction:

NameNode Availability:

HDFS HA Architecture:

Implementation of HA Architecture:

1. Using Quorum Journal Nodes:

Fencing of NameNode:

2. Using Shared Storage:

Automatic Failover:

Setting Up and Configuring High Availability Cluster in Hadoop:
