How to Set Up Hadoop Cluster with HDFS High Availability

SacTiw says:
Dec 12, 2017 at 1:05 pm GMT
Normally a client sends a get/put file request to a particular NameNode, right? So once a failover has happened, how does the client find out about it?
Assuming it is the client's responsibility to retry on failure, is there a way for the client to first query for the currently active NameNode and then send its request to that one?
Barış says:
Nov 29, 2017 at 7:59 am GMT
It would be really good to show how to restart this system.
Thank you for sharing this valuable information.
- EdurekaSupport says:
  Jan 5, 2018 at 11:38 am GMT
  Thank you @Baris for appreciating our work. We will look into your suggestions as well. Cheers :)
Hassan Asghar says:
Nov 4, 2017 at 1:41 pm GMT
My Hadoop cluster is set up and working fine.
I ran the word count example.
Can anybody provide the formulas to calculate the following parameters:
Response Time:
Throughput:
Average I/O Rate:
Execution Time:
Thanks in advance
Den Kushnerik says:
Jan 9, 2017 at 9:42 am GMT
Hello. It's a very helpful instruction for me!
Do we need to format the ZKFC on the Standby NameNode too?
According to this page: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper we only need to do it once: “…next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.”
aagnasoft says:
Dec 15, 2016 at 5:20 am GMT
Wow, this is very helpful information. Thank you so much.
Sanjay says:
Nov 19, 2016 at 12:37 pm GMT
Normally when we set up a Hadoop cluster (non-HA), we need to configure YARN by modifying its yarn-site.xml. For HA, don't we need any HA-specific modification to yarn-site.xml?
- Ashish Bakshi says:
  Nov 29, 2016 at 8:11 am GMT
  Thanks Sanjay for going through the blog.
  In this blog, we modify only hdfs-site.xml because we are enabling the HA feature only for the NameNode. And yes, you are absolutely correct: you can have HA for the ResourceManager as well, in which case you will have to modify yarn-site.xml similarly. You can follow the Hadoop documentation given below to set up HA for the ResourceManager:
  https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
Rakibul hassan Rakib says:
Sep 5, 2016 at 7:17 am GMT
I am just correcting your HA Architecture image
Rakibul hassan Rakib says:
Sep 3, 2016 at 5:10 am GMT
After killing the active or standby NameNode, I cannot get the web view of the killed NameNode. Is it possible to get the web view after killing a NameNode? You showed the web view of both NameNodes after killing one of them. How is that possible? I am facing some problems with my NameNode.
Thank you
Rakib
- Mani says:
  Sep 9, 2016 at 7:36 pm GMT
  Hey Rakib,
  If the NameNode is manually transitioned from active to standby, you should still be able to see its web UI, because the NameNode process is still running. But if the active NameNode fails and there is an automatic transition to the standby NameNode, you can't reach the failed node's web UI, for the obvious reason that the NameNode is down. Once you fix the dead NameNode, you can see its UI again with STANDBY shown in the UI. Hope this helps.
  Thanks,
  MK
- EdurekaSupport says:
  Sep 15, 2016 at 6:55 am GMT
  Hey Rakibul, thanks for checking out the blog. Please follow the steps given below:
  -> Check your hdfs-site.xml configuration file and make sure that you have set up automatic failover as described in the blog.
  -> If you are still facing the issue, change the directories for the NameNode, DataNode, JournalNode and ZooKeeper, and give these directories the permission 755:
  chmod 755 directory_path
  -> Format the Active NameNode and start the services as described in the blog.
  Hope this helps.
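One way to carry out the first step above is to print the HA settings that the client configuration actually resolves to. The sketch below assumes the `hdfs` CLI is on the PATH. `show_ha_conf` is a hypothetical helper name, and the three keys are the standard nameservice, automatic-failover and ZooKeeper quorum properties, not values quoted from the blog.

```shell
# Print the HA-related settings as resolved by `hdfs getconf`.
# A key that cannot be resolved is printed with an empty value.
show_ha_conf() {
  local key
  for key in dfs.nameservices dfs.ha.automatic-failover.enabled ha.zookeeper.quorum; do
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key" 2>/dev/null)"
  done
}
```

Calling `show_ha_conf` on a NameNode host should list `dfs.ha.automatic-failover.enabled=true` when automatic failover is configured. An empty value means the key could not be resolved from the configuration files on that host.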
anil kumar says:
Dec 10, 2015 at 5:50 am GMT
I am installing high availability with nn1, nn2 and dn1 …. Both nn1 and nn2 are in standby mode only. What do I do now?
- Mani says:
  Jun 9, 2016 at 10:39 am GMT
  Hope you have found the solution by now, Anil. The reason might be that you did not enable the automatic failover property (dfs.ha.automatic-failover.enabled) in hdfs-site.xml. From what you describe, your cluster is in manual failover mode. In this scenario you have to designate individually which NameNode should be active or standby.
  hdfs haadmin -transitionToActive nn1
  (nn1 – Active , nn2 – Standby)
  hdfs haadmin -transitionToStandby nn1
  (nn1 – Standby , nn2 – Standby)
  hdfs haadmin -transitionToActive nn2
  (nn1 – Standby , nn2 – Active)
  hdfs haadmin -transitionToStandby nn2
  (nn1 – Standby , nn2 – Standby)
  Check the state of each NameNode using the command:
  hdfs haadmin -getServiceState nn1
  If you make both of them active by mistake, you might run into a split-brain scenario, where edits are in progress on both nodes, resulting in corrupted metadata.
  Hope this helps!
  Thanks,
  MK
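The state check and the split-brain warning above can be combined into a small script. This is a minimal sketch, assuming the NameNode IDs are nn1 and nn2 as in this thread and that `hdfs haadmin -getServiceState <id>` prints `active` or `standby`. `ha_summary` and `check_ha` are hypothetical helper names.

```shell
# Classify the combined HA state of two NameNodes.
# Arguments: the state strings reported for nn1 and nn2.
ha_summary() {
  local s1="$1" s2="$2"
  if [ "$s1" = "active" ] && [ "$s2" = "active" ]; then
    echo "split-brain-risk"   # both active: move one to standby immediately
  elif [ "$s1" = "active" ] || [ "$s2" = "active" ]; then
    echo "ok"                 # exactly one active NameNode
  else
    echo "no-active"          # both standby or unreachable
  fi
}

# Query both NameNodes (needs a configured hdfs client) and print a summary.
check_ha() {
  local s1 s2
  s1=$(hdfs haadmin -getServiceState nn1 2>/dev/null)
  s2=$(hdfs haadmin -getServiceState nn2 2>/dev/null)
  echo "nn1=${s1:-unknown} nn2=${s2:-unknown} -> $(ha_summary "$s1" "$s2")"
}
```

Running `check_ha` after each manual transition gives a quick confirmation that exactly one NameNode is active before any client traffic is sent.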
sureseh says:
Nov 8, 2015 at 9:00 am GMT
I get the error below when I follow the above configuration settings.
15/11/08 01:58:34 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
I could not find a solution for this on Google.
Can someone help?
Regards,
suresh bk
- EdurekaSupport says:
  Nov 19, 2015 at 11:05 am GMT
  Hi Suresh bk
  Thank you for reaching out to us.
  You can connect with our 24/7 support team with all your queries and doubts regarding Hadoop once you enroll for the course.
  You can also get in touch with us by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can mail us on sales@edureka.co.

Virtual machine	IP address	Host name
Active NameNode	192.168.1.81	nn1.cluster.com or nn1
Standby NameNode	192.168.1.58	nn2.cluster.com or nn2
DataNode	192.168.1.82	dn1.cluster.com or dn1

HDFS 2.x High Availability Cluster Architecture

Introduction:

NameNode Availability:

HDFS HA Architecture:

Implementation of HA Architecture:

1. Using Quorum Journal Nodes:

Fencing of NameNode:

2. Using Shared Storage:

Automatic Failover:

Setting Up and Configuring High Availability Cluster in Hadoop: