How to Set Up Hadoop Cluster with HDFS High Availability

SacTiw says:
Dec 12, 2017 at 1:05 pm GMT
Normally a client sends a get/put file request to a particular NameNode, right? So once a failover has happened, how does the client find out about it?
Assuming it is the client's responsibility to retry on failure, is there a way for the client to first query which NameNode is currently active and then send the request to that one?
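For context, HDFS HA clients are normally not tied to one NameNode. They address the logical nameservice (for example `hdfs://mycluster`), and the class named in `dfs.client.failover.proxy.provider.<nameservice>` (such as `org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`) retries the request against the other NameNode after a failover. If you still want to ask which NameNode is active before sending a request, here is a minimal sketch. It assumes a single nameservice, that `hdfs getconf` and `hdfs haadmin` can reach the cluster from where it runs and that your user may run `hdfs haadmin`; `find_active_namenode` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the NameNode that reports "active".
# Assumes exactly one nameservice is listed in dfs.nameservices.
find_active_namenode() {
  local ns ids id state
  ns=$(hdfs getconf -confKey dfs.nameservices 2>/dev/null) || return 1
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.${ns}" 2>/dev/null) || return 1
  # dfs.ha.namenodes.<nameservice> holds a comma-separated list such as "nn1,nn2".
  for id in $(printf '%s\n' "$ids" | tr ',' ' '); do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1  # no NameNode reported "active"
}
```

Running `find_active_namenode` prints an ID such as `nn1`, or returns non-zero if no NameNode currently reports `active` (for example, both are still standby).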
Barış says:
Nov 29, 2017 at 7:59 am GMT
It would be really good to show how to restart this system.
Thank you for sharing this valuable information.
- EdurekaSupport says:
  Jan 5, 2018 at 11:38 am GMT
  Thank you @Baris for appreciating our work. We will look into your suggestions as well. Cheers :)
Hassan Asghar says:
Nov 4, 2017 at 1:41 pm GMT
My Hadoop cluster is set up and working fine.
I ran the word count example.
Can anybody provide the formulas to calculate the following parameters?
Response Time:
Throughput:
Average I/o Rate:
Execution Time:
Thanks in advance
Den Kushnerik says:
Jan 9, 2017 at 9:42 am GMT
Hello. It's a very helpful guide for me!
Do we need to format ZKFC on the Standby NameNode too?
According to this page: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper it only needs to be done once: “…next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.”
aagnasoft says:
Dec 15, 2016 at 5:20 am GMT
Wow, this is very helpful information. Thank you so much.
Sanjay says:
Nov 19, 2016 at 12:37 pm GMT
Normally, when we set up a Hadoop cluster (non-HA), we need to configure YARN by modifying its yarn-site.xml. For HA, don't we need any HA-specific changes to yarn-site.xml?
- Ashish Bakshi says:
  Nov 29, 2016 at 8:11 am GMT
  Thanks Sanjay for going through the blog.
  In this blog, we modify hdfs-site.xml because we are enabling the HA feature only for the NameNode. And yes, you are absolutely correct: you can have HA for the ResourceManager as well, in which case you will have to modify yarn-site.xml similarly. You can follow the Hadoop documentation to set up HA for the ResourceManager, given below:
  https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
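The ResourceManager HA page linked above also describes checking each ResourceManager's state with `yarn rmadmin -getServiceState <rm-id>`. A small sketch, assuming RM HA is already configured and that the IDs in `yarn.resourcemanager.ha.rm-ids` are something like `rm1,rm2` (example values only). `rm_states` is a hypothetical helper name, and it assumes a failed `yarn rmadmin` call exits non-zero, which it reports as `unreachable`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<rm-id>: <state>" for every ResourceManager ID given.
# The IDs are whatever yarn.resourcemanager.ha.rm-ids lists (e.g. rm1 rm2).
rm_states() {
  local rm_id state
  for rm_id in "$@"; do
    # A failed call (e.g. that ResourceManager is down) is shown as "unreachable".
    state=$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null) || state="unreachable"
    echo "$rm_id: $state"
  done
}
```

Usage: `rm_states rm1 rm2`, run from any host with the cluster's yarn-site.xml.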
Rakibul hassan Rakib says:
Sep 5, 2016 at 7:17 am GMT
I am just correcting your HA Architecture image
Rakibul hassan Rakib says:
Sep 3, 2016 at 5:10 am GMT
After killing the active or standby NameNode, I cannot open the web UI of the killed NameNode. Is it possible to get the web UI after killing a NameNode? You have shown both NameNode web UIs after killing one NameNode. How is that possible? I am facing some problems with my NameNode.
Thank you
Rakib
- Mani says:
  Sep 9, 2016 at 7:36 pm GMT
  Hey Rakib,
  If the NameNode is manually transitioned from active to standby, you should still be able to see its web UI, because the NameNode process is still running. But if the active NameNode failed and the standby was automatically transitioned to active, you can't see the failed NameNode's web UI, for the obvious reason that it is down. Once you fix the dead NameNode, you will see its UI again with STANDBY shown. Hope this helps.
  Thanks,
  MK
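To check the state the web UI shows without a browser, a NameNode also serves JMX data on its HTTP port (50070 by default in Hadoop 2.x), and the `NameNodeStatus` bean includes a `State` field. If the NameNode process is down, the HTTP request simply fails, which matches the explanation above. A sketch, assuming `curl` is installed and the default port unless another is passed; `nn_http_state` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query a NameNode's JMX servlet and print its HA state
# (e.g. "active" or "standby"), or "down" if the HTTP request fails.
# Port 50070 is the Hadoop 2.x default for dfs.namenode.http-address.
nn_http_state() {
  local host=$1 port=${2:-50070} json
  json=$(curl -sf "http://${host}:${port}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus") || {
    echo "down"
    return 1
  }
  # Extract the value of the "State" field from the JSON without needing jq.
  printf '%s\n' "$json" | sed -n 's/.*"State" *: *"\([a-z]*\)".*/\1/p'
}
```

Usage: `nn_http_state nn1` and `nn_http_state nn2`; after a failover, the failed NameNode should report `down` until it is restarted.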
- EdurekaSupport says:
  Sep 15, 2016 at 6:55 am GMT
  Hey Rakibul, thanks for checking out the blog. Please follow the steps given below:
  -> Check your hdfs-site.xml configuration file and make sure that you have set up automatic failover as described in the blog.
  -> If you are still facing the issue, change the directories for the NameNode, DataNode, JournalNode and ZooKeeper, and give these directories permission 755:
  chmod 755 directory_path
  -> Format the active NameNode and start the services as described in the blog.
  Hope this helps.
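For reference, the Apache guide linked in Den's comment above describes this first-time order: start the JournalNodes, format and start one NameNode, copy its metadata to the other NameNode with `-bootstrapStandby`, then initialize the HA state in ZooKeeper and start a ZKFC next to each NameNode. A minimal sketch of those steps, one case per host role, assuming a Hadoop 2.x layout under `$HADOOP_PREFIX` and the nn1/nn2 roles from this post. `ha_init_step` is a hypothetical name, and `hdfs namenode -format` erases existing NameNode metadata, so only use it on a new or disposable cluster.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of first-time HA initialization, one step per host role.
# WARNING: "namenode -format" wipes existing NameNode metadata.
ha_init_step() {
  local hdfs_bin="$HADOOP_PREFIX/bin/hdfs"
  local daemon="$HADOOP_PREFIX/sbin/hadoop-daemon.sh"
  case "$1" in
    journalnode)  # 1. on every JournalNode host
      "$daemon" start journalnode ;;
    active)       # 2. on the NameNode that should be formatted (e.g. nn1)
      "$hdfs_bin" namenode -format &&
        "$daemon" start namenode ;;
    standby)      # 3. on the other NameNode (e.g. nn2), after step 2 is running
      "$hdfs_bin" namenode -bootstrapStandby &&
        "$daemon" start namenode ;;
    zk)           # 4. once, from either NameNode host
      "$hdfs_bin" zkfc -formatZK ;;
    zkfc)         # 5. on both NameNode hosts
      "$daemon" start zkfc ;;
    *)
      echo "usage: ha_init_step journalnode|active|standby|zk|zkfc" >&2
      return 2 ;;
  esac
}
```

Run `ha_init_step journalnode` on each JournalNode host, then `ha_init_step active` on nn1, `ha_init_step standby` on nn2, `ha_init_step zk` once, and `ha_init_step zkfc` on nn1 and nn2.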
anil kumar says:
Dec 10, 2015 at 5:50 am GMT
I am installing high availability with nn1 & nn2 and dn1 …. In that setup, both nn1 and nn2 are in standby mode only. What do I do now?
- Mani says:
  Jun 9, 2016 at 10:39 am GMT
  Hope you have found the solution by now, Anil. The likely reason is that you did not enable the automatic failover property in hdfs-site.xml. From what you describe, your cluster is in manual failover mode. In this scenario you have to designate individually which NameNode should be active and which should be standby:
  hdfs haadmin -transitionToActive nn1
  (nn1 – Active, nn2 – Standby)
  hdfs haadmin -transitionToStandby nn1
  (nn1 – Standby, nn2 – Standby)
  hdfs haadmin -transitionToActive nn2
  (nn1 – Standby, nn2 – Active)
  hdfs haadmin -transitionToStandby nn2
  (nn1 – Standby, nn2 – Standby)
  Check each NameNode's service state using the command:
  hdfs haadmin -getServiceState nn1
  (and likewise for nn2)
  If you mistakenly make both of them active, you may run into a split-brain scenario in which edits are in progress on both nodes, resulting in corrupted metadata.
  Hope this helps!
  Thanks,
  MK
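Instead of two separate transition commands, `hdfs haadmin -failover <from> <to>` performs a coordinated failover: it tries to move the first NameNode to standby gracefully and, if that fails, runs the fencing methods from `dfs.ha.fencing.methods` before activating the second, which avoids the both-standby and both-active windows described above. Note also that when automatic failover is enabled, `haadmin` refuses manual `-transitionToActive`/`-transitionToStandby` unless `--forcemanual` is given. A sketch using the nn1/nn2 IDs from this thread; `switch_active` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the active role from one NameNode ID to another,
# then print the resulting state of both. IDs come from dfs.ha.namenodes.<nameservice>.
switch_active() {
  local from=$1 to=$2 id
  hdfs haadmin -failover "$from" "$to" || return 1
  for id in "$from" "$to"; do
    echo "$id: $(hdfs haadmin -getServiceState "$id")"
  done
}
```

Usage: `switch_active nn1 nn2` to make nn2 active while nn1 becomes standby.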
sureseh says:
Nov 8, 2015 at 9:00 am GMT
I am getting the error below when I follow the configuration settings above.
15/11/08 01:58:34 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
And I can't find a solution for this on Google.
Can someone help?
Regards,
suresh bk
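The message means the NameNode found `dfs.namenode.shared.edits.dir` set but did not consider HA enabled. That typically points to the HA keys not being visible to the NameNode, for example `dfs.nameservices` or `dfs.ha.namenodes.<nameservice>` missing, misspelled, listing fewer than two IDs, or sitting in an hdfs-site.xml the NameNode is not reading. A quick check, sketched under that assumption, prints how the local configuration resolves each key; it assumes `hdfs getconf -confKey` exits non-zero for a missing key, and `check_ha_keys` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how the local configuration resolves the keys HDFS HA
# depends on. "(not set)" next to an HA key while shared.edits.dir is set hints at the cause.
check_ha_keys() {
  local ns key value
  ns=$(hdfs getconf -confKey dfs.nameservices 2>/dev/null) || ns=""
  # If dfs.nameservices is unset, the second key shows up as "dfs.ha.namenodes.".
  for key in dfs.nameservices "dfs.ha.namenodes.${ns}" dfs.namenode.shared.edits.dir; do
    value=$(hdfs getconf -confKey "$key" 2>/dev/null) && [ -n "$value" ] || value="(not set)"
    echo "$key = $value"
  done
}
```

Run `check_ha_keys` on the NameNode host that fails to start, using the same configuration directory the NameNode uses.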
- EdurekaSupport says:
  Nov 19, 2015 at 11:05 am GMT
  Hi Suresh bk
  Thank you for reaching out to us.
  You can connect with our 24/7 support team with all your queries and doubts regarding Hadoop once you enroll for the course.
  You can also get in touch with us by contacting our sales team at +91-8880862004 (India) or 1800 275 9730 (US toll free). You can also email us at sales@edureka.co.

Role (virtual machine)	IP address	Host name
Active NameNode	192.168.1.81	nn1.cluster.com or nn1
Standby NameNode	192.168.1.58	nn2.cluster.com or nn2
DataNode	192.168.1.82	dn1.cluster.com or dn1


HDFS 2.x High Availability Cluster Architecture

Introduction:

NameNode Availability:

HDFS HA Architecture:

Implementation of HA Architecture:

1. Using Quorum Journal Nodes:

Fencing of NameNode:

2. Using Shared Storage:

Automatic Failover:

Setting Up and Configuring High Availability Cluster in Hadoop:
