I would recommend installing CentOS on all of your machines. As an example, I will use two machines (a master and a slave) with the following IPs:
Master IP: 192.168.56.102
Slave IP: 192.168.56.103
STEP 1: Check the IP address of all machines.
Command: ip addr show (you can use the ifconfig command as well)
STEP 2: Disable the firewall restrictions.
Command: sudo service iptables stop
Command: sudo chkconfig iptables off
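Note: on CentOS 7 and later, the firewall is managed by firewalld rather than iptables. If that is the case on your machines, the equivalent commands are:
Command: sudo systemctl stop firewalld
Command: sudo systemctl disable firewalld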
STEP 3: Open the hosts file and add the master and slave nodes with their respective IP addresses.
Command: sudo nano /etc/hosts
192.168.56.102 master
192.168.56.103 slave1
The same entries must be present in the hosts files of both the master and slave machines.
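To confirm that hostname resolution works, ping each node by name from the other machine:
Command: ping -c 2 master
Command: ping -c 2 slave1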
STEP 4: Restart the sshd service.
Command: sudo service sshd restart
STEP 5: Create an SSH key on the master node.
Command: ssh-keygen -t rsa -P ""
STEP 6: Append the generated public key to the master node's authorized keys.
Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
STEP 7: Copy the master node's SSH key to the slave's authorized keys.
Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave1
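You should now be able to log in to the slave from the master without a password (this guide assumes the Hadoop user on every machine is named hadoop):
Command: ssh hadoop@slave1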
STEP 8: Download the Java 8 Package. Save this file in your home directory.
STEP 9: Extract the Java Tar File on all nodes.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz
STEP 10: Download the Hadoop 2.7.3 Package on all nodes.
Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
STEP 11: Extract the Hadoop tar File on all nodes.
Command: tar -xvf hadoop-2.7.3.tar.gz
STEP 12: Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.
Open the .bashrc file and add the Hadoop and Java paths as shown below:
Command: gedit .bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/hadoop/hadoop-2.7.3
export YARN_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:/home/hadoop/hadoop-2.7.3/bin
export JAVA_HOME=/home/hadoop/jdk1.8.0_101
export PATH=$JAVA_HOME/bin:$PATH
Then save the file and close it.
To apply these changes in the current terminal session, execute the source command.
Command: source .bashrc
To verify that Java and Hadoop are properly installed and accessible from the terminal, run the java -version and hadoop version commands.
Command: java -version
Command: hadoop version
STEP 13: Create a masters file in the Hadoop configuration directory (/home/hadoop/hadoop-2.7.3/etc/hadoop) on both the master and slave machines:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/masters
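In this two-node setup, the masters file contains just the hostname of the master node:
master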
STEP 14: Edit the slaves file on the master machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves
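Because the master machine also runs a DataNode in this setup (note the dfs.datanode.data.dir property in its hdfs-site.xml below), the slaves file on the master lists both hostnames:
master
slave1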
STEP 15: Edit the slaves file on the slave machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves
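On the slave machine, the slaves file contains only its own hostname:
slave1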
STEP 16: Edit core-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
STEP 17: Edit hdfs-site.xml on the master machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.7.3/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/datanode</value>
</property>
</configuration>
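The NameNode and DataNode directories referenced above do not exist yet. Hadoop will create them when it first starts, but creating them manually ensures the hadoop user owns them:
Command: mkdir -p /home/hadoop/hadoop-2.7.3/namenode
Command: mkdir -p /home/hadoop/hadoop-2.7.3/datanode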
STEP 18: Edit hdfs-site.xml on the slave machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/datanode</value>
</property>
</configuration>
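Likewise, create the DataNode directory on the slave machine:
Command: mkdir -p /home/hadoop/hadoop-2.7.3/datanode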
STEP 19: Copy mapred-site.xml from the template in the configuration folder, then edit mapred-site.xml on both the master and slave machines as follows:
Command: cp mapred-site.xml.template mapred-site.xml (run inside /home/hadoop/hadoop-2.7.3/etc/hadoop)
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
STEP 20: Edit yarn-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
STEP 21: Format the NameNode (only on the master machine).
Command: hdfs namenode -format
STEP 22: Start all daemons (only on the master machine).
Command: ./sbin/start-all.sh (run from the Hadoop home directory, /home/hadoop/hadoop-2.7.3)
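Note: start-all.sh is deprecated in Hadoop 2.x; you can start HDFS and YARN separately instead:
Command: ./sbin/start-dfs.sh
Command: ./sbin/start-yarn.sh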
STEP 23: Check all the daemons running on both master and slave machines.
Command: jps
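If everything is configured correctly, jps on the master should list NameNode, SecondaryNameNode, and ResourceManager (plus DataNode and NodeManager, since the master also acts as a slave here), while jps on the slave should list DataNode and NodeManager.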
Finally, open a browser on the master machine and go to master:50070/dfshealth.html. This will bring up the NameNode web interface.
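The YARN ResourceManager web UI should similarly be available at master:8088 once the daemons are running.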