I can help you on this one.
Requirement: 1 master and 3 slaves (a Hadoop setup on a multi-node cluster)
Step 1: Get rid of Windows. Hadoop currently runs on Linux; Ubuntu 14.04 or later works well (CentOS, Red Hat, etc. are fine too).
Step 2: Install and set up Java --
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
# Select Sun's Java as the default on your machine.
# See 'sudo update-alternatives --config java' for more information.
$ sudo update-java-alternatives -s java-6-sun
(Note: the commands above install Java 6; if you install a JDK 7 instead, just make sure JAVA_HOME in the later steps points at whichever JDK you actually installed.)
Step 3: Set the path in the .bashrc file (open it with a text editor such as vi or nano and append the lines below; adjust JAVA_HOME to your actual JDK directory) --
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
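The two lines above can also be appended and reloaded non-interactively. A minimal sketch (the /tmp demo file and the JDK path are assumptions; on a real node the target is ~/.bashrc and JAVA_HOME should match your install):

```shell
# Demo target; on a real node use BASHRC=~/.bashrc
BASHRC=/tmp/bashrc.demo
# Quoted 'EOF' keeps $PATH literal in the file so it expands at login time.
cat >> "$BASHRC" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
EOF
# Reload and confirm the variable is visible in the current shell.
. "$BASHRC"
echo "$JAVA_HOME"
```

After sourcing, `java -version` should report the JDK you pointed JAVA_HOME at.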
Step 4: Add a dedicated user (not strictly required, but recommended) --
# useradd hadoop
# passwd hadoop
Step 5: Edit the hosts file in /etc/ on all nodes, specifying the IP address of each system followed by its hostname (open the file with vi /etc/hosts and append the lines below) --
<ip address of master node> hadoop-master
<ip address of slave node 1> hadoop-slave-1
<ip address of slave node 2> hadoop-slave-2
<ip address of slave node 3> hadoop-slave-3
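As a concrete sketch, the appended block might look like the following (the 192.168.1.x addresses are made up; substitute your cluster's real IPs). It is written to a demo file here so it can be run without root; on a real node the target is /etc/hosts:

```shell
# Demo target; on a real node append to /etc/hosts as root.
HOSTS_FILE=/tmp/hosts.demo
cat >> "$HOSTS_FILE" <<'EOF'
192.168.1.100 hadoop-master
192.168.1.101 hadoop-slave-1
192.168.1.102 hadoop-slave-2
192.168.1.103 hadoop-slave-3
EOF
cat "$HOSTS_FILE"
```

After this, `ping hadoop-slave-1` from the master should reach the right machine by name.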
Step 6: Set up SSH on every node so that they can communicate with one another without any password prompt --
$ su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-3
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
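For scripted setups, ssh-keygen can also be run fully non-interactively (-N "" sets an empty passphrase, -q suppresses prompts). A sketch using a demo path; on a real node keep the default ~/.ssh/id_rsa for the hadoop user:

```shell
# Generate a 2048-bit RSA keypair with an empty passphrase, no prompts.
# /tmp/demo_key is for demonstration only; real setups use ~/.ssh/id_rsa.
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/demo_key -q
ls -l /tmp/demo_key /tmp/demo_key.pub
```

Once the public key is distributed, `ssh -o BatchMode=yes hadoop-slave-1 hostname` should print the slave's hostname without any password prompt; if it fails, the key copy did not work.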
Step 7: On the master server, download and install Hadoop --
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget <URL of the hadoop-1.2.0 tarball from an Apache mirror>
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
Installation is finished here!
Next step: configuring Hadoop
Step 1: Open core-site.xml and edit it as below:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Step 2: Open hdfs-site.xml and edit it as below:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Step 3: Open mapred-site.xml and edit it as below:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:9001</value>
</property>
</configuration>
Step 4: Append the lines below to hadoop-env.sh (point JAVA_HOME at the JDK you actually installed) --
export JAVA_HOME=/usr/local/jdk1.7.0_71
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Step 5: Configure the master (run from /opt/hadoop/hadoop; in Hadoop 1.x the configuration files live under conf/) --
$ vi conf/masters
hadoop-master
Step 6: Copy the installation to the slave nodes as well --
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
$ scp -r hadoop hadoop-slave-3:/opt/hadoop
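The three scp commands can be collapsed into one loop over the hostnames from the hosts file. This is a dry run that only prints each command (remove the echo to actually copy):

```shell
# Dry run: print the copy command for each slave; drop "echo" to execute.
for slave in hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
  echo scp -r hadoop "$slave":/opt/hadoop
done
```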
Step 7: Configure the slaves --
$ vi conf/slaves
hadoop-slave-1
hadoop-slave-2
hadoop-slave-3
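Equivalently, the slaves file can be written in one command. A sketch using a demo path; for this layout the real file is /opt/hadoop/hadoop/conf/slaves:

```shell
# printf emits one hostname per line, which is exactly the slaves-file format.
printf '%s\n' hadoop-slave-1 hadoop-slave-2 hadoop-slave-3 > /tmp/slaves.demo
cat /tmp/slaves.demo
```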
Step 8: Format the NameNode (ONLY ONE TIME; formatting again will permanently destroy all HDFS data) --
# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
You are all set!!
You can start the services as follows --
$ cd $HADOOP_HOME/bin
$ ./start-all.sh
(Hadoop 1.x keeps its start scripts in bin/; the sbin/ directory only appeared in Hadoop 2.x. Run jps afterwards on each node to confirm the daemons are up.)