I can help you on this one.
Requirement: 1 master and 3 slaves (a Hadoop setup on a multi-node cluster)
Step 1: Get rid of Windows. Hadoop currently runs on Linux; Ubuntu 14.04 or later works well (CentOS, Red Hat, etc. are fine too).
Step 2: Install and set up Java --
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
# Select Sun's Java as the default on your machine.
# See 'sudo update-alternatives --config java' for more information.
$ sudo update-java-alternatives -s java-6-sun
(Note: the commands above install Java 6; if you install a JDK 7 instead, just make sure JAVA_HOME in the later steps points at whichever JDK you actually installed.)
Step 3: Set the path in the .bashrc file (open it with a text editor such as vi or nano and append the lines below; adjust JAVA_HOME to your actual JDK directory) --
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
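The two lines above can also be appended and reloaded non-interactively. A minimal sketch (the /tmp demo file and the JDK path are assumptions; on a real node the target is ~/.bashrc and JAVA_HOME should match your install):

```shell
# Demo target; on a real node use BASHRC=~/.bashrc
BASHRC=/tmp/bashrc.demo
# Quoted 'EOF' keeps $PATH literal in the file so it expands at login time.
cat >> "$BASHRC" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
EOF
# Reload and confirm the variable is visible in the current shell.
. "$BASHRC"
echo "$JAVA_HOME"
```

After sourcing, `java -version` should report the JDK you pointed JAVA_HOME at.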
Step 4: Add a dedicated user (not strictly required, but recommended) --
# useradd hadoop
# passwd hadoop
Step 5: Edit the hosts file in /etc/ on all nodes, specifying the IP address of each system followed by its hostname (open the file with vi /etc/hosts and append the lines below) --
<ip address of master node> hadoop-master
<ip address of slave node 1> hadoop-slave-1
<ip address of slave node 2> hadoop-slave-2
<ip address of slave node 3> hadoop-slave-3
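As a concrete sketch, the appended block might look like the following (the 192.168.1.x addresses are made up; substitute your cluster's real IPs). It is written to a demo file here so it can be run without root; on a real node the target is /etc/hosts:

```shell
# Demo target; on a real node append to /etc/hosts as root.
HOSTS_FILE=/tmp/hosts.demo
cat >> "$HOSTS_FILE" <<'EOF'
192.168.1.100 hadoop-master
192.168.1.101 hadoop-slave-1
192.168.1.102 hadoop-slave-2
192.168.1.103 hadoop-slave-3
EOF
cat "$HOSTS_FILE"
```

After this, `ping hadoop-slave-1` from the master should reach the right machine by name.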
Step 6: Set up SSH on every node so that they can communicate with one another without any password prompt --
$ su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-3
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
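For scripted setups, ssh-keygen can also be run fully non-interactively (-N "" sets an empty passphrase, -q suppresses prompts). A sketch using a demo path; on a real node keep the default ~/.ssh/id_rsa for the hadoop user:

```shell
# Generate a 2048-bit RSA keypair with an empty passphrase, no prompts.
# /tmp/demo_key is for demonstration only; real setups use ~/.ssh/id_rsa.
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/demo_key -q
ls -l /tmp/demo_key /tmp/demo_key.pub
```

Once the public key is distributed, `ssh -o BatchMode=yes hadoop-slave-1 hostname` should print the slave's hostname without any password prompt; if it fails, the key copy did not work.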
Step 7: On the master server, download and install Hadoop --
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget <URL of the hadoop-1.2.0 tarball from an Apache mirror>
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
Installation is finished here!
Next step: configuring Hadoop
Step 1: Open core-site.xml and edit it as below:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Step 2: Open hdfs-site.xml and edit it as below:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Step 3: Open mapred-site.xml and edit it as below:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:9001</value>
</property>
</configuration>
Step 4: Append the lines below to hadoop-env.sh (point JAVA_HOME at the JDK you actually installed) --
export JAVA_HOME=/usr/local/jdk1.7.0_71
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Step 5: Configure the master (run from /opt/hadoop/hadoop; in Hadoop 1.x the configuration files live under conf/) --
$ vi conf/masters
hadoop-master
Step 6: Copy the installation to the slave nodes as well --
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
$ scp -r hadoop hadoop-slave-3:/opt/hadoop
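The three scp commands can be collapsed into one loop over the hostnames from the hosts file. This is a dry run that only prints each command (remove the echo to actually copy):

```shell
# Dry run: print the copy command for each slave; drop "echo" to execute.
for slave in hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
  echo scp -r hadoop "$slave":/opt/hadoop
done
```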
Step 7: Configure the slaves --
$ vi conf/slaves
hadoop-slave-1
hadoop-slave-2
hadoop-slave-3
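Equivalently, the slaves file can be written in one command. A sketch using a demo path; for this layout the real file is /opt/hadoop/hadoop/conf/slaves:

```shell
# printf emits one hostname per line, which is exactly the slaves-file format.
printf '%s\n' hadoop-slave-1 hadoop-slave-2 hadoop-slave-3 > /tmp/slaves.demo
cat /tmp/slaves.demo
```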
Step 8: Format the NameNode (ONLY ONE TIME; formatting again will permanently destroy all HDFS data) --
# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
You are all set!!
You can start the services as follows --
$ cd $HADOOP_HOME/bin
$ ./start-all.sh
(Hadoop 1.x keeps its start scripts in bin/; the sbin/ directory only appeared in Hadoop 2.x. Run jps afterwards on each node to confirm the daemons are up.)