I would recommend installing CentOS on all of your machines. As an example, I will use two machines (a master and a slave) with the following IPs:
Master IP: 192.168.56.102
Slave IP: 192.168.56.103
STEP 1: Check the IP address of all machines.
Command: ip addr show (you can use the ifconfig command as well)
STEP 2: Disable the firewall restrictions.
Command: sudo service iptables stop
Command: sudo chkconfig iptables off
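Note: on CentOS 7 and later, the firewall is managed by firewalld rather than iptables. If that is the case on your machines, the equivalent commands are:
Command: sudo systemctl stop firewalld
Command: sudo systemctl disable firewalld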
STEP 3: Open the hosts file and add the master and slave nodes with their respective IP addresses.
Command: sudo nano /etc/hosts
192.168.56.102 master
192.168.56.103 slave1
The same entries must be present in the hosts files of both the master and slave machines.
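To confirm that hostname resolution works, ping each node by name from the other machine:
Command: ping -c 2 master
Command: ping -c 2 slave1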
STEP 4: Restart the sshd service.
Command: sudo service sshd restart
STEP 5: Create an SSH key on the master node.
Command: ssh-keygen -t rsa -P ""
STEP 6: Append the generated public key to the master node's authorized keys.
Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
STEP 7: Copy the master node's SSH key to the slave's authorized keys.
Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave1
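You should now be able to log in to the slave from the master without a password (this guide assumes the Hadoop user on every machine is named hadoop):
Command: ssh hadoop@slave1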
STEP 8: Download the Java 8 Package. Save this file in your home directory.
STEP 9: Extract the Java Tar File on all nodes.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz
STEP 10: Download the Hadoop 2.7.3 Package on all nodes.
Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
STEP 11: Extract the Hadoop tar File on all nodes.
Command: tar -xvf hadoop-2.7.3.tar.gz
STEP 12: Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.
Open the .bashrc file and add the Hadoop and Java paths as shown below:
Command: gedit .bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/hadoop/hadoop-2.7.3
export YARN_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:/home/hadoop/hadoop-2.7.3/bin
export JAVA_HOME=/home/hadoop/jdk1.8.0_101
export PATH=$JAVA_HOME/bin:$PATH
Then save the file and close it.
To apply these changes in the current terminal session, execute the source command.
Command: source .bashrc
To verify that Java and Hadoop are properly installed and accessible from the terminal, run the java -version and hadoop version commands.
Command: java -version
Command: hadoop version
STEP 13: Create a masters file in the Hadoop configuration directory (/home/hadoop/hadoop-2.7.3/etc/hadoop) on both the master and slave machines:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/masters
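In this two-node setup, the masters file contains just the hostname of the master node:
master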
STEP 14: Edit the slaves file on the master machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves
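Because the master machine also runs a DataNode in this setup (note the dfs.datanode.data.dir property in its hdfs-site.xml below), the slaves file on the master lists both hostnames:
master
slave1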
STEP 15: Edit the slaves file on the slave machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves
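On the slave machine, the slaves file contains only its own hostname:
slave1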
STEP 16: Edit core-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
STEP 17: Edit hdfs-site.xml on the master machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.7.3/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/datanode</value>
</property>
</configuration>
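The NameNode and DataNode directories referenced above do not exist yet. Hadoop will create them when it first starts, but creating them manually ensures the hadoop user owns them:
Command: mkdir -p /home/hadoop/hadoop-2.7.3/namenode
Command: mkdir -p /home/hadoop/hadoop-2.7.3/datanode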
STEP 18: Edit hdfs-site.xml on the slave machine as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.7.3/datanode</value>
</property>
</configuration>
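Likewise, create the DataNode directory on the slave machine:
Command: mkdir -p /home/hadoop/hadoop-2.7.3/datanode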
STEP 19: Copy mapred-site.xml from the template in the configuration folder, then edit mapred-site.xml on both the master and slave machines as follows:
Command: cp mapred-site.xml.template mapred-site.xml (run inside /home/hadoop/hadoop-2.7.3/etc/hadoop)
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
STEP 20: Edit yarn-site.xml on both master and slave machines as follows:
Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
STEP 21: Format the NameNode (only on the master machine).
Command: hdfs namenode -format
STEP 22: Start all daemons (only on the master machine).
Command: ./sbin/start-all.sh (run from the Hadoop home directory, /home/hadoop/hadoop-2.7.3)
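Note: start-all.sh is deprecated in Hadoop 2.x; you can start HDFS and YARN separately instead:
Command: ./sbin/start-dfs.sh
Command: ./sbin/start-yarn.sh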
STEP 23: Check all the daemons running on both master and slave machines.
Command: jps
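If everything is configured correctly, jps on the master should list NameNode, SecondaryNameNode, and ResourceManager (plus DataNode and NodeManager, since the master also acts as a slave here), while jps on the slave should list DataNode and NodeManager.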
Finally, open a browser on the master machine and go to master:50070/dfshealth.html. This will bring up the NameNode web interface.
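The YARN ResourceManager web UI should similarly be available at master:8088 once the daemons are running.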