AWS Four node cluster on Hadoop

Question

To deploy a 4 node cluster of Hadoop in AWS which instance type can be used?

Meci Matt · Answer 1 · May 31, 2018

First, let’s look at what actually happens in a Hadoop cluster. Hadoop follows a master/slave architecture. The master machine runs the coordinating daemons (the NameNode and the ResourceManager): it keeps the HDFS metadata in memory and schedules the work. The slave machines act as DataNodes: they store the data and also run the processing tasks on it. Since all the storage happens on the slaves, larger disks are recommended there, and since the master holds the file system metadata and coordinates every job, it needs plenty of RAM and a good CPU. You can therefore select the configuration of each machine depending on your workload. For example, a compute-optimized instance such as c4.8xlarge could be used for the master, and a storage-optimized instance from the I2 family for the slaves (note that the smallest I2 size is i2.xlarge; there is no i2.large).

If you don’t want to configure the instances and install the Hadoop cluster manually, you can launch an Amazon EMR (Elastic MapReduce) cluster, which configures the servers for you automatically. You put the data to be processed in S3; EMR reads it from there, processes it, and writes the results back to S3.
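As a rough illustration of the EMR route, the sketch below assembles an `aws emr create-cluster` command for a four-node cluster and prints it for review rather than running it. The cluster name, release label, and instance type are placeholders chosen for this example, not values from the answer; with `--instance-type` and `--instance-count 4`, the AWS CLI provisions one master and three core nodes.

```shell
# Sketch only: every value below is a placeholder, not taken from the answer.
CLUSTER_NAME="hadoop-four-node"   # hypothetical cluster name
RELEASE_LABEL="emr-5.36.0"        # assumed EMR release label; check what your region offers
INSTANCE_TYPE="m5.xlarge"         # assumed instance type for all four nodes
INSTANCE_COUNT=4                  # 1 master + 3 core nodes

# Build the command as a string so it can be inspected before use.
EMR_CMD="aws emr create-cluster \
--name $CLUSTER_NAME \
--release-label $RELEASE_LABEL \
--applications Name=Hadoop \
--instance-type $INSTANCE_TYPE \
--instance-count $INSTANCE_COUNT \
--use-default-roles"

# Print the command for review instead of running it.
echo "$EMR_CMD"
```

Once the values look right, the printed command can be pasted into a shell that has AWS credentials configured.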


Priyaj · Answer 2 · Aug 21, 2018

Follow these steps one by one and you are good to go:

Install Java And Hadoop

$ sudo apt-get update && sudo apt-get dist-upgrade

Install OpenJDK

Install Java 8 from the OpenJDK packages:

$ sudo apt-get install openjdk-8-jdk
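A quick way to confirm the installation is to ask Java for its version. The snippet below is a small sketch that guards against Java being absent from the PATH; the `JAVA_STATUS` variable is just a name picked for this example.

```shell
# Report the installed Java version, or say so if java is not on the PATH.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, so redirect it before taking the first line.
  JAVA_STATUS=$(java -version 2>&1 | head -n 1)
else
  JAVA_STATUS="java not found on PATH"
fi
echo "$JAVA_STATUS"
```

After installing openjdk-8-jdk, the first line should mention a 1.8 version.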

Installing Hadoop

Download Hadoop from one of the Apache mirrors, selecting the appropriate version number. The command below downloads the gzipped tarball into the ~/Downloads directory; the -P parameter sets the target directory and creates it if it does not exist.

$ wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz -P ~/Downloads
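After the download, it can help to check the archive's top-level directory, since that is the name that will appear under /usr/local once it is extracted. The helper below is a minimal sketch; `tarball_top_dir` is a name made up for this example, and the archive path assumes the wget command above was used unchanged.

```shell
# Print the name of the top-level directory inside a .tar.gz archive.
tarball_top_dir() {
  tar tzf "$1" 2>/dev/null | head -n 1 | cut -d/ -f1
}

# Assumed location: the file the wget command above would have written.
ARCHIVE="$HOME/Downloads/hadoop-2.8.1.tar.gz"
if [ -f "$ARCHIVE" ]; then
  echo "Top-level directory: $(tarball_top_dir "$ARCHIVE")"
else
  echo "Archive not found: $ARCHIVE"
fi
```

If the archive is missing or the listing comes back empty, the download probably failed (for example because the mirror no longer hosts that version).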

Extract the archive to /usr/local.

$ sudo tar zxvf ~/Downloads/hadoop-* -C /usr/local

Rename the extracted hadoop-* directory under /usr/local to hadoop.

$ sudo mv /usr/local/hadoop-* /usr/local/hadoop

Setting up Environment Variables

First find out where Java is installed (where the java executable is). The path may be different for you.
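The answer does not include the command for this step, so the following is only one possible approach: resolve the `java` symlink with `readlink -f` (available in GNU coreutils on Ubuntu) and strip the trailing `bin/java` to get a candidate for JAVA_HOME. The function name `java_home_from_path` is hypothetical.

```shell
# Strip the trailing /bin/java (and an optional /jre) to get a JAVA_HOME candidate.
java_home_from_path() {
  printf '%s\n' "$1" | sed -e 's|/bin/java$||' -e 's|/jre$||'
}

if command -v java >/dev/null 2>&1; then
  # Resolve symlinks such as /usr/bin/java to the real binary.
  JAVA_BIN=$(readlink -f "$(command -v java)")
  echo "java executable: $JAVA_BIN"
  echo "JAVA_HOME candidate: $(java_home_from_path "$JAVA_BIN")"
else
  echo "java not found on PATH"
fi
```

On Ubuntu with openjdk-8-jdk, the candidate is typically the /usr/lib/jvm/java-8-openjdk-amd64 directory used in the exports in this answer.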

Open the .bashrc file in your home directory with your favorite editor and add the lines below.

$ vi ~/.bashrc

For Java:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

For Hadoop:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

For Hadoop Configuration directory:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
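Once the lines are saved and the file is reloaded (for example with `source ~/.bashrc`), a short check can confirm that each variable is set and points at a real directory. This is a sketch, not part of the original steps; `check_dir_var` is a hypothetical helper, and HADOOP_CONF_DIR is the variable name Hadoop reads for its configuration directory.

```shell
# Print a status line for the variable named in $1.
# Returns non-zero if it is unset, empty, or not an existing directory.
check_dir_var() {
  name=$1
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name=$value (directory missing)"
    return 1
  fi
  echo "$name=$value (ok)"
}

# Check the three variables from this section; keep going even if one fails.
for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
  check_dir_var "$v" || true
done
```

A "not set" line usually means a typo in the variable name or that .bashrc was not reloaded in the current shell.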

For the remaining steps, follow:

https://medium.com/@jeevananandanne/setup-4-node-hadoop-cluster-on-aws-ec2-instances-1c1eeb4453bd

