Introduction to Apache MapReduce and HDFS

Karthik Mannepalli says:
Jul 1, 2016 at 4:39 am GMT
Sorry, there was a typo in my previous question:
If the assumption is “write once, read many times”, does it mean we cannot use HDFS for transactional data?
Karthik Mannepalli says:
Jul 1, 2016 at 4:38 am GMT
If the assumption is “write once, read many times”, does it mean we can use HDFS for transactional data?
Khalid says:
Jun 1, 2016 at 4:23 am GMT
I expected to see concise discussions of the HDFS components here (NameNode, DataNode and Secondary NameNode), but there aren’t any.
Khalid says:
Jun 1, 2016 at 4:11 am GMT
Under section 5, “Data Replication and Fault Tolerance”, the default HDFS block size is given as 64 MB. This was true in Hadoop 1.x, but since Hadoop 2.0 the default has been 128 MB. This blog was posted in May 2013 and apparently has not been updated since, so it would be good to update it.
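Khalid’s point about block size is easy to see with a little arithmetic. The Python sketch below counts how many blocks a file occupies under the old 64 MB default and the newer 128 MB default (in Hadoop 2.x the value comes from the `dfs.blocksize` property). The helper names `hdfs_blocks` and `raw_storage` and the example file size are hypothetical, and the storage estimate assumes the default replication factor of 3.

```python
MB = 1024 * 1024  # bytes in one mebibyte


def hdfs_blocks(file_size: int, block_size: int) -> int:
    """Return how many HDFS blocks a file of `file_size` bytes occupies.

    Every block is full except possibly the last one, so this is a
    ceiling division. An empty file occupies no blocks.
    """
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size: int, replication: int = 3) -> int:
    """Return the raw bytes stored across the cluster for one file.

    Each block is kept `replication` times, and the last (partial) block
    only uses its actual size, so block size does not change the total.
    """
    return file_size * replication


if __name__ == "__main__":
    size = 1024 * MB  # hypothetical 1 GB file
    for label, block in [("Hadoop 1.x, 64 MB", 64 * MB), ("Hadoop 2.x, 128 MB", 128 * MB)]:
        print(f"{label}: {hdfs_blocks(size, block)} blocks, "
              f"{raw_storage(size) // MB} MB raw storage")
```

A 1 GB file therefore needs 16 blocks at 64 MB but only 8 at 128 MB, and fewer blocks means less block metadata for the NameNode to keep track of. Because HDFS does not pad the last block, the raw storage depends only on file size and replication.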
Kumar says:
Feb 4, 2015 at 7:47 am GMT
Hi Edureka, I am looking for some resume formats for a Hadoop developer. Please forward them if you have any. I am new to this technology.
My email ID is: akumarhadoop@gmail.com
Thanks in advance.
Kumar
- EdurekaSupport says:
  Feb 9, 2015 at 11:55 am GMT
  Hi Kumar, our support team will share the sample resumes with you only after you enroll in our ‘Big Data & Hadoop’ course.
  - Kushal says:
    Jun 22, 2016 at 5:20 am GMT
    Hi Edureka team, please share some resume formats for a Hadoop developer related to the Big Data course. You can send them to: kbvprasad@gmail.com; Kushal.alester@gmail.com
    @EdurekaSupport
Abhishek says:
Jan 27, 2015 at 2:07 pm GMT
Can you please elaborate on point #3?
“As HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. HDFS focuses not so much on storing the data but how to retrieve it at the fastest possible speed, especially while analyzing logs. In HDFS, reading the complete data is more important than the time taken to fetch a single record from the data.”
- EdurekaSupport says:
  Jul 6, 2015 at 6:42 am GMT
  Hi Abhishek, batch processing is a technique in which jobs run without any manual intervention once they have been submitted with the required information (input, program name). The system keeps track of the submitted jobs and executes them on a first-come, first-served basis.
  In interactive mode, the user interacts with the system through an interface: the system takes inputs from the user and returns the results to the user through that interface.
  In Hadoop, once a job is submitted, it reads its inputs from, and stores its results to, the locations given in the command. Hence we call it batch processing.
  Throughput is the number of processes (jobs) completed per unit of time, whereas latency is the delay between submitting a job and getting the desired outcome.
  In Hadoop, we concentrate on increasing throughput rather than decreasing latency while processing a job, as we need to retrieve the output as fast as possible irrespective of the size of the data.
  Hope this helps!
  - Karthik Mannepalli says:
    Jul 1, 2016 at 4:50 am GMT
    @EdurekaSupport: Doesn’t increasing throughput reduce latency? Don’t the two go hand in hand? Please correct me if I am wrong.
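The difference between throughput and latency described in the answer above, and Karthik’s question about whether they go hand in hand, can be checked with a toy model. The Python sketch below runs jobs in first-come, first-served order, as the answer describes, and assumes every job pays a fixed startup overhead plus a cost proportional to its input size. The overhead, processing rate and job names are hypothetical numbers chosen for illustration, not measurements of Hadoop, and the model is not Hadoop’s actual scheduler.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str     # hypothetical job name (the "program name")
    records: int  # number of input records the job processes


def run_fifo(jobs, overhead_s: float, rate: float):
    """Run jobs one after another in submission order (first come, first served).

    All jobs are assumed to be submitted at t = 0. Each job takes a fixed
    startup overhead plus records / rate seconds.
    Returns (completion time per job, throughput in records per second).
    """
    queue = deque(jobs)
    clock = 0.0
    completed = {}
    total_records = 0
    while queue:
        job = queue.popleft()
        clock += overhead_s + job.records / rate
        completed[job.name] = clock  # this job's latency, measured from submission
        total_records += job.records
    throughput = total_records / clock if clock > 0 else 0.0
    return completed, throughput


if __name__ == "__main__":
    overhead, rate = 10.0, 1000.0  # hypothetical: 10 s startup, 1000 records/s

    # The same 100,000 records, processed as one big job or as ten small jobs.
    big = [Job("all-records", 100_000)]
    small = [Job(f"part-{i}", 10_000) for i in range(10)]

    for label, jobs in [("one big job", big), ("ten small jobs", small)]:
        done, tput = run_fifo(jobs, overhead, rate)
        first = min(done.values())
        print(f"{label}: throughput {tput:.0f} records/s, first result after {first:.0f} s")
```

In this model the single large job reaches higher throughput (about 909 versus 500 records per second) because the startup overhead is paid only once, yet its first and only result arrives after 110 s instead of 20 s. Throughput and latency can therefore pull in opposite directions, which fits point #3’s statement that HDFS emphasizes high throughput over low latency.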
Dr M. NAGABHUSHANA RAO says:
Jan 22, 2015 at 8:50 am GMT
Nice to see the Edureka blog; Edureka is working hard to spread more knowledge about big data. Thanks to its team for their hard work.
- EdurekaSupport says:
  Jan 22, 2015 at 11:43 am GMT
  Thanks a lot, Dr. Rao. Please feel free to go through our other blog posts as well.
Deepak Sharma says:
Jan 18, 2015 at 12:44 pm GMT
Could you please elaborate a bit more on point #7, and also on the line “Apache HDFS provides interfaces for applications to relocate themselves nearer to where the data is located”?
- EdurekaSupport says:
  Jan 19, 2015 at 8:13 am GMT
  Hi Deepak,
  Let us assume that we have submitted a job, and the JobTracker now needs to choose which TaskTracker node each of the job’s tasks should be allocated to.
  When assigning a task, the JobTracker first finds out which nodes hold the data and checks whether those nodes are available to run the task. If they are, it assigns the task to those TaskTracker nodes, and the computed results are then transferred to whichever other nodes need them. If not, it assigns the task to the TaskTracker nodes nearest to the nodes where the data resides. The JobTracker prefers the nodes where the data resides because the data in HDFS is huge: transferring it could take more time (due to network congestion or other issues) than the actual computation, which is what really matters. Hence it is better to move the computed results (less data) than the actual data (huge data).
  Hope this helps!
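The scheduling preference described in this answer can be sketched in a few lines of Python. The sketch reads “nearest” as “on the same rack”, which is how Hadoop describes rack-local tasks, and falls back to any free node otherwise. The `Node` class, the `choose_node` function and the node and rack names are hypothetical; the real JobTracker handles many more details (heartbeats from TaskTrackers, map and reduce slots, the configured scheduler), which are left out here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    free_slots: int  # free task slots on this node's TaskTracker


def choose_node(block_hosts: set, nodes: list):
    """Pick a node for a task whose input block is stored on `block_hosts`.

    Preference order:
      1. a free node holding a replica of the block (data-local),
      2. a free node on the same rack as a replica (rack-local),
      3. any free node (off-rack: the block must cross racks).
    Returns (node name, locality), or None if no node has a free slot.
    """
    free = [n for n in nodes if n.free_slots > 0]
    replica_racks = {n.rack for n in nodes if n.name in block_hosts}

    for n in free:
        if n.name in block_hosts:
            return n.name, "data-local"  # move the computation to the data
    for n in free:
        if n.rack in replica_racks:
            return n.name, "rack-local"  # data stays within one rack
    if free:
        return free[0].name, "off-rack"  # data must be copied between racks
    return None


if __name__ == "__main__":
    cluster = [Node("dn1", "rack1", 0), Node("dn2", "rack1", 1), Node("dn3", "rack2", 1)]
    print(choose_node({"dn1", "dn3"}, cluster))
    print(choose_node({"dn1"}, cluster))
```

With `dn1` busy, a block replicated on `dn1` and `dn3` goes to `dn3` as a data-local task, while a block stored only on `dn1` goes to `dn2` on the same rack. Ideally only the computed results then need to travel, which the answer describes as much less data than the input.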
Sushobhit Rajan says:
Jul 12, 2014 at 3:50 pm GMT
Nicely explained.
- EdurekaSupport says:
  Jul 24, 2014 at 1:45 pm GMT
  Thanks, Sushobhit! Feel free to go through our other blog posts as well.
  - srini says:
    Mar 29, 2019 at 5:37 am GMT
    Hi Team,
    Can you please share the sample Hadoop resumes? I have already enrolled. Please send them to my email ID: srenivas35@gmail.com
Gaurav Dighe says:
Jan 23, 2014 at 12:05 pm GMT
Very nice information about Hadoop. Keep up the good work.
Hope to see more topics on data flow and MapReduce.

What is HDFS (Hadoop Distributed File System)?

Assumptions and Goals/Objectives behind HDFS:

1. Large Data Sets:

2. Write Once, Read Many Model:

3. Streaming Data Access:

4. Commodity Hardware:

5. Data Replication and Fault Tolerance:

6. High Throughput:

7. Moving Computation is better than Moving Data:

8. File System Namespace:
