Introduction to Apache MapReduce and HDFS

Karthik Mannepalli says:
Jul 1, 2016 at 4:39 am GMT
Sorry, there was a typo in my previous question:
If HDFS assumes data is written once and read many times, does that mean we cannot use HDFS for transactional data?
Karthik Mannepalli says:
Jul 1, 2016 at 4:38 am GMT
If HDFS assumes data is written once and read many times, does that mean we can use HDFS for transactional data?
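The Python sketch below illustrates the write-once, read-many assumption raised in these questions. The `WriteOnceFile` class and its methods are hypothetical and are not the HDFS API. The model assumes only that a single writer writes a file, closes it, and from then on the file is only read. Under that assumption, workloads that update individual records in place, as transactional systems typically do, fit poorly. The sketch does not try to mirror HDFS details such as block placement or append support.

```python
class WriteOnceFile:
    """Hypothetical toy model of a write-once, read-many file (not the HDFS API)."""

    def __init__(self, path):
        self.path = path
        self._records = []
        self._closed = False

    def write(self, record):
        # Writes are accepted only while the single writer still has the file open.
        if self._closed:
            raise PermissionError(f"{self.path} is closed; it can no longer be written")
        self._records.append(record)

    def close(self):
        # After close, the contents are fixed.
        self._closed = True

    def read_all(self):
        # Any number of readers may scan the whole file, as often as they like.
        return list(self._records)

    def update(self, index, record):
        # In-place modification, which a transactional workload would need, is never allowed.
        raise PermissionError(f"{self.path}: record {index} cannot be modified in place")


log = WriteOnceFile("/logs/transactions.log")  # hypothetical path
log.write("txn-1 debit 100")
log.write("txn-2 credit 40")
log.close()
print(log.read_all())

rejected = False
try:
    log.update(0, "txn-1 debit 90")  # e.g. correcting an earlier transaction
except PermissionError as err:
    rejected = True
    print("rejected:", err)
```

Reading the closed file works any number of times. The attempted correction is refused, which is the kind of operation a transactional system relies on.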
Khalid says:
Jun 1, 2016 at 4:23 am GMT
I expected to see a concise discussion of the HDFS components here (Namenode, Datanode and Secondary Namenode), but there isn’t one.
Khalid says:
Jun 1, 2016 at 4:11 am GMT
Under 5. Data Replication and Fault Tolerance, the default HDFS block size is given as 64 MB. That is true for Hadoop 1.x, but since Hadoop 2.0 the default has been 128 MB. This blog was posted in May 2013 and apparently has not been updated since, so it would be good to update it.
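Changing the block size changes how many blocks a file is split into. The Python sketch below is a back-of-the-envelope calculation for a hypothetical 1000 MB file. It assumes the default replication factor of 3 (the `dfs.replication` default). The helper names are illustrative only. HDFS does not pad the last, partial block to the full block size, so raw storage is simply the file size times the replication factor.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    # Number of blocks a file is split into (integer ceiling division);
    # the last block may be only partly full.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def raw_storage_bytes(file_size_bytes, replication=3):
    # Every block is stored `replication` times, and a partial last block
    # only uses its actual size, so raw usage is size * replication.
    return file_size_bytes * replication


file_size = 1000 * MB  # hypothetical 1000 MB file
for label, block_size in [("64 MB (Hadoop 1.x default)", 64 * MB),
                          ("128 MB (Hadoop 2.x default)", 128 * MB)]:
    print(f"{label}: {block_count(file_size, block_size)} blocks")

print(f"raw storage at replication 3: {raw_storage_bytes(file_size) // MB} MB")
```

For this file, the calculation gives 16 blocks at 64 MB and 8 blocks at 128 MB. Raw storage is 3000 MB in both cases. Fewer blocks also mean fewer block entries for the NameNode to keep in memory.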
Kumar says:
Feb 4, 2015 at 7:47 am GMT
Hi edureka, I would like some resume formats for Hadoop developers. Please forward them if you have any. I'm new to this technology.
This is my mail ID: akumarhadoop@gmail.com
Thanks in advance.
Kumar
- EdurekaSupport says:
  Feb 9, 2015 at 11:55 am GMT
  Hi Kumar, the sample resumes will be shared with you by our support team only after you enroll in our ‘Big Data & Hadoop’ course.
  - Kushal says:
    Jun 22, 2016 at 5:20 am GMT
    Hi edureka team, please share some resume formats for Hadoop developers related to the Big Data course. You can send them to kbvprasad@gmail.com or Kushal.alester@gmail.com
    @EdurekaSupport
Abhishek says:
Jan 27, 2015 at 2:07 pm GMT
Can you please elaborate on point #3?
“As HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. HDFS focuses not so much on storing the data but how to retrieve it at the fastest possible speed, especially while analyzing logs. In HDFS, reading the complete data is more important than the time taken to fetch a single record from the data.”
- EdurekaSupport says:
  Jul 6, 2015 at 6:42 am GMT
  Hi Abhishek, batch processing is a technique that lets us run jobs without any manual intervention once a job has been submitted with the required information (input, program name). It keeps track of the submitted jobs and executes them on a first-come, first-served basis.
  In interactive mode, the user works with the system through an interface. The interface takes input from the user and returns the results to the user.
  In Hadoop, once a job is submitted, it reads its input from, and writes its results to, the locations given in the command. Hence we call it batch processing.
  Throughput is the number of jobs completed per unit of time, whereas latency is the delay between submitting a job and getting the desired output.
  In Hadoop, we concentrate on increasing throughput rather than decreasing latency, because the goal is to process the whole data set as quickly as possible, however large it is.
  Hope this helps!
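The reply above describes batch processing as first-come, first-served execution and distinguishes throughput from latency. The Python sketch below makes both definitions concrete. It assumes jobs run one at a time with no parallelism, and the `Job` fields and the `run_fcfs` function are hypothetical. Latency is measured per job, from submission to completion. Throughput is the number of jobs completed per second over the whole run, starting from time 0.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: float  # seconds since the queue started
    run_time: float     # seconds of processing once the job starts


def run_fcfs(jobs):
    """Run jobs one at a time in first-come, first-served order.

    Returns (latency per job name, overall throughput in jobs per second).
    """
    clock = 0.0
    latencies = {}
    # sorted() is stable, so jobs submitted at the same time keep their queue order.
    for job in sorted(jobs, key=lambda j: j.submit_time):
        start = max(clock, job.submit_time)  # wait for arrival and for the previous job
        clock = start + job.run_time
        latencies[job.name] = clock - job.submit_time  # submit-to-result delay
    throughput = len(jobs) / clock if clock > 0 else 0.0
    return latencies, throughput


jobs = [Job("A", 0, 10), Job("B", 0, 10), Job("C", 0, 10)]
latencies, throughput = run_fcfs(jobs)
print(latencies)
print(f"throughput: {throughput:.2f} jobs/s")
```

In this example, three jobs are submitted together and each takes 10 seconds. Each job waits for the ones ahead of it, so latencies grow from 10 s to 30 s. Throughput for the batch as a whole is 3 jobs / 30 s = 0.1 jobs per second.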
  - Karthik Mannepalli says:
    Jul 1, 2016 at 4:50 am GMT
    @EdurekaSupport – Doesn’t increasing throughput reduce latency? Don’t the two go hand in hand? Please correct me if I am wrong.
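Do throughput and latency always move together? Under a simple model they can move in opposite directions. The Python sketch below assumes each batch pays a fixed startup overhead plus a per-record cost, and it ignores the time spent waiting for a batch to fill. The function name and the numbers (10 s overhead, 10 ms per record) are hypothetical. Larger batches spread the overhead over more records, which raises throughput. But every record waits for its whole batch to finish, which raises latency.

```python
def batch_metrics(batch_size, overhead_s, per_record_s):
    """Hypothetical model: each batch pays a fixed overhead plus a per-record cost.

    Returns (throughput in records/s, latency in s for a record in the batch).
    """
    batch_time = overhead_s + batch_size * per_record_s
    throughput = batch_size / batch_time
    # A record's result is available only when its whole batch finishes.
    latency = batch_time
    return throughput, latency


for size in (1, 100, 10_000):
    tput, lat = batch_metrics(size, overhead_s=10.0, per_record_s=0.01)
    print(f"batch of {size:>6}: throughput {tput:7.2f} records/s, latency {lat:7.2f} s")
```

With these assumed numbers, going from batches of 1 record to batches of 10,000 records raises throughput from roughly 0.1 to roughly 91 records per second. Over the same range, latency grows from about 10 s to 110 s.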
Dr M. NAGABHUSHANA RAO says:
Jan 22, 2015 at 8:50 am GMT
Nice to see the edureka blog; it is trying to spread more knowledge about big data. Thanks to its team for their hard work.
- EdurekaSupport says:
  Jan 22, 2015 at 11:43 am GMT
  Thanks a lot, Dr. Rao. Please feel free to go through our other blog posts as well.
Deepak Sharma says:
Jan 18, 2015 at 12:44 pm GMT
Could you please elaborate a bit more on point #7, and also on the line “Apache HDFS provides interfaces for applications to relocate themselves nearer to where the data is located”?
- EdurekaSupport says:
  Jan 19, 2015 at 8:13 am GMT
  Hi Deepak,
  Let us assume that we have submitted a job, and the JobTracker now needs to choose which TaskTracker node the job should be allocated to.
  While assigning the task, the JobTracker first finds out which nodes the data resides on and checks whether those nodes are available to run the task. If they are, it assigns the task to those TaskTracker nodes and then transfers the computed results to whichever other nodes need them. If not, it assigns the task to the TaskTracker nodes nearest to the nodes where the data resides.
  The JobTracker prefers the nodes where the data resides because data in HDFS is huge. Transferring it can take more time (due to network congestion or other issues) than the actual computation, which is what really matters. Hence it is better to move the computed results (less data) instead of the actual data (huge data).
  Hope this helps!
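The placement logic described in this reply can be sketched as a preference order. This minimal Python sketch is not the JobTracker's actual code: `choose_node`, its parameters, and the node and rack names are hypothetical. It interprets "nearest" as "in the same rack as a replica", which follows Hadoop's rack-awareness idea. It covers only where a task runs, not how results are shipped afterwards.

```python
def choose_node(block_replicas, free_nodes, rack_of):
    """Hypothetical locality-aware placement for a task that reads one block.

    block_replicas: set of nodes holding a replica of the task's input block
    free_nodes: list of nodes that currently have a free task slot
    rack_of: dict mapping node name -> rack name
    Returns (chosen node or None, locality level).
    """
    # 1. Node-local: a replica is already on this node, so no block transfer is needed.
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    # 2. Rack-local: same rack as a replica, so the block transfer stays within the rack.
    replica_racks = {rack_of[n] for n in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: any free node; the block must cross racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no free node"


rack_of = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack3"}
replicas = {"dn1", "dn3"}
print(choose_node(replicas, ["dn4", "dn3"], rack_of))  # dn3 holds a replica
print(choose_node(replicas, ["dn4", "dn2"], rack_of))  # dn2 shares rack1 with dn1
print(choose_node(replicas, ["dn4"], rack_of))         # only an off-rack node is free
```

The ordering reflects the reasoning in the reply. Each step down the list moves more input data across the network, so the scheduler falls back to it only when closer nodes are busy.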
Sushobhit Rajan says:
Jul 12, 2014 at 3:50 pm GMT
Nicely explained!
- EdurekaSupport says:
  Jul 24, 2014 at 1:45 pm GMT
  Thanks Sushobhit!!! Feel free to go through our other blog posts as well.
  - srini says:
    Mar 29, 2019 at 5:37 am GMT
    Hi Team,
    Can you please share the sample Hadoop resumes? I have already enrolled. Please send them to my mail ID: srenivas35@gmail.com
Gaurav Dighe says:
Jan 23, 2014 at 12:05 pm GMT
Very nice information about Hadoop. Keep up the good work.
Hope to see more topics on DataFlow and MapReduce.


What is HDFS (Hadoop Distributed File System)?

Assumptions and Goals/Objectives behind HDFS:

1. Large Data Sets:

2. Write Once, Read Many Model:

3. Streaming Data Access:

4. Commodity Hardware:

5. Data Replication and Fault Tolerance:

6. High Throughput:

7. Moving Computation is better than Moving Data:

8. File System Namespace:

Recommended videos for you

New-Age Search through Apache Solr

Apache Spark Redefining Big Data Processing

Distributed Cache With MapReduce

Introduction to Big Data TDD and Pig Unit

Hadoop-A Highly Available And Secure Enterprise Data Warehousing Solution

Tailored Big Data Solutions Using MapReduce Design Patterns

MapReduce Tutorial – All You Need To Know About MapReduce

Administer Hadoop Cluster

Pig Tutorial – Know Everything About Apache Pig Script

Real-Time Analytics with Apache Storm

MapReduce Design Patterns – Application of Join Pattern

Hadoop Cluster With High Availability

Streaming With Apache Spark and Scala

Improve Customer Service With Big Data

Webinar: Introduction to Big Data & Hadoop

Introduction to Apache Solr-1

What is Big Data and Why Learn Hadoop!!!

Is It The Right Time For Me To Learn Hadoop ? Find out.

Spark SQL | Apache Spark

Top Hadoop Interview Questions and Answers – Ace Your Interview

Recommended blogs for you

Setting Up A Multi Node Cluster In Hadoop 2.X

Splunk Knowledge Objects: Splunk Events, Event Types And Tags

Hadoop MapReduce Interview Questions In 2025

Apache Storm Use Cases

Introduction to Hadoop Job Tracker

Infographics: How Big is Big Data?

Apache Sqoop Tutorial – Import/Export Data Between HDFS and RDBMS

Zookeeper Tutorial: The Guide you need to Master Zookeeper

Big Data Analytics Tools and Technologies with key Features

Hadoop Interview Questions On HBase In 2025

Why Should a Data Warehouse Professional Move to Big Data Hadoop?

How to Create a Pipeline in Azure Data Factory Step-by-Step

Increasing Demand for ‘ Hadoop and NoSQL Skills ’

Big Data Engineer Resume Examples and Tips for 2025

Game Changing Big Data Use Cases

Drilling Down On Apache Drill, The New-Age Query Engine (Part 2)

Introduction to Spark with Python – PySpark for Beginners

How to become an Apache Spark Developer?

Apache Spark Ecosystem

Career Advantages of Hadoop Certification
