MapReduce in Python

Hey. I am learning Hadoop and going through the concepts of MapReduce. So far I have understood them, and I have also run MapReduce code in Java. However, I am more interested in Python scripting, and I don't know how to write a MapReduce task in Python. Can someone share some sample code?
Dec 21, 2018 in Big Data Hadoop by digger

mapper.py

#!/usr/bin/env python3
# mapper.py: word-count mapper for Hadoop Streaming
import sys

# Input comes from standard input (STDIN), one line at a time
for line in sys.stdin:
    line = line.strip()    # remove leading and trailing whitespace
    words = line.split()   # split the line into a list of words
    for word in words:
        # Emit "word<TAB>1" to standard output (STDOUT)
        print('%s\t%s' % (word, 1))
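Hadoop Streaming only needs a program that reads lines from STDIN and writes tab-separated key/value lines to STDOUT, so the mapper logic can be pulled into a plain function and checked locally without a cluster. A minimal sketch; `map_words` is a hypothetical helper name, not part of Hadoop:

```python
from typing import Iterable, Iterator


def map_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'word<TAB>1' record per word, like mapper.py (hypothetical helper)."""
    for line in lines:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)
```

For example, `list(map_words(["the cat", "the dog"]))` gives `['the\t1', 'cat\t1', 'the\t1', 'dog\t1']`. In mapper.py the loop body then reduces to printing each record from `map_words(sys.stdin)`.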


reducer.py

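#!/usr/bin/env python3
# reducer.py: word-count reducer for Hadoop Streaming
import sys

# Hadoop sorts the mapper output by key before the reducer sees it,
# so all lines for the same word arrive one after another.
current_word = None
current_count = 0

# Input comes from STDIN as "word<TAB>count" lines
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines without a tab or with a non-integer count
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word starts: emit the total for the previous word
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the total for the last word
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))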
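A common alternative reducer, which also gives the `operator.itemgetter` import from the original snippet a purpose, groups consecutive records with `itertools.groupby`. This sketch relies on the same assumption as the reducer above: input arrives sorted by word, which Hadoop's shuffle-and-sort phase provides. `parse` and `reduce_counts` are hypothetical helper names:

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn 'word<TAB>count' lines into (word, count) pairs, skipping malformed lines."""
    for line in lines:
        try:
            word, count = line.strip().split('\t', 1)
            yield word, int(count)
        except ValueError:
            continue


def reduce_counts(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each run of identical words (input must be sorted by word)."""
    for word, group in groupby(parse(lines), key=itemgetter(0)):
        total = sum(count for _, count in group)
        yield '%s\t%s' % (word, total)
```

In a streaming script this would be driven by printing each line of `reduce_counts(sys.stdin)`. Because `groupby` only merges adjacent equal keys, unsorted input would produce repeated words, exactly like the loop-based reducer.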
answered Dec 21, 2018 by Omkar
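Before submitting a streaming job, a mapper/reducer pair is often tested with a shell pipe such as `cat input.txt | ./mapper.py | sort | ./reducer.py`, where `sort` stands in for Hadoop's shuffle-and-sort step. The same three phases can be simulated in a single Python script, as sketched below. This is only a local illustration of the data flow, not how Hadoop runs jobs internally, and the function names are hypothetical:

```python
from typing import Iterable, List


def map_phase(lines: Iterable[str]) -> List[str]:
    """Mapper: emit 'word<TAB>1' for every word (same logic as mapper.py)."""
    return ['%s\t%s' % (word, 1) for line in lines for word in line.split()]


def shuffle_sort(records: List[str]) -> List[str]:
    """Stand-in for Hadoop's shuffle/sort: order records by their key."""
    return sorted(records, key=lambda r: r.split('\t', 1)[0])


def reduce_phase(records: Iterable[str]) -> List[str]:
    """Reducer: sum consecutive counts for the same word (same logic as reducer.py)."""
    output = []
    current_word, current_count = None, 0
    for record in records:
        word, count = record.split('\t', 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                output.append('%s\t%s' % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        output.append('%s\t%s' % (current_word, current_count))
    return output


def run_word_count(lines: Iterable[str]) -> List[str]:
    """Chain map -> sort -> reduce, like `mapper.py | sort | reducer.py`."""
    return reduce_phase(shuffle_sort(map_phase(lines)))
```

For example, `run_word_count(["the cat sat", "the dog"])` returns `['cat\t1', 'dog\t1', 'sat\t1', 'the\t2']`. Skipping `shuffle_sort` shows why the reducer depends on sorted input: the two `the` records would no longer be adjacent and would be counted separately.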
