Follow the steps below to run the WordCount program:
1) Write the program in Eclipse.
2) Create a jar file of the program and name it WordCount.jar.
3) Upload the dataset to HDFS. When no destination is given, the file is copied to your HDFS home directory.
hdfs dfs -put wordcountproblem
4) The dataset is now in HDFS (the jar file itself stays on the local file system), so we can run the job with the hadoop jar command.
The syntax of the hadoop jar command is:
hadoop jar jarfilename.jar packagename.classname inputfilename outputdirectoryname
For our program, the command is as follows. The output directory (WordCountOutput3 here) must not already exist, otherwise the job fails:
hadoop jar WordCount.jar co.hduser.WordCount wordcountproblem WordCountOutput3
5) Check the output. Each reducer writes a part-r-NNNNN file into the output directory, so with a single reducer the results are in part-r-00000. The syntax is:
hdfs dfs -cat outputdirectoryname/part-r-00000
For our program, the command is:
hdfs dfs -cat WordCountOutput3/part-r-00000
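
To get a feel for what the -cat command in step 5 is likely to print, here is a minimal local sketch of the counting logic in plain Java, with no Hadoop dependency. It is not the WordCount class from the jar: the class name LocalWordCountSketch, its method names, and the sample sentence are all made up for illustration. It assumes the job splits lines on whitespace and writes results with Hadoop's default text output, which is one "key<TAB>value" pair per line with keys in sorted order.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of a word count; not the class packaged in WordCount.jar.
public class LocalWordCountSketch {

    // "Map" and "reduce" collapsed into one step: emit each token and sum its occurrences.
    // A TreeMap keeps the words sorted, similar to how reducer keys arrive in sorted order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text); // splits on whitespace
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Formats the counts as "word<TAB>count" lines (assumed default text output layout).
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch; the real job reads wordcountproblem from HDFS.
        System.out.print(format(countWords("hello hadoop hello hdfs")));
    }
}
```

Running this prints three lines (hadoop 1, hdfs 1, hello 2, each separated by a tab). If the real job tokenizes differently, for example by lowercasing or stripping punctuation, its part-r-00000 file will differ accordingly.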