Eclipse is already installed in the Edureka VM. When you start the VM, you will see the Eclipse icon on the Desktop. Double-click it to launch Eclipse.
Here are the steps:
1) Write the program in Eclipse. In this program, the package name is co.edureka and the class name is WordCount.
2) Create a jar file of this program named WordCount.jar, then upload it to FTP on My Lab.
3) Upload the dataset to HDFS. We can do this by first uploading the dataset to FTP and then transferring it to HDFS.
Upload the dataset to FTP:
Transfer it to HDFS with the command below:
hdfs dfs -put wordcountproblem
4) We now have our jar file on FTP (local file system) and our dataset in HDFS, so we can execute the hadoop jar command.
The syntax of the hadoop jar command is as below:
hadoop jar jarfilename.jar packagename.classname inputfilename outputdirectoryname
For our program, we will execute the following hadoop jar command:
hadoop jar WordCount.jar co.edureka.WordCount wordcountproblem WordCountOutput3
5) Check the output. The syntax of the command to print the output is as below:
hdfs dfs -cat outputdirectoryname/part-r-00000
For our program, we will check the output with the following command:
hdfs dfs -cat WordCountOutput3/part-r-00000
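To help read what the last command prints, here is a minimal plain-Java sketch of the computation a word count job performs. It is not the co.edureka.WordCount program from step 1 and it uses no Hadoop classes. The class name WordCountSketch, its method names, and the sample lines are hypothetical, and splitting on whitespace is an assumption about how the real program tokenizes its input. The sketch only illustrates the idea: each line is split into words, identical words are grouped, and their counts are summed, giving one "word<TAB>count" line per word in sorted key order, which is the layout Hadoop's default text output writes to part-r-00000.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
final class WordCountSketch {

    // "Map" step: emit one word per whitespace-separated token of a line.
    // Splitting on whitespace is an assumption about the real WordCount program.
    static List<String> mapLine(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    // "Shuffle" and "reduce" steps: group identical words and sum their counts.
    // A TreeMap keeps the keys sorted, similar to what a single reducer sees.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Render the counts as "word<TAB>count" lines, one line per word.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t')
               .append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the dataset.
        List<String> sample = Arrays.asList("hello hadoop", "hello hdfs");
        System.out.print(format(countWords(sample)));
    }
}
```

For the two sample lines, the sketch prints hadoop and hdfs with a count of 1 each and hello with a count of 2, one per line, in that order. The real job does the same work in a distributed way and writes the result into the output directory instead of printing it.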