I understand your issue.
Let me help you with a tool that can do the job for you.
- DistCp (short for "distributed copy") is a tool for copying large amounts of data between clusters, or within a single cluster.
- It runs the copy as a MapReduce job in the background, so the work is spread across the cluster.
- Because it runs on MapReduce, it also relies on the framework for error handling, recovery, and reporting.
Usage:
$ hadoop distcp <src> <dst>
Example:
$ hadoop distcp hdfs://loc1:8020/file-x hdfs://loc2:8020/file-y
Here, file-x on the loc1 HDFS cluster is copied to file-y on the loc2 HDFS cluster.
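If you want to script this, here is a rough sketch of how the command above could be wrapped. The helper name distcp_cmd and the RUN_DISTCP switch are my own inventions for illustration, not part of Hadoop. The only extra piece is DistCp's -m option, which caps the number of simultaneous map tasks. By default the script only prints the command, so you can try it without a cluster.

```shell
#!/bin/sh
# Sketch only: distcp_cmd is a hypothetical helper, not a Hadoop command.
# It builds a DistCp command line; nothing is copied unless RUN_DISTCP=1.

distcp_cmd() {
    src="$1"            # source URI, e.g. hdfs://loc1:8020/file-x
    dst="$2"            # destination URI, e.g. hdfs://loc2:8020/file-y
    maps="${3:-4}"      # assumed cap on map tasks (-m); 4 is an arbitrary choice
    echo "hadoop distcp -m $maps $src $dst"
}

cmd=$(distcp_cmd hdfs://loc1:8020/file-x hdfs://loc2:8020/file-y)
echo "$cmd"

# Only execute when explicitly requested. The unquoted expansion relies on
# word splitting, so this assumes the paths contain no spaces.
if [ "${RUN_DISTCP:-0}" = "1" ]; then
    $cmd
fi
```

To actually run the copy, set RUN_DISTCP=1 in the environment on a machine where the hadoop client is installed and can reach both clusters.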
Two versions of this tool are available, namely: