1. To merge two or more files into a single file and store it in HDFS, you first need a folder in the HDFS path containing the files you want to merge.
Here, I have a folder named merge_files which contains the following files that I want to merge
Then you can execute the following command to merge the files and store the result in HDFS:
hadoop fs -cat /user/edureka_425640/merge_files/* | hadoop fs -put - /user/edureka_425640/merged_files
The merged_files file need not be created manually; it is created automatically to hold the output of the above command. You can view the output using the following command. Here, merged_files is storing my result:
hadoop fs -cat merged_files
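The pattern above works because `hadoop fs -cat dir/*` streams the contents of every file in the folder to stdout, and `hadoop fs -put - dest` writes stdin into a single HDFS file. The same logic can be sketched locally with plain shell (the file names below are made up for illustration):

```shell
# Create a sample folder with two small files (hypothetical names)
mkdir -p merge_files
printf 'alpha\n' > merge_files/file1.txt
printf 'beta\n'  > merge_files/file2.txt

# Local equivalent of `hadoop fs -cat merge_files/* | hadoop fs -put - merged_files`:
# concatenate every file in the folder and write the combined stream to one file
cat merge_files/* > merged_files

# View the merged result (local equivalent of `hadoop fs -cat merged_files`)
cat merged_files
```

If you want the merged result on the local filesystem instead of in HDFS, `hadoop fs -getmerge <hdfs-dir> <local-file>` does the concatenation in one step, but its destination is local, not HDFS.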
Suppose we have a folder containing some empty files and some non-empty files, and we want to delete only the empty ones. We can use the command below:
hdfs dfs -rm $(hdfs dfs -ls -R /user/A/ | grep -v "^d" | awk '{if ($5 == 0) print $8}')
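Here, `hdfs dfs -ls -R` lists everything recursively, `grep -v "^d"` drops directory entries (their permission string starts with `d`), and awk prints field 8 (the path) whenever field 5 (the size) is 0; the resulting paths are passed to `hdfs dfs -rm`. The filter can be sketched locally against simulated listing output (the paths and dates below are made up for illustration):

```shell
# Simulated `hdfs dfs -ls -R` output:
# fields are permissions, replication, owner, group, size, date, time, path
ls_output='drwxr-xr-x   - user group          0 2023-01-01 10:00 /user/A/temp_folder
-rw-r--r--   3 user group          0 2023-01-01 10:01 /user/A/temp_folder/empty1
-rw-r--r--   3 user group          0 2023-01-01 10:02 /user/A/temp_folder/empty2
-rw-r--r--   3 user group        123 2023-01-01 10:03 /user/A/temp_folder/data.txt'

# grep -v "^d" skips directories; awk keeps only zero-size entries and prints the path
empty=$(printf '%s\n' "$ls_output" | grep -v "^d" | awk '{if ($5 == 0) print $8}')
printf '%s\n' "$empty"
```

Only the two empty files are selected; the non-empty data.txt and the directory itself are left alone, which is exactly what `hdfs dfs -rm $(...)` then deletes.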
Here I have a folder, temp_folder, with three files: two empty and one non-empty. Please refer to the screenshot below: