With MR2, the properties to set are now:
- conf.setBoolean("mapreduce.map.output.compress", true)
- conf.setBoolean("mapreduce.output.fileoutputformat.compress", false)
mapreduce.map.output.compress (mapred.compress.map.output in MR1) controls compression of the intermediate data passed from the mappers to the reducers. If you use the Snappy codec, this will most likely speed up reads and writes and reduce network overhead. Don't worry about splitting here: these files are not stored in HDFS. They are temp files that exist only for the duration of the MapReduce job.
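
For reference, here is a minimal sketch of how these settings could be applied in a driver. It assumes hadoop-common is on the classpath and that Snappy is available on the cluster. The class name CompressionSettings is made up for illustration, and the codec property (mapreduce.map.output.compress.codec) is an addition that is only needed if you want Snappy instead of the default codec.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

// Hypothetical driver class, for illustration only.
public class CompressionSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Compress the intermediate map output (mapper -> reducer).
        conf.setBoolean("mapreduce.map.output.compress", true);

        // Assumption: Snappy is installed on the cluster nodes.
        // Omit this call to keep the default codec.
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);

        // Leave the final job output (written to HDFS) uncompressed.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);

        // Print the effective values to confirm what the job will see.
        System.out.println("map output compressed: "
                + conf.getBoolean("mapreduce.map.output.compress", false));
        System.out.println("map output codec: "
                + conf.get("mapreduce.map.output.compress.codec"));
        System.out.println("job output compressed: "
                + conf.getBoolean("mapreduce.output.fileoutputformat.compress", false));
    }
}
```

The same Configuration object would then be passed to the job (for example via Job.getInstance(conf)) before submitting it.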