A file format is simply a specification of how information is stored in a file, in this case in HDFS (the Hadoop Distributed File System). A file format should be well defined and expressive.
For example, images have several common storage formats, such as PNG, JPG, and GIF. All three can store the same image, but each format has its own characteristics. JPG files, for instance, tend to be smaller because JPG uses lossy compression.
When we work with the Hadoop file system, as with any other file system, the format of the files we store in HDFS is entirely up to us. In HDFS we can use not only traditional storage formats (such as JPG and PNG images) but also some Hadoop-focused file formats for structured and unstructured data.
Some common storage formats for Hadoop are:
- Plain Text format (CSV)
- Sequence File format
- Row-Column format
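
To make the difference between a text format and a binary key/value format more concrete, here is a minimal Python sketch that stores the same records in two ways. The sample records, the function names, and the binary layout (a 4-byte integer key, a 2-byte length, then the value bytes) are assumptions made purely for illustration. This is not the actual on-disk layout of a Hadoop Sequence File; it only shows the general idea that the same data can be laid out as readable text lines or as compact binary key/value pairs.

```python
import csv
import io
import struct

# Hypothetical sample records: (user_id, city)
RECORDS = [(1, "Pune"), (2, "Austin"), (3, "Osaka")]

# Illustrative binary header: 4-byte big-endian int key + 2-byte value length
HEADER = struct.Struct(">iH")


def to_csv_bytes(records):
    """Encode records as plain text, one comma-separated line per record."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def from_csv_bytes(data):
    """Decode CSV text back into (int, str) records."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    return [(int(key), value) for key, value in reader]


def to_key_value_bytes(records):
    """Encode records as binary key/value pairs (illustrative layout only)."""
    out = bytearray()
    for key, value in records:
        encoded = value.encode("utf-8")
        out += HEADER.pack(key, len(encoded))  # fixed-size header
        out += encoded                         # variable-size value
    return bytes(out)


def from_key_value_bytes(data):
    """Decode the binary key/value layout back into (int, str) records."""
    records = []
    offset = 0
    while offset < len(data):
        key, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        value = data[offset:offset + length].decode("utf-8")
        offset += length
        records.append((key, value))
    return records
```

Both encodings round-trip to the same list of records. The text version can be read in any editor, while the binary version needs a reader that knows the layout, which is the kind of trade-off that separates plain text files from Hadoop-focused formats.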