Raw data is data that hasn’t been processed yet. It’s also called source data or atomic data. It is essentially unstructured or unformatted data sitting in a repository, and it can take the form of files, images or database records. What does raw data look like? A table with rows and columns? Sometimes, but not always. Let’s understand this better with the Netflix example again. As mentioned earlier, there are hundreds of files associated with each episode. These files contain records of the views that episodes have received from different regions and networks over different time intervals. They might also contain corrupt, inaccurate, irrelevant or redundant data that isn’t required. Extracting only the data we need from each file and representing it as tables, shaped for the way we intend to use it, is what we call data cleaning.
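To make the idea concrete, here is a minimal sketch of data cleaning in Python. It is only an illustration: the column names (`region`, `network`, `hour`, `views`, `debug_note`), the sample rows and the function `clean_view_records` are assumptions made up for this example, not the actual format of any Netflix file. It shows one plausible way to drop corrupt, inaccurate, irrelevant and redundant records and keep a tidy table.

```python
import csv
import io

# Hypothetical raw export for one episode. The columns and values are
# invented for illustration only.
RAW_CSV = """region,network,hour,views,debug_note
US,wifi,2021-01-01T10,120,ok
US,wifi,2021-01-01T10,120,ok
IN,mobile,2021-01-01T10,abc,ok
BR,wifi,2021-01-01T11,-5,ok
DE,mobile,2021-01-01T11,75,retry
,wifi,2021-01-01T12,40,ok
"""

# Columns we actually intend to use; everything else is treated as irrelevant.
KEEP = ("region", "network", "hour", "views")


def clean_view_records(raw_text):
    """Return a list of cleaned rows (dicts) parsed from raw CSV text."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Irrelevant data: keep only the columns we need.
        record = {key: (row.get(key) or "").strip() for key in KEEP}

        # Corrupt data: skip rows with a missing required field.
        if not all(record.values()):
            continue

        # Inaccurate data: a view count must be a non-negative integer.
        try:
            views = int(record["views"])
        except ValueError:
            continue
        if views < 0:
            continue
        record["views"] = views

        # Redundant data: skip exact duplicates of rows already kept.
        fingerprint = tuple(record[key] for key in KEEP)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)

        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for record in clean_view_records(RAW_CSV):
        print(record)
```

Of the six sample rows, only two survive: the first US row and the DE row. The duplicate, the non-numeric count, the negative count and the row with no region are all discarded. Real cleaning rules depend on what the data will be used for, so treat these checks as examples rather than a fixed recipe.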