Data cleaning is the fourth step in the analysis process, and it is one of the most underrated. Data is not always ready for use after it’s processed. As mentioned earlier, almost every data set contains redundant, incorrect, and irrelevant entries. This type of data is called dirty data, and most real-world data sets are dirty when they are extracted. You can’t make any reliable analysis from it. Most statistical theory focuses on data modeling, visualization, and analysis, and assumes the data being used is already in the perfect format. That’s seldom the case. In practice, preparing the data takes up the largest share of the time spent on an analysis, and it is considered one of the most tiring tasks.
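To make the three kinds of dirty data concrete, here is a minimal sketch in plain Python. The records, the field names, the choice of `session_token` as the irrelevant field, and the 0 to 120 age range are all invented for illustration; a real project would pick its own rules and would probably use a library such as pandas.

```python
# Hypothetical raw records; every field name and value is made up for this example.
raw_records = [
    {"id": 1, "age": "34", "city": "Pune", "session_token": "a1"},
    {"id": 1, "age": "34", "city": "Pune", "session_token": "a1"},     # redundant: exact duplicate
    {"id": 2, "age": "-5", "city": "Delhi", "session_token": "b2"},    # incorrect: impossible age
    {"id": 3, "age": "n/a", "city": "Mumbai", "session_token": "c3"},  # incorrect: not a number
    {"id": 4, "age": "41", "city": "Chennai", "session_token": "d4"},
]

# Assumption: only these fields matter for the analysis; session_token is irrelevant.
RELEVANT_FIELDS = ("id", "age", "city")


def clean(records):
    """Drop irrelevant fields, rows with invalid ages, and duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        # Keep only the fields the analysis needs.
        row = {field: record[field] for field in RELEVANT_FIELDS}
        # Reject rows whose age is not a whole number.
        try:
            row["age"] = int(row["age"])
        except ValueError:
            continue
        # Reject rows whose age falls outside the assumed plausible range.
        if not 0 <= row["age"] <= 120:
            continue
        # Skip rows that duplicate one already kept.
        key = tuple(row[field] for field in RELEVANT_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Of the five raw rows, only the records with `id` 1 and `id` 4 survive. This sketch simply discards bad rows; depending on the analysis, it may be better to correct or flag them instead.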
https://www.edureka.co/community/30399/why-is-data-cleaning-needed?