The Pains of Dirty Data in the Enterprise

Dirty data has become a major impediment to getting reliable results from models. It skews insights with uncertainty beyond acceptable levels, and it is a throbbing pain that will continue to haunt organizations; if not fixed, it could lead to their downfall. A simple example of dirty data is the gender information for Sam Jeff in the illustration below.

Sam Jeff is obviously a man, as seen in the illustration above, but he was addressed as ‘Mrs.’ based on the organization’s gender records. This is quite disappointing for a customer who must have assumed that his preferred service provider would hold his correct information. Worse, the service representative was unable to recognize the error. I’m sure he thought to himself that since Sam Jeff’s gender was ‘Female’ in the company’s database, he might actually be a woman who prefers to look like a man. Funny, right?! Yes, but not funny at all :(.

Imagine that an organization has millions or even billions of wrongly mapped gender records and urgently needs to create a value proposition for a specific gender in order to lead a gender-based market in its industry. This goal will obviously be tough to achieve because the data is inaccurate and inconsistent.

A SIMPLE DEFINITION

Dirty data is simply data containing wrong information. Wrong information comes in varied forms: inaccurate, incomplete, duplicated and outdated data. Let’s look at the varied forms of dirty data below.

WHOSE FAULT? THE BLAME GAME.

The first point of entry of data is critical to how the data will turn out when it is retrieved for analysis and insights. Recent research has shown that human error is responsible for the majority (59%) of inaccurate data in many organizations.

DIRTY DATA- THE PAINS?

Wrong data input gives wrong output and skews decisions, with very high cost implications that could be devastating to the organization.

Dirty data affects a business’s ability to gain any kind of competitive edge, hampering its efforts to deliver a personalized cross-channel customer experience.

GOING FORWARD

It is very important that we adopt a clean-data culture and ensure that every piece of data we create is entered carefully and correctly.

For consulting services,

Contact: Mojisola Olawepo, Tel: +2347088487195

25 thoughts on “The Pains of Dirty Data in the Enterprise”

  1. This speaks to data structure, especially at the point of entry, as pointed out. Data approval and cross-checks can help. If Mr Jeff had been sent his data earlier on, he would have found out that he was recorded as a woman. Maybe then the data would have been cleaned.


    1. As a former Customer Care Officer of a bank, I can relate to the effects of dirty data in a database. Caution should be consciously exercised at the point of data entry.


  2. I totally agree with having clean data, as dirty data will always have an effect, mostly negative, on the overall goal of the project or vision.


  3. MY VIEW

    Dirty data, also known as rogue data, is inaccurate, incomplete or inconsistent data, especially in a computer system or database.

    Dirty data can contain such mistakes as spelling or punctuation errors, incorrect data associated with a field, incomplete or outdated data, or even data that has been duplicated in the database.

    It can be cleaned through a process known as “data cleansing”: detecting and correcting or removing corrupt or inaccurate records from a record set, table, or database. This means identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

    Dirty data, in a nutshell, contains errors. From my findings, it can be caused by a number of factors, such as:

    • Duplicate records
    • Incomplete or outdated data
    • The improper parsing of record fields from disparate systems.

    As stated in the lecture note (on the webpage), recent research has shown that human error is responsible for the majority (59%) of inaccurate data in many organizations. I strongly second this.

    From my research online, I found the following 6 ways to avoid dirty data:

    1. Configuring the database correctly
    2. Training data users
    3. Assigning a dedicated person (data champion)
    4. Checking formats
    5. Avoiding duplicates
    6. Identifying dirty data early and stopping it from spreading through the system

    It has been observed and proven that dirty data really hinders business efficiency. It is important that every company carries out data quality management to avoid:

    • Loss of revenue
    • Ineffective marketing
    • Wrong decisions etc.


  4. Actually, I faced a similar problem while writing my final year project as an undergraduate. I made a series of mistakes at the point of inputting raw data, and it cost me dearly in time, money and energy. However, I believe it is possible to avoid mistakes at the point of inputting data into the system. It just requires proper attention and patience, and the ride will be hitch-free!


  5. Dirty data is inaccurate or inconsistent information in a database, and it can only be corrected by data cleansing. Just as an eraser is made for a pencil ✏️, so also data cleansing is made for dirty data. So we can say that data cleansing is very key for us as data elites.


    1. Dirty data is simply a database record that contains errors, which can be caused by several factors. A few of them are:
      • Duplicate records
      • Incomplete data
      • Outdated data
      • Improper parsing of record fields from disparate systems
      • Incorrect data
      • Inconsistent data

      In business operations, for instance, dirty data can lead to:
      • Loss of revenue
      • Wrong decisions
      • Ineffective marketing, etc.

      All of these hinder business efficiency. According to research by TDWI (The Data Warehousing Institute), dirty data is estimated to cost US businesses more than $600 billion each year. To avoid all this, data should go through thorough data cleansing (the process of detecting and correcting or removing corrupt or inaccurate records from a database) before it is processed, in order to improve the efficiency of customer acquisition activities and the decision-making process, and to increase productivity.


  6. Dirty data is inaccurate or inconsistent information in a database, and it can only be corrected by data cleansing. Just as an eraser is made for a pencil ✏️, so also data cleansing is made for dirty data. So we can say that data cleansing is very key for data elites.


  7. 1. Dirty data results in solving a business problem that never existed, because dirty data gives a wrong representation of what is actually happening in the business.

    2. Since human error accounts for a major part of dirty data, the process of data collection can be automated, with a validation process also put in place to tackle the issue at the source.


    1. Your line of thought got me… automating the data collection processes and setting business rules for validation. Top notch indeed!


    2. It’s always said that “garbage in, garbage out”: our output is a function of whatever we input into the system. We therefore have to be careful at inception so that we do not build our data infrastructure on the wrong foundation. I think data should also be reviewed from time to time and cleansed if need be.


      1. Hello George, I love your reference to garbage in … garbage out… I’m with you on this… I loved reading your response… Apt!


  8. Dirty data becomes an obstruction to accurate business insight and to an excellent value proposition that creates a Point of Difference (PoD) for an organisation’s competitive advantage.
    Hence, controls must be put in place during data collection and at every stage of data processing. I call these stages ‘Critical Control Points’ (CCPs).


  9. Dirty data is simply wrong data. I will explain using myself as an example.

    About 7 years ago, when I was admitted into the university, I was asked to fill in my details online:

    My name, DOB, state of origin, gender, faculty and department. I mistakenly forgot to put in my state of origin.

    When I got my departmental ID card, it showed Anambra as my state of origin because of my surname, and that was wrong information (this part is the dirty or wrong data).

    I went to the ICT unit for a correction, and because of that they had to carry out an overall check on the data submitted by students, since the problem affected most students (this part is the data cleansing).

    From this, I can conclude that without dirty or wrong data there won’t be data cleansing.

    Therefore, dirty data and data cleansing go hand in hand.


  10. The importance of data cleansing to an organization cannot be overemphasized, as the cost of making decisions based on dirty data may not be quantifiable. It is not only about the loss of revenue but also about the loss of the company’s good brand or excellent reputation in the global space. Imagine the impact of the use of dirty data on global firms such as Amazon, McKinsey etc.
    Hence, I suggest that every organization should have a dedicated team as well as an automated system to ensure clean data is available ALWAYS.


  11. Inconsistent and less than accurate (DIRTY) data hurts insight.
    It is imperative that data analysts and data officers ensure and maintain clean data.
    Some suggested methods of data cleansing include, but are not limited to:

    # Spell check
    # Use Find and Replace to clean data in Excel
    # Get rid of extra spaces
    # Remove duplicates
    # Select and treat all blank cells by adding “0” or “N/A” in number and text fields respectively
    # Convert numbers stored as text into numbers, and vice versa, as required
    # Maintain consistency in text formats: change text to lower, upper or sentence case
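
Several of the steps in this list can also be scripted outside Excel. The following is only a minimal sketch in plain Python, assuming records are dictionaries with a text field `name` and a number field `age`. Those field names, the sample rows and the helper functions `clean_record` and `clean_records` are all made up for illustration.

```python
import re


def clean_record(record):
    """Apply a few of the listed cleaning steps to one record (a dict)."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            # Get rid of extra spaces: collapse inner runs and trim the ends.
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[field] = value

    # Treat blank cells: "N/A" for the text field, 0 for the number field.
    name = cleaned.get("name") or "N/A"
    age = cleaned.get("age")
    if age in (None, ""):
        age = 0
    # Convert numbers stored as text into numbers.
    elif isinstance(age, str) and age.isdigit():
        age = int(age)

    # Keep text formats consistent (title case for names in this sketch).
    if name != "N/A":
        name = name.title()
    return {"name": name, "age": age}


def clean_records(records):
    """Clean every record, then drop exact duplicates, keeping the first."""
    seen = set()
    result = []
    for record in records:
        cleaned = clean_record(record)
        key = (cleaned["name"], cleaned["age"])
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


# Made-up sample rows: extra spaces, a number stored as text, a duplicate
# and a fully blank row.
raw = [
    {"name": "  sam   jeff ", "age": "34"},
    {"name": "Sam Jeff", "age": 34},
    {"name": "", "age": ""},
]
cleaned_rows = clean_records(raw)
```

Duplicates are removed only after the other steps have run, so rows that differ just by spacing or letter case are recognised as the same record.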


  12. It is true that sloppy data entry habits are often the culprits behind dirty data, and recognizing dirty data is of high importance.

    For incorrect data: for data to be correct, its values must fall within the set of valid values. For example, a month must be in the range 1-12, or an age must be less than 130. Correctness of data values can be programmatically enforced with edit checks and lookup tables.
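
As a rough illustration of such edit checks, the sketch below validates a single record against the two ranges mentioned above plus one lookup table. The field names, the `VALID_GENDERS` table and the `check_record` helper are assumptions made for this example, not part of any particular system.

```python
# Hypothetical lookup table of allowed values for one field.
VALID_GENDERS = {"Male", "Female"}


def check_record(record):
    """Return a list of edit-check failures for one record (a dict)."""
    errors = []

    # Range check: a month must be a whole number from 1 to 12.
    month = record.get("month")
    if not (isinstance(month, int) and 1 <= month <= 12):
        errors.append("month must be in the range 1-12")

    # Range check: an age must be a non-negative whole number below 130.
    age = record.get("age")
    if not (isinstance(age, int) and 0 <= age < 130):
        errors.append("age must be less than 130")

    # Lookup-table check: the value must be one of the allowed entries.
    if record.get("gender") not in VALID_GENDERS:
        errors.append("gender must be one of the allowed values")

    return errors


good = check_record({"month": 7, "age": 41, "gender": "Male"})
bad = check_record({"month": 13, "age": 150, "gender": "Mrs"})
```

Here `good` is an empty list, while `bad` holds one message per failed check, so a data-entry form could reject the record and tell the user what to fix.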

    For inaccurate data: I would also like to state that a data value can be correct without being accurate. For example, the state code LG (Lagos) and the city name Ibadan are both correct values, but when put together the state code is wrong, because Ibadan isn’t in Lagos but in Oyo. This is difficult to enforce programmatically with simple edits and lookup tables; however, accuracy can be validated by manually spot-checking against paper files or by asking a person.
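
Where a trusted reference list is available, part of this cross-field checking can still be automated. The sketch below assumes a small city-to-state table, `CITY_TO_STATE`, and a helper, `state_matches_city`. Both are invented for illustration and use state names rather than codes. Cities missing from the table return `None`, which marks them for the manual check described above.

```python
# Hypothetical reference table mapping each known city to its state.
CITY_TO_STATE = {
    "Ibadan": "Oyo",
    "Ikeja": "Lagos",
}


def state_matches_city(city, state):
    """Cross-field check: is the recorded state the one the city is in?

    Returns True or False when the city is in the reference table, and
    None when the city is unknown and needs a manual check.
    """
    expected = CITY_TO_STATE.get(city)
    if expected is None:
        return None
    return expected == state


consistent = state_matches_city("Ibadan", "Oyo")
inconsistent = state_matches_city("Ibadan", "Lagos")
unknown = state_matches_city("Abeokuta", "Ogun")
```

A check like this only catches combinations that contradict the reference table. It cannot confirm that the customer really lives in Ibadan, which is why manual verification still matters.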

    Non-integrated data could also be a source of dirty data, because organizations may store data inconsistently across many systems, with varied storage methods and inconsistent primary keys that often don’t match or do not exist. The development of integrated systems solely for data collection and analysis could serve as a solution.
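
To make the primary-key problem concrete, here is a minimal sketch that compares keys from two systems after normalising their format. The two sample systems, the key formats and the helpers `normalize_key` and `reconcile` are assumptions for illustration only.

```python
import re


def normalize_key(raw_key):
    """Strip punctuation and case so 'C-001' and 'c001' compare equal."""
    return re.sub(r"[^A-Za-z0-9]", "", str(raw_key)).upper()


def reconcile(system_a, system_b):
    """Report which normalised keys match and which exist on one side only."""
    keys_a = {normalize_key(key) for key in system_a}
    keys_b = {normalize_key(key) for key in system_b}
    return {
        "matched": sorted(keys_a & keys_b),
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
    }


# Made-up sample data: a CRM and a billing system that format keys differently.
crm = {"C-001": "Sam Jeff", "C-002": "Ada Obi"}
billing = {"c001": 120.0, "C003": 75.5}
report = reconcile(crm, billing)
```

The `only_in_a` and `only_in_b` lists point to records that would be lost or orphaned if the two systems were merged as they are.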


  13. Ever wondered why you received an error message while registering on a new website? Website administrators are gradually taking deliberate steps to combat the challenge of collecting dirty data.

    Next time, be extra careful to fill in the right details in the appropriate section, so that your data becomes information and insight rather than a problem to be cleansed.


  14. The importance of avoiding human error while inputting data records can’t be overemphasized, especially for a big data enterprise. Dirty data can cause a lot of setbacks and unnecessary downtime for a firm, thereby affecting its productivity and net profit.
    I recommend that data be cleaned often, because to me clean data will generate clean insights, which will bring about clean outcomes and thereby clean profit.

