Garbage In, Garbage Out? Not Necessarily.

By Douglas Wood, Infoglide Senior Vice President

One of the oldest phrases in computer science still seems to be in vogue. “Garbage in, garbage out” (GIGO) is a term coined during the early days of the computing industry. It pointed out that the value of the computer systems of the day was entirely dependent upon their input data. No amount of processing power could produce a right answer from bad data.

Fast forward many decades. The same phrase is still used today to emphasize the importance of data quality in many application areas (e.g., healthcare). While high-quality data remains important, two factors lead me to say that GIGO is not the absolute rule that it once was: (1) advances in software and hardware technology, and (2) the emergence of whole classes of new applications targeting fraud detection.

What happens when the quality of data is “enhanced”? Processes like data transformation, data cleansing, and de-duplication filter out information that is unnecessary and confusing. Names, addresses, and other attributes are standardized. Duplicate records are deleted. Links to “bad” data are broken. Master records, aka “golden records”, are created for use by multiple systems.

While this has great value for traditional systems, it can devastate fraud detection efforts. For example, discovering and evaluating multiple addresses during fraud analysis is crucial in finding and prosecuting perpetrators of fraud. Likewise, standardizing the multiple forms and instances of someone’s name held in multiple data sources may remove vital clues and break a forensic chain of evidence. We sometimes refer to the result as data deterioration.
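To make the contrast in the paragraph above concrete, here is a minimal Python sketch. It is only an illustration, not a description of any particular product: the records, the crude `standardize` rule, and the `cleanse` and `link` helpers are all hypothetical names and data invented for this example.

```python
import string

# Hypothetical records from three source systems; all values are invented.
records = [
    {"source": "claims", "name": "Jon Smith", "address": "12 Oak St"},
    {"source": "billing", "name": "JON SMITH.", "address": "98 Elm Ave"},
    {"source": "policy", "name": "jon  smith", "address": "PO Box 7"},
]


def standardize(text):
    """Crude standardization: lowercase, strip punctuation, collapse spaces."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def cleanse(recs):
    """Traditional cleansing: keep one 'golden' record per standardized name.

    Later duplicates are discarded, along with their addresses.
    """
    golden = {}
    for rec in recs:
        key = standardize(rec["name"])
        golden.setdefault(key, {"name": key, "address": rec["address"]})
    return list(golden.values())


def link(recs):
    """Alternative: group records under one key but keep every original value."""
    groups = {}
    for rec in recs:
        groups.setdefault(standardize(rec["name"]), []).append(rec)
    return groups


golden_records = cleanse(records)
linked_records = link(records)

# Addresses still visible to an analyst under each approach.
addresses_after_cleansing = {rec["address"] for rec in golden_records}
addresses_after_linking = {rec["address"] for rec in linked_records["jon smith"]}
```

Under these assumptions, cleansing leaves a single golden record with one address, while linking keeps all three original spellings and all three addresses grouped under the same key, which is the kind of detail a fraud analyst may need.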

So “garbage in, garbage out” is still an operative phrase for most software systems, but for entity resolution, we’ve found repeatedly that “one man’s garbage is another man’s treasure.”
