Mistaken Identity Resolution Part III: Identity Resolution vs. Data Integration
By Ram Anantha, Infoglide Software
In this post, we pick up the conversation from our Mistaken Identity Resolution series and discuss how identity resolution and data integration are related, and where they differ.
Data integration is the process of combining data that resides in different sources and giving the user a unified view of it. It addresses the complex issues that arise from data fragmentation, including poor data quality, inconsistencies in the structure and meaning of data, and inadequate data governance. One common approach is to bring all the data together into a single data warehouse: data from disparate sources and locations within an enterprise is loaded into the centralized structure of a data warehouse or dedicated database. Typically, in a data integration exercise, the variations and anomalies found across the various data repositories are reconciled into one representation.
Since data integration involves presenting a unified view of the data, identity resolution techniques can be used effectively for this purpose. Identity resolution systems have sophisticated matching and linking algorithms that can compare two records from different data sources and estimate the likelihood that they refer to the same entity.
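To make the idea of a likelihood score concrete, here is a minimal sketch in Python. It is not how any particular identity resolution product works: the field names, the weights, and the use of the standard library's difflib for string similarity are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real system would tune or learn these.
WEIGHTS = {"name": 0.6, "address": 0.4}


def field_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities, used as a rough likelihood."""
    return sum(
        weight * field_similarity(rec_a[field], rec_b[field])
        for field, weight in WEIGHTS.items()
    )


# Illustrative records from two hypothetical source systems.
crm = {"name": "John Smith", "address": "12 Oak Street"}
billing = {"name": "JOHN SMITH ", "address": "12 Oak Street"}
other = {"name": "Maria Lopez", "address": "908 Pine Avenue"}

print(match_score(crm, billing))  # identical after normalization
print(match_score(crm, other))    # clearly lower
```

The point of the sketch is that the output is a score rather than a yes-or-no answer, so the decision about what to do with a likely match can be made separately.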
Unlike data integration, however, identity resolution systems do not require a centralized data warehouse. Instead, they can connect to multiple disparate databases and operate on the data in its native format, without first cleansing it.
Additionally, as pointed out here, identity resolution technologies can be differentiated from data integration technologies because they actually derive significant value from the inconsistency of data drawn from multiple sources.
For example, you may have two individuals with exactly the same name and exactly the same address. Data integration technologies would see these as the same person and combine them, in effect ‘losing’ one of the records. Identity resolution technology would score the two records as having a very high likelihood of being the same entity, but would not remove either record. Then, when subsequent data entered into the system shows that one individual has a birth date of 1-23-64 and the other a birth date of 2-5-88, it becomes clear that these are actually two separate individuals, most likely a parent and child.
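The scenario above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real product's logic: the Record class, the likelihood_same function, and the 0.9 score for an unconfirmed match are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    record_id: str
    name: str
    address: str
    birth_date: Optional[str] = None  # unknown until later data arrives


def likelihood_same(a: Record, b: Record) -> float:
    """Toy score: a high likelihood on name and address alone, revised by birth date."""
    if a.name != b.name or a.address != b.address:
        return 0.0
    if a.birth_date is not None and b.birth_date is not None:
        return 1.0 if a.birth_date == b.birth_date else 0.0
    return 0.9  # assumed value for "very likely, but unconfirmed"


# Both records are kept; nothing is merged away.
records = [
    Record("A", "John Smith", "12 Oak Street"),
    Record("B", "John Smith", "12 Oak Street"),
]

before = likelihood_same(records[0], records[1])

# Subsequent data supplies the birth dates from the example.
records[0].birth_date = "1-23-64"
records[1].birth_date = "2-5-88"

after = likelihood_same(records[0], records[1])

print(before, after, len(records))
```

Because neither record was discarded when the first score came back high, the later birth dates can still separate the two individuals.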
In conclusion, identity resolution technologies are capable of much more than data integration, but they can be used for that purpose. Their greatest value, though, lies in creating a single view of an entity (person, product, etc.) while retaining forensic data that may prove useful later. This allows the view of each entity to evolve as new data is entered. Many organizations can benefit from a unified view of their data that adapts to new information.




