Courtesy of Information Management
The problems of
master data management sometimes seem to be intractable.
It is difficult to keep master data synchronized with business
reality, deduplication of records that exist for the same entity
instance is a perennial issue, and the quality of non-key data
attributes often cannot be determined with certainty. The
traditional MDM paradigm has been to take data produced in
transaction applications and try to automate its integration and
cleansing. Such "transaction adaptation hubs" have held out
the promise of clever algorithms that could do all the work
needed to arrive at a single golden record for master data.
Alas, this has not turned out to be so. The algorithms in
traditional MDM hubs have done well, but in many cases, they have
not done well enough. One response has been to add
functionality to these hubs that permits data stewards to detect
and correct data. However, this creates additional issues,
such as corrections made in the hub not being made synchronously
in the corresponding upstream transaction applications.
Another approach has been to separate the production of master
data from its distribution. Central hubs remain for
distribution purposes, but master data is created in specialized
environments. The specialized environments are like farms where
crops of master data are grown, and the hubs are akin to markets, to which
the master data is taken once it is production-ready. This
pattern works reasonably well when the master data is about
specialized, high-value entities, such as institutional clients in
brokerage businesses. However, there can still be problems,
such as failing to detect changes in attribute values. This
approach is also very difficult to scale when there are many entity
instances to be mastered, such as in retail banking.
Who Owns the Data?
Once we recognize the problems in the traditional approaches to
MDM, what can we do about them? First of all, we have to
ensure that data quality is high when it is first captured by the
enterprise. However, are we taking this idea far enough?
Let us ask a question about customer data: Who owns it? If
I ask an IT person, he or she will likely identify one or more
business users. But this is a problem. To the IT mindset,
"data owner" typically means one or both of two things:
(a) A data owner is someone who has
some form of governance responsibility for the data. It is
almost never explicitly stated what such responsibility
involves. "Owner" is an analogy; the "owner" is expected to
look after the data as if he or she were in possession of it.
However, such an expectation is often unrealistic. How can such an
"owner" of customer data always be expected to know if the data is
right or wrong?
(b) A data owner is somebody who can
give IT requirements so that IT can do its job. In other words,
in data-centric development projects, "data owner" is merely a
substitute for the "business sponsor" in the old systems
development lifecycle.
Let us ask the question again: Who
really owns customer data? We propose that, in reality, each
customer owns it, not any one person in the enterprise. If we
truly subscribe to this viewpoint, it has profound implications for
MDM.
Data Owner-Driven Architecture and Governance
Many data managers might think this is fairly obvious, and it
is. Who knows the customer's basic information better than
the customer themselves? Who finds out about changes
to this information earlier than the customer those changes concern?
Yet, if it is so obvious, why do our MDM architecture and
governance processes seem to be based on a different paradigm?
MDM hubs that capture data from transaction applications are not
capturing data directly from the customer. By definition, these
hubs cannot be updated with customer data unless the customer is
involved in the transaction.
More importantly, especially in light of evolving laws about
data management, who "owns" the information about a customer, if
not the customer themselves? Surely, Malcolm Chisholm owns the
information about Malcolm Chisholm and Fabio Corzo owns the
information about Fabio Corzo. And if it is true that a
customer owns his or her own information, what does this mean, in
terms of how an enterprise should treat a customer?
Furthermore, with ownership comes responsibility. What are
the individual customer's responsibilities when it comes to their
own information?
Clearly, some of these issues will take many years to sort
out. In the meantime, enterprises can begin to consider what
it might mean for their MDM architecture and governance processes
if they were to take seriously the idea that customers are the true
owners of their own data.
If this idea were accepted, it is difficult to see how
implementing a transaction adaptation hub would honor this
principle. An enterprise would want to find ways to get as
close to the customer as possible, in terms of customer data
management. If a customer were able to directly manage their
data, there would be no need for probabilistic trust and
survivorship rules. Why should customers not be able to provide
demographic information about themselves, or update life events
directly?
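To make the contrast concrete, here is a minimal sketch in Python of the kind of trust-based survivorship rule a transaction adaptation hub might apply, next to an owner-driven alternative in which a value the customer maintains directly is simply accepted. The source names, trust scores, and functions (`SOURCE_TRUST`, `survive`, `resolve`) are hypothetical illustrations, not the API of any MDM product.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One source system's value for a single master data attribute."""
    source: str    # hypothetical source-system name
    value: str
    updated: date  # when the source last changed this value


# Hypothetical trust scores a hub might assign to each source system.
SOURCE_TRUST = {"billing": 0.9, "crm": 0.7, "web_orders": 0.5}


def survive(candidates: list[Candidate]) -> str:
    """Hub-style survivorship: take the value from the most trusted source,
    breaking ties by the most recent update. Assumes at least one candidate."""
    best = max(
        candidates,
        key=lambda c: (SOURCE_TRUST.get(c.source, 0.0), c.updated),
    )
    return best.value


def resolve(candidates: list[Candidate], owner_asserted: Optional[str]) -> str:
    """Owner-driven style: a value the customer maintains directly is taken
    as-is; survivorship is only a fallback when the customer supplied nothing."""
    if owner_asserted is not None:
        return owner_asserted
    return survive(candidates)


# Usage: the billing system is trusted more, but its address is stale.
address = [
    Candidate("billing", "12 Elm St", date(2019, 3, 1)),
    Candidate("crm", "48 Oak Ave", date(2023, 6, 15)),
]
print(survive(address))                # 12 Elm St
print(resolve(address, "48 Oak Ave"))  # 48 Oak Ave
```

Under survivorship, the trusted but stale billing address wins, which is the kind of missed change in attribute values described earlier; the owner-asserted value bypasses the rule entirely.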
Current State and Looking Forward
Many enterprises reason that traditional architecture and
governance approaches are the best option, given the decades of
organic growth in data architecture. But why? The growth of
the Internet and the rise of social media have brought people into
direct contact with enterprises, in terms of data, and have shown
that individuals are quite willing to maintain their own
profiles. Of course, there are issues to be worked out;
people are not always accurate in describing themselves and may be
sloppy in the way they maintain their personal data. However, this
overall approach is superior to what we have today.
We have focused on the example of customer data here, but this
concept can be applied to other master data subjects. Find the true
owner of the data, rather than the owner of a data store; let that
owner manage the master data; and adjust architecture and governance
to fit this approach.