Courtesy of Jim Harris of Information Management
Since how data quality is defined has a significant impact on how
data quality is perceived, measured, and managed, in this post I
examine the two most prevalent perspectives on defining data
quality: real-world alignment and fitness for the purpose of use.
These respectively represent what I refer to as the danger of data
myopia and the challenge of business relativity.
Real-World Alignment: The Danger of Data Myopia
Whether it's an abstract description of real-world entities
(i.e., master data) or an abstract description of real-world
interactions among entities (i.e., transaction data), data is an
abstract description of reality. The creation and maintenance of
these abstract descriptions shapes the organization's perception of
the real world, which I philosophically pondered in my post "Plato's Data."
The inconvenient truth is that the real world is not the same
thing as the digital worlds captured within our databases.
And, of course, creating and maintaining these digital worlds is
no easy task, which is exactly the danger inherent in the
real-world alignment definition of data quality: when the
organization's data quality efforts are focused on minimizing the
digital distance between data and the constantly changing real world that
data attempts to describe, that focus can lead to a hyper-focus on the data
in isolation, otherwise known as data myopia.
Even if we create and maintain perfect real-world alignment,
what value does high-quality data possess independent of its
use?
Real-world alignment reflects the perspective of the data
provider, whose advocates argue that a trusted source of data
provided to the organization will satisfy any and all business
requirements, i.e., high-quality data should be fit to serve as the
basis for every possible use. Therefore, in theory, real-world
alignment provides an objective data foundation independent of the
subjective uses defined by the organization's many data consumers.
However, providing the organization with a single system of
record, a single version of the truth, a single view,
a golden copy, or a consolidated repository of
trusted data has long been the rallying cry and siren song
of enterprise data warehousing (EDW) and,
more recently, of master data management (MDM). Although
these initiatives can provide significant business value, it is
usually poor data quality that undermines the long-term success and
sustainability of EDW and MDM implementations.
Perhaps the enterprise needs a Ulysses pact to protect it from believing
in EDW or MDM as a miraculous exemption from data quality issues?
A significant challenge for the data provider perspective on
data quality is that it is difficult to make a compelling business case on the
basis of trusted data without direct connections to the specific
business needs of data consumers, whose business, data, and
technical requirements are often in conflict with one another.
In other words, real-world alignment does not necessarily
guarantee business-world alignment.
So, if using real-world alignment as the definition of data
quality has inherent dangers, we might be tempted to conclude that
the fitness for the purpose of use definition of data quality is
the better choice. Unfortunately, that is not necessarily the
case.
Fitness for the Purpose of Use: The Challenge of Business Relativity
In M. C. Escher's famous 1953 lithograph "Relativity," although we observe multiple,
conflicting perspectives on reality, everything must appear normal from the
individual perspective of each figure, since they are all casually going about
their daily activities.
I have always thought this is an apt analogy for the multiple
business perspectives on data quality that exist within every
organization.
Like truth, beauty, and art, data quality can be said to be in
the eyes of the beholder or, when data quality is defined as
fitness for the purpose of use, in the eyes of the user.
Most data has multiple uses and multiple users. Data of sufficient
quality for one use or user may not be of sufficient quality for
other uses and users. These multiple, and often conflicting,
perspectives are considered irrelevant from the perspective of an
individual user, who just needs quality data to support their own
business activities.
Therefore, the user (i.e., data consumer) perspective
establishes a relative business context for data quality.
Whereas the real-world alignment definition of data quality can
cause a data-myopic focus, the business-world alignment goal of the
fitness for the purpose of use definition must contend with the
daunting challenge of business relativity. Most data has multiple
data consumers, each with their own relative business context for
data quality, making it difficult to balance the diverse data needs
and divergent data quality perspectives within the conflicting, and
rather Escher-like, reality of the organization.
The data consumer perspective on data quality is often the root
cause of the data silo problem, which is prevalent in most organizations
and is the bane of successful enterprise data management: each data
consumer maintains their own data silo, customized to be
fit for the purpose of their own use. Organizational culture and
politics also play significant roles since data consumers
legitimately fear that losing their data silos would revert the
organization to a one-size-fits-all data provider perspective on
data quality.
So, clearly the fitness for the purpose of use definition of
data quality is not without its own considerable challenges to
overcome.
How Does Your Organization Define Data Quality?
As I stated at the beginning of this post, how data quality is
defined has a significant impact on how data quality is perceived,
measured, and managed. I have witnessed organizations' data quality
efforts struggle with, and at times fail because of, either the
danger of data myopia or the challenge of business relativity (or,
more often than not, some combination of both).
Although some would define real-world alignment as data quality
and fitness for the purpose of use as information quality, I have
found that adding the nuance of data versus information only further
complicates an organization's data quality discussions.
But for now, I will just conclude a rather long (sorry about
that) post by asking for reader feedback on this perennial
debate.
How does your organization define data quality? Please share
your thoughts and experiences by posting a comment below.