Courtesy of Database Journal
From the macro viewpoint, data governance is data governance,
regardless of the setting. The micro view, however, should celebrate the nuances of
data governance. One of those nuances is Data Governance for
Education, which includes some unique challenges that are not
always obvious.
Whether the notepaper being used is kindergarten blue or college
ruled, the approaches to managing data in academia have been
hampered by lack of funds and resources. The end result is that
although educational environments could definitely benefit from the
knowledge that could be found in the data they store, they are
frequently prevented from effectively learning important data
lessons. Challenges that distinguish education include storing
massive amounts of unstructured and/or historical data that has
not been computerized; maintaining a robust privacy approach
to protect student data; maintaining syllabi and course
materials; maintaining the faculty information needed to ensure
quality instructors; and managing certification and
re-certification preparation information and human resources
data, to name a small sample.
Despite the complexities, it is easy to understand that
education environments receive overwhelming amounts of data that
must become data assets. The question is, "What is the best
approach to bring universal meaning and order to that data?"
The Voice of Authority
Just as sports teams need a coach to help the team excel, data
governance initiatives need authoritative sponsorship. The data
governance problem as a whole would be overwhelming and the effort
would likely never move beyond the outline stage without a sponsor
to facilitate, mitigate and provide appropriate resources. Once a
sponsor is identified, the scope of the Data Governance initiative
can be determined and a plan of action can be designed.
Defining the Problem
Before the problem can be solved, it needs to be understood. An
example of a problem statement might be, "Data is being stored in a
manner that precludes its usefulness both for day-to-day operations
and planning." From there, the problem statement can be further
defined:
- There is no road map for defining the business terms,
definitions, appropriate data uses or metadata.
- There is no universal understanding of the chaining of data
assets.
- There are no roles and responsibilities directly tied to data
quality, data protection or data governance.
Working the Problem
With the problem statement defined, the data governance focus
should begin with the plan to transform the "As Is" to the "To
Be".
Breaking the process into logical sections, incrementally moving
forward, establishing checkpoints, re-assessing when necessary and
publishing successes to appropriate officials will help ensure that
the work effort remains on track.
Strength from Weakness
Silos of data are typically one of the weaknesses identified
during the majority of Data Governance initiatives. While silos
lead to data inconsistencies and often prevent effective
communication, those same silos can become a strength during an
initial Data Governance effort. Since most education environments
have developed logical data silos (departments, schools, etc.)
simply by the nature of their mission, one approach might be to use
those silos to initially subset the task into manageable
modules.
Silos, Experts and Objectives
A Data Governance team is now needed. It should be composed of
members who are familiar with the full spectrum of academic
operations and data flows. These are not typically the information
technology professionals, but instead they are the 'business'
experts. Depending on the type and size of the educational
environment, these members could include employees of the
administrative branches, academic advisors or possibly
representatives of the offices of the Dean or Provost. These should
be the individuals who understand the data, how it is sourced and
how it is used. Employees who are familiar with problems or issues
due to lack of data quality can be instrumental in driving positive
changes, so including those individuals in the team is optimal.
This stage begins with some detective work and may uncover some
unexpected outcomes. The silos that were defined in the first
discussions may not be those that are listed in the final meeting.
That is a benefit of bringing this team together since the initial
assumptions may not match the reality.
What are the Data Assets?
Now that the silos have been identified, the composition of the
Data Governance team may need to change. The members now need to
begin the detailed discovery phase of the initiative. The goal for
the Discovery Phase is to find business and technical subject
matter experts who can help identify and explain data that is
important to their particular silo.
Identification should include all facets of the data. In
addition to obvious sources of data (such as databases), the team
should look for data assets in pseudo-databases, which may include
information stored in spreadsheets, hard copy and even small local
databases. If accurate metadata is available, it will be valuable
during this phase. Code reviews may be helpful to determine usage
patterns. Reporting and statistical analysis requirements should
also be reviewed since they may point to data assets that are
critical to ongoing operations.
What is the Source of Those Assets?
After identifying the data assets, it becomes important to know
the source of those assets. How is the data obtained? Is it entered
via an online application, uploaded to a database from a source
list, or generated by a batch process?
An ancillary question that can provide value toward further data
governance steps is how often the data is updated and/or accessed.
Knowing whether data assets are historical in nature or used
frequently can help with later prioritization considerations.
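One way to record the answers to these questions is a small inventory record per data asset. The sketch below is only an illustration: the AssetRecord fields, the sample asset names and the one-access-per-month threshold are all assumptions made for the example, not part of any particular governance tool.

```python
from dataclasses import dataclass


@dataclass
class AssetRecord:
    """One inventory row for a data asset (hypothetical fields)."""
    name: str
    source: str              # e.g. "online application", "batch process"
    accesses_per_month: int  # rough usage figure gathered during discovery


def split_by_activity(records, threshold=1):
    """Return (active, historical) lists of records.

    Records accessed fewer than `threshold` times per month are treated
    as historical; active records are sorted most-used first.
    """
    active = sorted(
        (r for r in records if r.accesses_per_month >= threshold),
        key=lambda r: r.accesses_per_month,
        reverse=True,
    )
    historical = [r for r in records if r.accesses_per_month < threshold]
    return active, historical


# Made-up sample data for illustration only.
records = [
    AssetRecord("course_grade", "online application", 120),
    AssetRecord("archived_transcripts", "hard copy", 0),
    AssetRecord("student_address", "batch process", 300),
]

active, historical = split_by_activity(records)
print([r.name for r in active])      # ['student_address', 'course_grade']
print([r.name for r in historical])  # ['archived_transcripts']
```

The most frequently used assets surface at the top of the active list, which gives the team a starting order for later prioritization discussions.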
Take a Check Point
After the data assets and their sources have been
identified, a compilation of the gathered information is needed.
For each silo, a list of the data assets, the sources of those
assets and their definitions should be prepared. An overarching
analysis of these lists will likely indicate that some data assets
are used by more than one silo. This is possibly the first
chance for the Data Governance team to see realistic
opportunities to address data issues such as duplication,
ambiguity, incompleteness or other data concerns.
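The cross-silo analysis described here can be sketched in a few lines. The silo names, asset names and sources below are invented for illustration, and the inventory layout (a mapping of silo to asset/source pairs) is simply one assumed way to hold the compiled lists.

```python
from collections import defaultdict

# Hypothetical compiled inventory: silo -> list of (asset name, source).
inventory = {
    "Admissions": [
        ("student_address", "online application"),
        ("applicant_score", "batch process"),
    ],
    "Registrar": [
        ("student_address", "spreadsheet upload"),
        ("course_grade", "online application"),
    ],
    "Financial Aid": [
        ("student_address", "batch process"),
        ("award_amount", "online application"),
    ],
}


def shared_assets(inventory):
    """Return {asset: sorted list of silos} for assets that appear in
    more than one silo's list."""
    silos_by_asset = defaultdict(set)
    for silo, assets in inventory.items():
        for name, _source in assets:
            silos_by_asset[name].add(silo)
    return {
        name: sorted(silos)
        for name, silos in silos_by_asset.items()
        if len(silos) > 1
    }


print(shared_assets(inventory))
# {'student_address': ['Admissions', 'Financial Aid', 'Registrar']}
```

In this made-up example, student_address is kept by three silos and arrives from three different sources, which makes it a natural candidate for a duplication review.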
How are Those Assets Currently Being Used?
With the overarching list of data assets, a determination of how
those assets are being used can begin. The team will undoubtedly
confront some obstacles in this phase, but without this
information, the foundation of the Data Governance program will be
unstable and each successive step may cause re-work.
Definition scenarios can be especially challenging to decipher.
Consider data assets that are 'named' differently but which
represent the same meaning, both within silos and spanning
different silos. For example, what do the terms 'admission' and
'enrollment' mean? Do they mean the same thing, but are just known
by different data naming conventions? If so, imagine the confusion
of the new employee who sees these terms as representing two
different concepts because they are named differently. Perhaps
these two terms do mean different things, but each department
ascribes its own distinct meaning and there is no
single holistic definition for each term. Consider too that
historically, these terms may have had completely different
meanings than they do now and perhaps some of these older meanings
are still housed in currently used information systems.
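A simple check can surface both kinds of naming trouble once each silo's definitions have been written down. The sketch below assumes the definitions have been collected as (term, silo, definition) entries; the sample wording is invented, and exact text matching is a deliberate simplification, since real definitions would need human review to decide whether they truly match.

```python
from collections import defaultdict

# Hypothetical definitions gathered from each silo.
entries = [
    ("admission", "Admissions", "Applicant has been offered a place"),
    ("enrollment", "Registrar", "Student has registered for at least one course"),
    ("enrollment", "Financial Aid", "applicant has been offered a place"),
]


def _normalize(text):
    """Lower-case and trim so trivial differences do not hide a match."""
    return text.strip().lower()


def conflicting_terms(entries):
    """Terms that carry more than one distinct definition."""
    definitions = defaultdict(set)
    for term, _silo, definition in entries:
        definitions[term].add(_normalize(definition))
    return sorted(t for t, d in definitions.items() if len(d) > 1)


def possible_synonyms(entries):
    """Groups of different terms that share the same definition."""
    terms = defaultdict(set)
    for term, _silo, definition in entries:
        terms[_normalize(definition)].add(term)
    return sorted(tuple(sorted(t)) for t in terms.values() if len(t) > 1)


print(conflicting_terms(entries))   # ['enrollment']
print(possible_synonyms(entries))   # [('admission', 'enrollment')]
```

Here 'enrollment' means two different things depending on the silo, and one of those meanings is the same as 'admission', which is exactly the situation a new employee would find confusing.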
How Should Those Assets be Used?
Now that the current use of the data is understood, it is time
to determine how the data 'should' be used. Often the 'currently
used' answer is different from the 'should be used' answer.
Duplication of data, data inconsistencies, and misleading data
definitions are all prime candidates for review.
Predominant data quality standards and validations will
logically begin to be defined during this step. At this point, the
team may want to consider beginning the effort to build an initial
'business glossary' which can provide meaning and definition to the
data asset terms and set standards to facilitate clear
communication for the rest of the Data Governance process.
Who is the Owner?
One of the most critical parts of any data governance approach
is the identification of data stewards. Data stewards are
considered the 'owners' of the data assets. They hold responsibility
for ensuring that the data within their purview meets quality
standards, answers a business need and that it is appropriately
available to authorized users. Data stewards are the champions of
the data. They are the ultimate layer of quality control. Typically
their job function will depend upon the data that they own and
therefore, they will have a vested interest in maintaining it
properly.
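Once stewards are being named, it is useful to confirm that no data asset has been left without one. The sketch below assumes a plain list of asset names and a mapping of asset to steward; both the names and the structure are illustrative only.

```python
# Hypothetical asset list and steward assignments.
assets = ["student_address", "course_grade", "award_amount"]
stewards = {
    "student_address": "Registrar's office",
    "course_grade": "Registrar's office",
}


def unowned_assets(assets, stewards):
    """Return assets with no steward assigned (missing or blank entry)."""
    return [a for a in assets if not stewards.get(a)]


print(unowned_assets(assets, stewards))  # ['award_amount']
```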
Applying a Data Governance Maturity Approach
All the data lessons learned have been leading up to this point.
This is when the true Data Governance maturity phase can begin.
Depending on the outcome of the previous investigations, a decision
can be made about the nature of the appropriate Data Governance
model and approach. With the approach defined, now is the time to
begin the synchronization efforts that will bring the silos into
the whole.
Learning the Right Lesson
Data Governance is not a one-time event. Data must be
consistently viewed as an asset and the culture must recognize and
support the continuing protection of the Data Governance processes.
With the right sponsor, the evolutionary process to enable an
ongoing governed approach to data quality and protection will
provide significant benefits both for the present and the future.
To enable that future, however, the culture must change and adapt
to one that recognizes the value of the data and as a result,
embraces a Data Governance focused mindset.
Before academia can learn the lessons that the data assets
provide, it has to build the foundational knowledge required.
Data Governance provides that foundation.
Data Governance Resources:
"The DAMA Guide to the Data Management Body of Knowledge"
"IBM Data Governance Council Maturity Model"