Courtesy of Sys-Con Media
Practical considerations
Customer engagement has long benefited from data and analytics.
Knowing more about each of your customers, their attributes,
preferences, behaviors and patterns, is essential to fostering
meaningful engagement with them. As technologies advance, and more
of people's lives are lived online, more and more data about
customers is captured and made available. At face value, this is
good; more data means better analytics, which means better
understanding of customers and therefore more meaningful
engagement. However, volumes of data measured in terabytes,
petabytes, and beyond are so big they have spawned the terms "Big
Data" and "Big Analytics." At this scale, there are practical
considerations that must be understood to successfully reap the
benefits for customer engagement. This article will explore some of
these considerations and provide some suggestions on how to address
them.
Customer Data Management (CDM), also known as Customer Data
Integration (CDI), is foundational for a Customer Intelligence (CI)
or Customer Engagement (CE) system. CDM is rooted in the principles
of Master Data Management (MDM), which includes the following:
- Acquisition and ingestion of multiple, disparate sources, both
online and offline, of customer and prospect data
- Change Data Capture (CDC)
- Data cleansing, parsing, and standardization
- Entity Modeling
- Entity relationship and hierarchy management
- Entity matching, identity resolution, and persistent key
management for the core entities: individual, household, and
company/institution/location
- Rules-based attribute mastering, "Survivorship" or "Build the
Best Record"
- Data lineage, version history, audit, aging, and
expiration
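The "Build the Best Record" item in the list above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the source names, the field names, and the rule "the
most recently updated non-empty value wins" are all hypothetical,
and real survivorship rules are usually richer (source trust
rankings, per-field rules, and so on).

```python
from datetime import date

# Hypothetical records for one resolved individual, one per source system.
records = [
    {"source": "crm", "updated": date(2012, 3, 1),
     "email": "pat@example.com", "phone": None},
    {"source": "web", "updated": date(2012, 6, 15),
     "email": None, "phone": "555-0100"},
    {"source": "billing", "updated": date(2011, 11, 2),
     "email": "old@example.com", "phone": "555-0199"},
]


def build_best_record(records, fields):
    """For each field, keep the most recently updated non-empty value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    best = {}
    for field in fields:
        for rec in newest_first:
            if rec.get(field):  # skip None and empty strings
                best[field] = rec[field]
                break
    return best


golden = build_best_record(records, ["email", "phone"])
print(golden)
```

Here the email survives from the CRM record and the phone from the
newer web record, so the mastered record draws on more than one
source.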
It's useful to first make the distinction between attributive
and behavioral data. Attributive data, often referred to as profile
data, consists of discrete fields that describe an entity, such as
an individual's name, address, age, eye color, and income.
Behavioral data is a series of events that describe an entity's
behavior over time, such as phone calls, web page visits, and
financial transactions. Admittedly, the line between the two can
blur; a customer's current account balance can be either an
attribute or an aggregation of behavioral transactions.
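The account balance example can be made concrete with a small
sketch. The record layouts and values below are invented for
illustration; the point is only that the same number can be stored
as a profile field or derived by aggregating events.

```python
from decimal import Decimal

# Behavioral data: a hypothetical series of timestamped transaction events.
transactions = [
    {"account": "A-1", "ts": "2012-01-05", "amount": Decimal("100.00")},
    {"account": "A-1", "ts": "2012-01-09", "amount": Decimal("-30.50")},
    {"account": "A-1", "ts": "2012-02-01", "amount": Decimal("12.25")},
]

# Attributive data: a profile that stores the balance as a discrete field.
profile = {"account": "A-1", "name": "Pat Example",
           "balance": Decimal("81.75")}


def balance_from_events(events, account):
    """Derive the balance by summing the account's transaction amounts."""
    return sum((e["amount"] for e in events if e["account"] == account),
               Decimal("0"))


# Both views describe the same fact about the customer.
print(balance_from_events(transactions, "A-1") == profile["balance"])
```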
MDM typically focuses on attributive data. Because CDM is based on
MDM, the same is true for CDM. Personally Identifying Information
(PII) fields such as name, email, address, phone, and username are
the primary drivers behind identity resolution. Other attributes
such as income, number of children, or gender are commonly
"mastered" for each of the resolved entities (individual,
household, company).
Enter Big Data. As more devices are developed - and adopted -
that capture and store data, huge quantities of data are generated.
Big Data is almost always event-oriented and temporal, and the
subset of Big Data that is relevant to a CE system is almost always
behavioral in nature (clicks, calls, downloads, purchases, emails,
texts, tweets, Facebook posts).
Behavioral data is critical to understanding customers (and
prospects). And, understanding customers is critical for
establishing meaningful and welcome engagement with them.
Therefore, Big Data is, or should be, viewed as an invaluable asset
to any CE system.
Further, this sort of rich, temporal behavioral data is ripe for
analytics. In fact, the term Big Analytics has emerged as a result.
Big Analytics can be defined as the ability to execute analytics on
Big Data. However, there are some real challenges involved in
executing analytics on Big Data, challenges that drive the need for
specialized technologies such as Hadoop or Netezza (or both). These
technologies must support Massively Parallel Processing (MPP) and,
just as importantly if not more so, they must bring the analytics
to the data instead of bringing the data to the analytics. Having
recently completed a course for Hadoop developers (an excellent
course that I highly recommend), I have a heightened appreciation
for the challenges related to managing and analyzing data "at
scale" and the need for specialized technologies that support Big
Data and Big Analytics.
A few significant points regarding Big Analytics should be
considered:
- Big Analytics allows models to be built on an entire data set,
rather than just a sampling or an aggregation. My colleague, Jack
McCush, explains: "When building models on a small subset and then
validating them against a larger set to make sure the assumptions
hold, you can miss the ability to predict rare events. And often
those rare events are the ones that drive profit."
- Big Analytics allows non-traditional models to be built, for
example, social graphs and influencer analytics. Two useful and
inherently big sources of data, Call Detail Records (CDRs)
generated from mobile/smart phones and web clickstream data, lend
themselves well to these models.
- Big Analytics can take even traditional analytics to the next
level. Big Analytics allows the execution of traditional
correlation and clustering models in a fraction of the time, even
with billions of records and hundreds of variables. As Revolution
Analytics points out in Advanced 'Big Data' Analytics with R
and Hadoop, "Research suggests that a simple algorithm with a large
volume of data is more accurate than a sophisticated algorithm with
little data. The algorithm is not the competitive advantage; the
ability to apply it to huge amounts of data - without compromising
performance - generates the competitive advantage."
Big Data is great for a CE system. It paints a rich behavioral
picture of customers and prospects and takes CE-enabling analytics
to the next level. But what happens when this massive behavioral
data is thrown at a CDM-MDM system that is optimized for
attributive data? A "basketball through the garden hose" effect
might occur. But this doesn't have to happen; there are ways to
gracefully extend CDM to manage Big Data.
The key is data classification. Attributive, or profile, data is
classified separately from behavioral data. While both contain a
source natural key (e.g., cookie-based visitor ID, cell phone
number, device ID, account number), attributive data is always
structured. Behavioral data, on the other hand, can be structured
or unstructured and contains no PII. Big Data almost always falls
under the behavioral category.
Importantly, behavioral data requires different processing than
attributive data. Since the processing is different, the two
streams can be separated just after ingestion, like a fork in the
road, with the attributive data going one way and the behavioral
data going the other. This is the key to integrating Big Data into
a CDM-MDM system without grinding it to a halt. To be fair, the two
streams aren't completely independent. The behavioral stream will
typically require two things from the attributive stream: Dimension
Tables and Master ID-to-Natural Key Cross-References - both of
which can be considered as reference data.
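The fork in the road described above might look something like the
following sketch. The classification rule is an assumption made for
the example (a record with both an "event" and a "ts" field is
treated as behavioral); a real system would classify by source or
by schema, and the field names here are hypothetical.

```python
def split_streams(records):
    """Route each ingested record to the attributive or behavioral stream.

    Assumption for this sketch: a record carrying both an 'event' and a
    'ts' field is behavioral; everything else is treated as attributive.
    """
    attributive, behavioral = [], []
    for rec in records:
        if "event" in rec and "ts" in rec:
            behavioral.append(rec)
        else:
            attributive.append(rec)
    return attributive, behavioral


# Hypothetical ingested records; every record keeps its source natural key.
ingested = [
    {"natural_key": "v-123", "name": "Pat Example",
     "email": "pat@example.com"},
    {"natural_key": "v-123", "event": "page_view",
     "ts": "2012-06-01T10:00:00"},
    {"natural_key": "555-0100", "event": "call",
     "ts": "2012-06-01T10:05:00"},
]

profile_stream, event_stream = split_streams(ingested)
print(len(profile_stream), len(event_stream))
```

Note that neither stream drops the source natural key, which is
what allows the results to be related again later.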
Dimension Tables
For example, the "subscriber" dimension table may be required in
the Big Data world so that it can be joined to the "web clicks"
table. This is done in order to aggregate web clicks by subscriber
gender, which only exists in the subscriber table.
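As a rough sketch of that join, the snippet below counts web clicks
by subscriber gender using a small in-memory dimension table. The
table contents and field names are made up, and at Big Data scale
the same logic would run as a distributed join rather than a Python
loop.

```python
from collections import Counter

# Hypothetical "subscriber" dimension table, keyed by subscriber id.
subscribers = {
    "s1": {"gender": "F"},
    "s2": {"gender": "M"},
    "s3": {"gender": "F"},
}

# Hypothetical "web clicks" fact records; gender is not present here.
web_clicks = [
    {"subscriber_id": "s1", "url": "/home"},
    {"subscriber_id": "s2", "url": "/plans"},
    {"subscriber_id": "s1", "url": "/plans"},
    {"subscriber_id": "s3", "url": "/home"},
    {"subscriber_id": "s9", "url": "/home"},  # no matching subscriber row
]


def clicks_by_gender(clicks, dim):
    """Join each click to the dimension table and count clicks per gender."""
    counts = Counter()
    for click in clicks:
        row = dim.get(click["subscriber_id"])
        gender = row["gender"] if row else "unknown"
        counts[gender] += 1
    return dict(counts)


print(clicks_by_gender(web_clicks, subscribers))
```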
Master ID-to-Natural Key Cross-References
Master IDs are created and managed in the CDM-MDM world, but they
are often needed for linkage and aggregation in the Big Data world.
Shadowing the cross-references that map master IDs (such as the
master individual ID) to "source natural keys" into the Big Data
world solves this problem.
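A shadowed cross-reference can be pictured as a simple lookup from
natural key to master ID. In the sketch below, the key types, the
IDs, and the decision to skip unresolved keys are all assumptions
made for illustration.

```python
# Hypothetical cross-reference shadowed from the CDM-MDM world:
# (key type, source natural key) -> master individual id.
xref = {
    ("cookie", "v-123"): "M-001",
    ("phone", "555-0100"): "M-001",
    ("account", "A-77"): "M-002",
}

# Hypothetical behavioral events, each identified only by a natural key.
events = [
    {"key_type": "cookie", "key": "v-123", "event": "page_view"},
    {"key_type": "phone", "key": "555-0100", "event": "call"},
    {"key_type": "account", "key": "A-77", "event": "purchase"},
    {"key_type": "cookie", "key": "v-999", "event": "page_view"},
]


def events_per_master(events, xref):
    """Count events per master individual, linking across natural keys."""
    totals = {}
    for e in events:
        master = xref.get((e["key_type"], e["key"]))
        if master is None:
            continue  # unresolved keys are skipped in this sketch
        totals[master] = totals.get(master, 0) + 1
    return totals


print(events_per_master(events, xref))
```

The web visit and the phone call roll up to the same master
individual even though they arrived under different natural keys.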
The two classifications of data are separated into two streams
and processed (mostly) independently. How do they come back
together? One reason this architecture works is that both streams,
attributive and behavioral, contain a "source natural key." This is
a unique identifier that relates the two streams. For example, web
clickstream data typically has an IP address or a web
application-managed, cookie-based visitor ID. Transactional data
typically has an account number. Mobile data will have a phone
number or device ID. These identifiers don't have to mean anything,
per se, but are critical for stitching the two streams back
together.
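Stitching on the source natural key could be sketched as follows.
The profile layout and the summary fields are hypothetical, and the
behavioral summary is assumed to have been produced already by the
Big Data stream.

```python
# Hypothetical mastered profiles from the attributive stream.
profiles = [
    {"master_id": "M-001", "name": "Pat Example", "natural_key": "v-123"},
    {"master_id": "M-002", "name": "Sam Sample", "natural_key": "A-77"},
    {"master_id": "M-003", "name": "Lee Placeholder", "natural_key": "d-42"},
]

# Hypothetical output of the behavioral stream, keyed by natural key.
behavior_summary = {
    "v-123": {"page_views": 42},
    "A-77": {"purchases": 3},
}


def stitch(profiles, summary):
    """Attach behavioral aggregates to each profile via the natural key."""
    enriched = []
    for p in profiles:
        merged = dict(p)  # copy so the source profile is not mutated
        merged.update(summary.get(p["natural_key"], {}))
        enriched.append(merged)
    return enriched


for row in stitch(profiles, behavior_summary):
    print(row)
```

Profiles with no behavioral data simply pass through unchanged in
this sketch.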
It's not just the dimensionalized, aggregated data that is
reunited with the profile data, but also the high-value, behavioral
analytics attributes (predictive scores, micro-segmentations, etc.)
created courtesy of Big Analytics. The attributive data is now
greatly enriched by the output of the Big Data processing stream.
And, to get things really crazy, these enriched behavioral
analytics profile attributes can be used as part of the next cycle
of matching; similar, complex behavior patterns can help tip the
scales, causing two entities to match that might not have matched
otherwise. In the end, CDM-MDM and Big Data can live
together harmoniously; Big Data doesn't replace CDM-MDM, but rather
extends it.