Courtesy of Gigaom

The IT hype machine has everyone jumping
on the big data bandwagon. Consultants are making millions helping
companies search for nuggets of insight in loosely related data,
and the hardware industry is enjoying a mini-boom as more data than
ever is being saved and more storage and computing power is being
sold to process it.
Why bigger doesn't always mean better
At the center of much of the discussion is Hadoop. This
confusing collection of distributed computing technology may be
open source, but it's neither cheap nor friendly despite the cute
elephant logo. In fact, Hadoop and big data seem like the dream
ticket for the vendors of big storage and big iron, many of whom
have made expensive acquisitions to get into this lucrative
market.
But before we start saving every scrap of data in the enterprise
for fear that we will miss a nugget of insight, shouldn't we focus
on what we already have? Surely the real goal is to enable more
people in the company to do more with their existing data before
adding new data of undetermined relevance and quality. Perhaps it
makes more sense to get off the big data bandwagon and focus on
empowering business users to use the data they've already got more
effectively, rather than feeding the elephant and its ecosystem of
hangers-on.
Often, the big data discussion is framed by the implied premise
that bigger is better and that adding more data will naturally
produce insights. Should you buy into the hype? Big data projects
come with big investments in complex computing systems and the
specialized skills to make them go. Worse, they are burdened by
notoriously long deployment schedules and poor performance.
You don't need more dead data
Maybe some huge enterprises and government departments do need
big data, but what about the rest of us? Can collecting more data
help? Perhaps, but first you must answer two questions: Am I getting
useful, timely answers from the data I already have? Do I have
disciplines in place to operationalize insights and measure their
impact on the business? Sadly, if the answer to either is no, you
are not alone. According to a recent study by Freeform Dynamics,
only 15 percent of enterprises feel they fully exploit traditional
database information for decision making.
It seems that most of the data already stored for analysis is
going underutilized. To that point, Bill Inmon, the father of the
data warehouse, claims that 95 percent of the data in a warehouse is
"dormant." Will adding terabytes or petabytes of unstructured data
to your already underutilized data warehouse change this? Probably
not. In fact, there's a good chance it will turn dormant data into
dead data.
What companies need is not dormant or dead data. They need data
that helps them gain operational insights to make their existing
business run better. They need data that empowers their business
users to be more creative and productive. In short, they need live,
"quick" data. If this makes sense to you, how do you get there?
End big, but start small
First, take stock of what you already have: not just data, but
also knowledge and skills. Select a project where you can
demonstrate incremental improvement with existing resources. If you
need to hire, think business analyst, not technologist, because a
dollar spent answering a business question is an investment; one
spent on specialized IT skills to support the process is a sunk
cost.
Second, consider more agile off-the-shelf tools that will allow
you to think big, but start small and scale fast. Think friendlier
tools, accessible to your existing staff. This approach will
deliver more business insight today, and many such tools scale well
to all but the most extreme big data problems. The solution should
be intuitive enough for a business manager to use, yet extensible
enough to support more complex mining by an experienced analyst.
Knowledge of the underlying data structure or processing platform
should not be necessary.
The analytics engine should run on standard servers, with no
proprietary hardware, specialized configuration, custom database
schemas or tuning needed to achieve acceptable performance. And
because loading data into the analytics database can become your
most time-consuming effort, connections to your data sources should
be based on industry standards and designed to greatly simplify
loading data from multiple formats.
Finally, adopt an agile, iterative approach; don't go big bang
on big data. Successful analytics initiatives are based on an
ongoing dialog with the data: a set of questions is asked, and the
answers set up the next round of discovery. With each cycle, more
is understood about what data is present, what is relevant, what
needs to be added and how much history is worth adding. Rapid
time-to-answer is the most critical factor in harvesting the value
from your data, big or small.
Maybe big data analytics will someday become a must-have for
every business, but don't be persuaded that this is the case now
just because management consultants and major vendors are throwing
millions of dollars at "What are you missing by not using big data
analytics?" messaging. In all likelihood, you're not missing
anything, and your time and money are better spent putting your
existing data in the hands of more business users and giving them
the tools to do deeper and faster analytics now.
One thing that evolution shows us is that small, agile species
tend to do better than large, specialized species. Maybe we should
apply the same thinking to our data?
Fred Gallagher is general manager of Vectorwise at Actian
Corporation.