Courtesy of ZDNet
Big Data in the U.S. Federal government
isn't just for grandiose inter-agency projects. Tactical,
operational applications can use Big Data technology too.
This guest post is from Vishwas Lele, CTO at Applied Information
Sciences, a provider of software and systems engineering
services to government agencies.
At AIS we work on IT initiatives across several U.S. Federal
agencies, including the Departments of Defense, Homeland Security
and Justice. With that work in mind, I would like to share some
observations on the White House's recent Big Data announcement.
While much has been said about the
grand visions behind this initiative, my focus in this post is
the usefulness of Big Data in medium-to-large tactical/operational
IT projects.
Big Data is not just for canonical use cases (such as genome
data analysis, or video and image analysis); it is equally important
for Federal agencies in accomplishing their core missions. The
majority of the applications we build are targeted towards
implementing new (or optimizing existing) business processes, and
even these applications can generate a lot of data.
Strategic value amid tactical challenges
But there are challenges here. Given that the primary
driver for these initiatives is to improve efficiency (and
compliance), the data analysis part is often an afterthought.
Even in cases where due importance is given to data analysis, the
data collection strategy flows directly from the existing
requirements. For instance, the grain (least count) of our
data sets is governed by the level of drill-down that users have
asked for today.
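To make the grain point concrete, here is a minimal Python sketch under stated assumptions: the activity records and the roll_up helper are hypothetical, not taken from any AIS system. If only the daily totals users ask for today are stored, an hourly drill-down requested later cannot be answered; keeping the raw records lets any grain be computed on demand.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw activity log: one record per user action (ISO timestamp, action).
raw_events = [
    ("2012-04-02T09:15:00", "search"),
    ("2012-04-02T09:40:00", "search"),
    ("2012-04-02T14:05:00", "download"),
    ("2012-04-03T10:20:00", "search"),
]

def roll_up(events, fmt):
    """Count events per time bucket; the strftime pattern `fmt` sets the grain."""
    return Counter(datetime.fromisoformat(ts).strftime(fmt) for ts, _ in events)

# The grain users asked for today: one total per day.
daily = roll_up(raw_events, "%Y-%m-%d")
# A finer grain requested later: only possible because the raw events were kept.
hourly = roll_up(raw_events, "%Y-%m-%d %H:00")

print(dict(daily))   # {'2012-04-02': 3, '2012-04-03': 1}
print(dict(hourly))  # {'2012-04-02 09:00': 2, '2012-04-02 14:00': 1, '2012-04-03 10:00': 1}
```

Note that the hourly counts cannot be reconstructed from the daily totals alone; the storage saved by pre-aggregating is paid for in lost analytical options.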
Similarly, the amount of historical data that is retained
is governed by the parameters used for capacity planning. These
decisions are a result of limited resources (such as storage
infrastructure) and the traditionally non-trivial cost of preparing
data for analysis. These costs arise because
traditional BI tools require that data be organized in well-defined
structures.
Bringing analysis within Federal reach
But now things may change. Big Data can
bring the tools for arbitrarily large data collection and analysis
within the reach of Federal agencies, even when they are resource-bound as
discussed above. This is possible through adoption of open source
frameworks such as Hadoop or Storm, commodity
hardware, and the familiar SQL-like query constructs provided by
tools such as Hive. Using an ODBC driver for Hive to import the
results of a Hadoop query into Excel for further analysis extends
the life and usefulness of the data collected, and can be done
affordably.
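Hive lets analysts write SQL-like queries such as SELECT action, COUNT(*) FROM activity_log GROUP BY action, which Hive in its original design compiles into MapReduce jobs that Hadoop runs across commodity nodes. The sketch below is only an illustration: it mimics the map, shuffle and reduce phases of that query in plain Python on a single machine. The activity_log table, the tab-separated log format and the helper functions are hypothetical, and no Hadoop or Hive API is used.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines as they might sit in HDFS files: "timestamp<TAB>user<TAB>action".
log_lines = [
    "2012-04-02T09:15:00\tu1\tsearch",
    "2012-04-02T09:40:00\tu2\tsearch",
    "2012-04-02T14:05:00\tu1\tdownload",
    "2012-04-03T10:20:00\tu3\tsearch",
]

def map_phase(line):
    """Emit (key, 1) for each record, keyed on the GROUP BY column (action)."""
    _, _, action = line.split("\t")
    yield action, 1

def shuffle(pairs):
    """Group intermediate values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate one group; summing the 1s is the equivalent of COUNT(*)."""
    return key, sum(values)

def group_by_count(lines):
    """Run map, shuffle and reduce in sequence over all input lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

print(group_by_count(log_lines))  # {'search': 3, 'download': 1}
```

On a real cluster the map calls run in parallel over file blocks and the shuffle moves intermediate pairs between nodes, but the per-record logic stays this simple, which is why commodity hardware suffices.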
The arrival of "Hadoop-as-a-service" from public cloud providers
such as Microsoft and Amazon can greatly lower costs as well.
Such cloud offerings mean that agencies without a
continuous need for Big Data can use Hadoop on an as-needed basis.
And agencies that cannot move to a public cloud environment for
security reasons can benefit from
community cloud-based Hadoop-as-a-service offerings.
A private sector example
The CIO of travel services provider Orbitz decided to harness
data that was going uncollected and unanalyzed. He
initiated a Big Data strategy that allowed Orbitz to keep
the logs of user activity indefinitely (prior to this initiative,
logs were kept only for a fixed number of days). This change caused
the collection volume to grow from 7 TB to 750 TB. However, Big
Data techniques made it possible for Orbitz not only to manage this
data volume but also to turn it into key insights about its
customers.
A public sector commonality
AIS believes that Federal agencies can apply similar Big Data
techniques to gain insight by harnessing currently uncollected
data. For example, financial agencies can use Big Data to improve
fraud detection. Similarly, law enforcement agencies can improve open-source
intelligence collection and analysis.
Hopefully the spotlight on Big Data as a result of the recent
White House announcement will encourage Federal agencies to take
notice.