Courtesy of Alasdair Allan of O'Reilly Radar
Big data isn't just about multi-terabyte datasets hidden
inside eventually consistent distributed databases in the cloud, or
enterprise-scale data warehousing, or even the emerging market in
data. It's also about the hidden data you carry with you all the
time: the slowly growing datasets on your movements, contacts and
social interactions.
Until recently, most people's understanding of what can actually
be done with the data collected about us by our own cell phones was
theoretical. There were few real-world examples. But over the last
couple of years, this has changed dramatically. Courting hubris
perhaps, but I must admit it's possible some of that was my fault,
though I haven't been alone.
The data you carry with you
You probably think you know how much data you carry around with
you on your cell phone. You'll certainly be aware of it if you've
ever lost your phone, had it stolen, or had it just plain stop
working. But there is a large amount of data in the background that
isn't surfaced in the user interface.
We know about what I generally call primary data: calendars,
address books, photographs, SMS messages and browser bookmarks.
These are usually user generated, and we'd be pretty unhappy if we
lost them. There is also the secondary data that the phone
generates about us: call history, voice mail, usage information and
records of our current and past locations. Most of what I'd call
secondary data is surfaced to us in our phone's user interface. We
generally can't change this sort of information without resetting
the phone to a factory-fresh condition; it's generated by the
device for us rather than something we generate ourselves.
But there is also what I refer to as tertiary data. This is data
that, similar to the examples I mentioned above, is generated about
us, rather than by us. Mostly, this data consists of cache files:
data that is either entirely necessary for you to use the device or
significantly improves your user experience, but that you don't
necessarily know is there, at least until some hole is found in the
operating system that exposes that data layer to you. That's
happened before, after all.
An obvious example is tucked in your photographs. Unless you've
turned the feature off, every picture you take is geotagged and
date stamped, and if you publish your pictures to a photo-sharing
site without stripping that information, you're leaking data. Back
in 2007, geotagged photographs of newly arrived helicopters at a
U.S. Army base in Iraq were published to the Internet. They allowed
insurgents to determine the exact location of the helicopters
inside the compound and conduct a mortar attack, which destroyed
four of the AH-64 Apaches on the flight line.
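
To make "stripping that information" a little more concrete, here
is a minimal sketch in Python of one way it could be done. In a JPEG
file the geotag and timestamp normally live in an Exif block, stored
in an APP1 segment near the start of the file, so the sketch copies
the file and leaves those segments out. The function name strip_exif
is my own invention for illustration, and the code makes simplifying
assumptions: it only looks at segments before the start of the image
scan, and it leaves other metadata containers (XMP, for example)
alone. Treat it as an illustration rather than a tool to rely on.

```python
import struct

SOI = b"\xff\xd8"            # JPEG "start of image" marker
APP1 = 0xE1                  # segment type that carries Exif data
SOS = 0xDA                   # "start of scan": compressed pixels follow
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG bytes with Exif APP1 segments removed.

    Hypothetical helper for illustration; assumes a well-formed file.
    """
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")

    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == SOS:
            break  # metadata segments end here; copy the rest untouched

        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]

        is_exif = marker == APP1 and segment[4:10] == EXIF_HEADER
        if not is_exif:
            out += segment
        pos += 2 + length

    out += jpeg[pos:]  # scan data and end-of-image marker
    return bytes(out)
```

You would read the photo's bytes from disk, pass them through
strip_exif, and write the result to a new file before uploading it.
The image data itself is copied byte for byte, so the picture is not
recompressed.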