Tertiary data: Big data's hidden layer

Courtesy of Alasdair Allan of O'Reilly Radar

 

Big data isn't just about multi-terabyte datasets hidden inside eventually consistent distributed databases in the cloud, or enterprise-scale data warehousing, or even the emerging market in data. It's also about the hidden data you carry with you all the time: the slowly growing datasets on your movements, contacts and social interactions.

 

Until recently, most people's understanding of what can actually be done with the data collected about us by our own cell phones was theoretical. There were few real-world examples. But over the last couple of years, this has changed dramatically. At the risk of courting hubris, I must admit it's possible some of that was my fault, though I haven't been alone.

 

The data you carry with you

You probably think you know how much data you carry around with you on your cell phone. You'll certainly be aware of it if you've ever lost your phone, had it stolen, or had it just plain stop working. But there is a large amount of data in the background that isn't surfaced in the user interface.

 

We know about what I generally call primary data: calendars, address books, photographs, SMS messages and browser bookmarks. These are usually user generated, and we'd be pretty unhappy if we lost them. There is also the secondary data that the phone generates about us: call history, voice mail, usage information and records of our current and past locations. Most of what I'd call secondary data is surfaced to us in our phone's user interface. We generally can't change this sort of information without resetting the phone to a factory-fresh condition; it's generated by the device for us, not something we generate ourselves.

 

But there is also what I refer to as tertiary data. Like the secondary data I mentioned above, this is data generated about us, rather than by us. Mostly, it consists of cache files: data that is either entirely necessary for you to use the device or significantly improves your user experience, but that you don't necessarily know is there. At least until some hole is found in the operating system that exposes that data layer to you. That's happened before, after all.

 

An obvious example is tucked in your photographs. Every picture you take is geotagged and date stamped, and if you publish your pictures to a photo-sharing site without stripping that information, you're leaking data. Back in 2007, when geotagged photographs of newly arrived helicopters at a U.S. Army base in Iraq were published to the Internet, they allowed insurgents to determine the exact location of the helicopters inside the compound and conduct a mortar attack. Four of the AH-64 Apaches on the flight line were destroyed in the attack.
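
To make "stripping that information" concrete, here is a minimal sketch in Python, not a production tool. It assumes the photo is a baseline JPEG whose location and timestamp live in an Exif block stored in an APP1 segment, which is the usual arrangement. The function name strip_exif is my own, and the sketch only drops APP1 segments; metadata stored elsewhere in the file would survive it.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG
APP1 = 0xE1         # segment type that normally carries Exif (and XMP) metadata
SOS = 0xDA          # start of scan: compressed image data follows


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG bytes with every APP1 segment removed.

    Hypothetical helper for illustration only; it does not handle
    padding bytes or other unusual layouts before the image data.
    """
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing start-of-image marker")

    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != APP1:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length

    # No scan segment was found; keep whatever trails the last segment.
    out += jpeg[pos:]
    return bytes(out)
```

To use it, you would read a photo's bytes from disk, pass them through strip_exif, and write the result to a new file before uploading. The pixels are untouched; only the metadata segment is dropped.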

Posted at 14:07
