Courtesy of Computerworld
Everyone is talking about Big Data analytics and
associated business intelligence marvels these days, but before
organizations will be able to leverage the data, they'll have to
figure out how to store it. Managing larger data stores--at the
petabyte scale and larger--is fundamentally different from managing
traditional large-scale data sets. Just ask Shutterfly.
Shutterfly is an online photo site that
differentiates itself by allowing users to store an unlimited
number of images that are kept at the original resolution, never
downscaled. It also says it never deletes a photo.
"Our image archive is north of 30 petabytes of data,"
says Neil Day, Shutterfly senior vice president and chief
technology officer. He adds, "Our storage pool grows faster than
our customer base. When we acquire a customer, the first thing they
do is upload a bunch of photos to us. And then when they fall in
love with us, the first thing they do is upload a bunch of
additional photos."
To get an idea of the scale we're talking about, one
petabyte is equivalent to 1,000 terabytes or 1 million
gigabytes. The archive of the first 20 years of
observations by NASA's Hubble Space Telescope comes to a
bit more than 45 terabytes of data, and one terabyte of compressed
audio recorded at 128 kbit/s would contain about 17,000 hours of
audio.
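These comparisons use decimal (SI) units, where each prefix step is a factor of 1,000. The short Python sketch below checks the arithmetic. It assumes a constant 128 kbit/s bitrate and ignores container and file-system overhead. The helper name `hours_of_audio` is illustrative only.

```python
# Decimal (SI) storage units: each prefix is 1,000x the previous one.
GB = 10**9
TB = 10**12
PB = 10**15

def hours_of_audio(capacity_bytes: int, bitrate_bits_per_s: int) -> float:
    """Hours of audio that fit in capacity_bytes at a constant bitrate."""
    bytes_per_second = bitrate_bits_per_s / 8
    return capacity_bytes / bytes_per_second / 3600

print(PB // TB)                             # 1000 terabytes per petabyte
print(PB // GB)                             # 1000000 gigabytes per petabyte
print(round(hours_of_audio(TB, 128_000)))   # 17361, i.e. "about 17,000 hours"
```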
Petabyte-Scale Infrastructures Are Different
"Petabyte-scale infrastructures are just an entirely
different ballgame," Day says. "They're very difficult to build and
maintain. The administrative load on a petabyte or multi-petabyte
infrastructure is just a night and day difference from the
traditional large-scale data sets. It's like the difference between
dealing with the data on your laptop and the data on a RAID
array."
When Day joined Shutterfly in 2009, storage had
already become one of the company's biggest buckets of expense, and
it was growing at a rapid clip--not just in terms of raw capacity,
but in terms of staffing.
"Every n petabytes of additional storage meant we
needed another storage administrator to support that physical and
logical infrastructure," Day says. With such massive data stores,
he says, "things break much more frequently. Anyone who's managing
a really large archive is dealing with hardware failures on an
ongoing basis.
"The fundamental problem that everyone is trying to
solve is, knowing that a fraction of your drives are going to fail
in any given interval, how do you make sure your data remains
available and the performance doesn't degrade?"
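To put Day's point in numbers, here is a rough sketch of how many drive failures a multi-petabyte archive might see. The inputs are assumptions, not Shutterfly's figures: 30 PB of raw capacity on 3 TB drives (ignoring redundancy overhead, which would only raise the drive count) and a hypothetical 2 percent annual failure rate spread evenly across the year.

```python
import math

TB = 10**12
PB = 10**15

def drive_count(raw_capacity: int, drive_size: int) -> int:
    """Drives needed to hold raw_capacity bytes, before any redundancy overhead."""
    return math.ceil(raw_capacity / drive_size)

def expected_failures(drives: int, annual_failure_rate: float, days: float) -> float:
    """Expected drive failures in a window of `days`, assuming a constant
    annual failure rate spread evenly over the year."""
    return drives * annual_failure_rate * days / 365

drives = drive_count(30 * PB, 3 * TB)
afr = 0.02  # assumed 2% annual failure rate (hypothetical, not a measured value)
print(drives)                                       # 10000
print(round(expected_failures(drives, afr, 365)))   # 200 failures per year
print(round(expected_failures(drives, afr, 7), 1))  # 3.8 failures per week
```

Even under these modest assumptions, a failed drive is a routine weekly event rather than an exception.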
Scaling RAID Is Problematic
The standard answer to hardware failure is redundancy,
usually in the form of RAID arrays. But at massive scales, RAID can
create more problems than it solves, Day says. In a traditional
RAID data storage scheme, data is mirrored or protected with parity
across the disks of the array, ensuring integrity and availability.
But that means a single piece of data, once stored and mirrored, can
inflate to require more than five times its size in storage. As the
drives used in RAID arrays get larger (3-terabyte drives are very
attractive from a density and power consumption perspective), the
time it takes to bring a replacement for a failed drive back to full
parity becomes longer and longer.
"We didn't actually have operational issues with
RAID," Day says. "What we were seeing was that as drive sizes
became larger and larger, the time to get back to a fully redundant
system when we had any component failure was going up. Generating
parity is proportional to the size of the data set that you're
generating it for. What we were seeing as we started using
1-terabyte and 2-terabyte drives in our infrastructure was that the
time to get back to full redundancy was getting quite long. The
trend wasn't heading in the right direction."
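Day's remark that parity generation is proportional to the size of the data set can be illustrated with single XOR parity, the scheme behind RAID 5. In this minimal sketch (block contents and function names are illustrative), recovering one lost block means reading every surviving block plus the parity block, so rebuild work scales with how much data the array holds.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def make_parity(data_blocks: list[bytes]) -> bytes:
    """Single parity block, as a RAID 5 style array would store it."""
    return xor_blocks(data_blocks)

def rebuild_lost_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block.

    Every byte of every surviving block is read, so the work grows
    in proportion to the amount of data protected by the parity."""
    return xor_blocks(surviving + [parity])

data = [b"abcd", b"efgh", b"ijkl"]  # one block per data drive
parity = make_parity(data)
recovered = rebuild_lost_block([data[0], data[2]], parity)
print(recovered)  # b'efgh'
```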
Reliability and availability are mission-critical for
Shutterfly, suggesting the need for enterprise-class storage. But
its rapidly inflating storage costs were making commodity systems
much more attractive, Day says. As Day and his team investigated
the potential technical solutions to getting Shutterfly's storage
costs under control, they got interested in a technology called
erasure codes.
Next-Generation Storage With Erasure Codes
Reed-Solomon erasure codes were originally used as
forward error correction (FEC) codes for sending data over an
unreliable channel, like data transmissions from deep space probes.
The technology is also used with CDs and DVDs to handle impairments
on the disc, like dust and scratches. But several storage vendors
have begun incorporating erasure codes into their solutions. Using
erasure codes, a piece of data can be broken up into multiple
chunks, each useless on its own, and then dispersed to
different disk drives or servers. At any time, the data can be
fully reassembled from a subset of the chunks, even if multiple
chunks have been lost due to drive failures. In other words, you
don't need to create multiple copies of data; a single instance can
ensure data integrity and availability.
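The k-of-n property can be sketched with a Reed-Solomon-style construction: each group of k bytes is treated as the coefficients of a polynomial, the polynomial is evaluated at n distinct points to produce n slices, and any k slices determine the polynomial again by interpolation. This is an illustrative sketch, not Cleversafe's or any vendor's codec, and the function names are hypothetical. To keep the arithmetic short it works modulo the prime 257 (so a slice value can be 256 and does not fit in a byte), whereas production codecs typically use the finite field GF(2^8).

```python
P = 257  # prime modulus: byte values 0..255 are all field elements

def _poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) at x, mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def _mul_linear(poly, root):
    """Multiply poly (lowest degree first) by (x - root), mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def _lagrange_basis(xs):
    """Coefficients of each Lagrange basis polynomial for the points xs."""
    basis = []
    for j, xj in enumerate(xs):
        num, denom = [1], 1
        for m, xm in enumerate(xs):
            if m != j:
                num = _mul_linear(num, xm)
                denom = denom * (xj - xm) % P
        inv = pow(denom, P - 2, P)  # modular inverse (Fermat's little theorem)
        basis.append([c * inv % P for c in num])
    return basis

def encode(data: bytes, k: int, n: int):
    """Return n slices as (x, values); any k of them rebuild the data."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [(x, []) for x in range(1, n + 1)]
    for i in range(0, len(padded), k):
        coeffs = padded[i:i + k]  # k data bytes form one polynomial
        for x, values in slices:
            values.append(_poly_eval(coeffs, x))
    return slices

def decode(slices, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k distinct slices."""
    if len(slices) < k:
        raise ValueError("fewer slices than the threshold; data unrecoverable")
    chosen = slices[:k]
    basis = _lagrange_basis([x for x, _ in chosen])
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        coeffs = [0] * k
        for (_, values), b in zip(chosen, basis):
            for i in range(k):
                coeffs[i] = (coeffs[i] + values[pos] * b[i]) % P
        out.extend(coeffs)
    return bytes(out[:length])  # drop the zero padding

message = b"photo bytes"
slices = encode(message, k=4, n=6)  # 6 slices, any 4 suffice
survivors = [slices[0], slices[2], slices[3], slices[5]]  # two slices lost
print(decode(survivors, 4, len(message)))  # b'photo bytes'
```

With six slices and four required, any two can be lost, and storing all six costs 1.5 times the original size, compared with 3 times for three full copies that also survive only two losses.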
One of the early vendors of an erasure code-based
solution is Chicago, Ill.-based Cleversafe, which has added
location information to create what it calls dispersal coding,
allowing users to store chunks, or slices as it calls them, in
geographically separate places, like multiple data centers.
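Geographic dispersal adds a placement constraint: if slices are spread across data centers, losing one entire site must still leave at least the minimum number of slices needed to reassemble the data. A minimal check follows, assuming a hypothetical 12 slices spread evenly over three sites; the site names and the function are illustrative, not Cleversafe's configuration.

```python
from collections import Counter

def survives_site_loss(placement: dict[int, str], threshold: int) -> bool:
    """True if losing any one site still leaves at least `threshold` slices.

    `placement` maps slice index -> site name; `threshold` is the minimum
    number of slices needed to reassemble the data."""
    total = len(placement)
    per_site = Counter(placement.values())
    return all(total - count >= threshold for count in per_site.values())

sites = ("dc-a", "dc-b", "dc-c")  # hypothetical data centers
layout = {i: sites[i % 3] for i in range(12)}  # 4 slices per site
print(survives_site_loss(layout, threshold=8))  # True: 12 - 4 = 8 remain
print(survives_site_loss(layout, threshold=9))  # False
```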
Each slice is mathematically useless on its own,
making it private and secure. Because the information dispersal
technology uses only a single instance of data with minimal
expansion to ensure data integrity and availability, rather than
multiple copies as with RAID, Cleversafe says, companies can save
up to 90 percent of their storage costs.
"When you go to put it back together, you don't have
to have every single piece," says Russ Kennedy, vice president of
product strategy, marketing and customer solutions for Cleversafe.
"The number of pieces you generate, we call that the width.
We call the minimum number you need to put it back
together the threshold. The difference between the number of pieces
you create and the minimum number required to put it back together
is what determines its reliability. Simultaneously, you can lose
nodes and drives, and you can still get the data back in its
original form. The highest amount of reliability you can get with
RAID is dual parity. You can lose two drives. That's it. With our
solution, you can lose up to six."
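Kennedy's width and threshold determine both how many simultaneous losses a scheme survives (width minus threshold) and how much raw storage it consumes (width divided by threshold). The sketch below compares three layouts in those terms. The 16/10 dispersal configuration is an assumption chosen because it tolerates six losses, matching the quote; the article does not state the parameters Cleversafe or Shutterfly actually use. Replication and RAID 6 fit the same model: three full copies are width 3, threshold 1, and an 8+2 RAID 6 group is width 10, threshold 8.

```python
def tolerated_failures(width: int, threshold: int) -> int:
    """Slices (or drives) that can be lost simultaneously without losing data."""
    return width - threshold

def expansion_factor(width: int, threshold: int) -> float:
    """Raw storage consumed per byte of user data."""
    return width / threshold

schemes = {
    "3-way replication": (3, 1),
    "RAID 6, 8 data + 2 parity": (10, 8),
    "dispersal 16/10 (hypothetical)": (16, 10),
}
for name, (width, threshold) in schemes.items():
    print(f"{name}: survives {tolerated_failures(width, threshold)} losses, "
          f"uses {expansion_factor(width, threshold):.2f}x raw storage")
# 3-way replication: survives 2 losses, uses 3.00x raw storage
# RAID 6, 8 data + 2 parity: survives 2 losses, uses 1.25x raw storage
# dispersal 16/10 (hypothetical): survives 6 losses, uses 1.60x raw storage
```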
Erasure coding is also a software-based technology,
meaning it can be used with commodity hardware, bringing down the
cost of scaling even more.
Building Next-Generation Storage Infrastructure
"Having identified the right technology, we went and
looked at a number of different vendors who were providing
solutions in that space," Day says. "We looked at building it
ourselves. But we felt that if we could find a company that was a
pretty close match to our requirements, with a system that was
reasonably proven, that would be a much better approach for
us."
Shutterfly brought four vendors to its lab for
evaluation and built prototypes of the storage device it wanted for
its data center. Day says he was looking for performance,
availability, fault tolerance and manageability.
"We have a staff that does nothing but manage our
image archive," he explains. "One of the big concerns in 2010 was
the growth we were seeing in our image archive. We were going to
have to grow our staff relative to the growth of our image archive,
and that wasn't very attractive."
Day says Cleversafe emerged as the best fit for
Shutterfly, mostly based on the company's willingness to work with
Shutterfly to tailor its solution to Shutterfly's needs. The two
companies started going through a series of progressive proofs of
concept, including load and performance tests in Shutterfly's lab.
After Shutterfly was comfortable with the operational and
performance characteristics, it placed a parallel storage
infrastructure in production, directing a copy of all of Shutterfly's
traffic to it.
"Every image coming in the door was written to our
legacy infrastructure and the Cleversafe infrastructure," Day says.
"We ran it for six months, including holidays."
The holidays are the peak season for Shutterfly, when
many of its customers create photo books.
Shutterfly brought Cleversafe's storage solution into
full production for its image archive in 2011 and has been using it
as the primary image repository ever since.
The TCO of Erasure Code-based Storage
"It's fundamentally a software solution, allowing us
to deploy on very, very cost-effective hardware," Day says. "That
changes the whole picture from a total cost of ownership
perspective for us. We have more flexibility dealing with hardware
vendors and can guarantee that we're getting the best possible
price on the drives and the infrastructure that supports them."
Administering the storage pool has also been greatly
simplified, Day says.
"We can basically just add another brick of storage
and it automatically gets added to whichever pool we designate it
for," he says. "Previously, we had to do some fairly interesting
administrative gymnastics whenever we added additional
storage."
Also, now, when a drive fails or goes offline,
Shutterfly's storage infrastructure is able to mark it as
unavailable and route data around it while recovering data on that
drive transparently. Instead of an "all hands on deck" situation
when a drive or a shelf fails, Day says his team can now simply
note the failure and replace the affected infrastructure during
scheduled maintenance.
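The behavior of marking a drive unavailable and routing data around it can be sketched as a read planner: read from any threshold-many slices on healthy nodes and queue the slices on failed nodes for later rebuild. This is a hypothetical illustration of the idea, not Shutterfly's or Cleversafe's actual logic; the node names and the function `plan_read` are made up.

```python
def plan_read(slice_nodes: dict[int, str], node_online: dict[str, bool],
              threshold: int) -> tuple[list[int], list[int]]:
    """Return (slices to read now, slices to rebuild later).

    Slices on nodes marked offline (or unknown) are skipped for reads and
    queued for rebuild; the read succeeds while `threshold` slices remain."""
    online = [i for i in sorted(slice_nodes) if node_online.get(slice_nodes[i], False)]
    offline = [i for i in sorted(slice_nodes) if not node_online.get(slice_nodes[i], False)]
    if len(online) < threshold:
        raise RuntimeError(f"only {len(online)} slices reachable, need {threshold}")
    return online[:threshold], offline

slice_nodes = {i: f"node-{i}" for i in range(6)}       # hypothetical: one slice per node
node_online = {f"node-{i}": i != 1 for i in range(6)}  # node-1 has a failed drive
print(plan_read(slice_nodes, node_online, threshold=4))  # ([0, 2, 3, 4], [1])
```

Because reads never wait on the failed node, replacing it can be deferred to a maintenance window, with the queued slices rebuilt afterward.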
"It's allowed us to not grow [our staff] as quickly
as we were previously," he says. "We still do grow, but at a much
slower rate than we did with the previous generation of gear. The
daily maintenance workload has declined. Administrators get to
spend more time on interesting proactive projects. Their workloads
have shifted to what I would call additive work. It's good from a
growth perspective and a job content perspective."
If You Store It, the Insight Will Come
While Shutterfly is an Internet company that deals
with volumes of data that dwarf what most enterprises today have to
deal with, companies across the board are storing ever-increasing
amounts of data.
"Our archive size in five years is going to look
pretty pedestrian, though we'll still be orders of magnitude larger
than the average," he says. "One of the things that's really
interesting right now is in the last four or five years you've seen
a bunch of applications and technologies enter the marketplace that
make it possible to deal with very large datasets. Those are really
exciting because they allow companies to gain deeper insights into
their business by actually looking at the fine-grained data."
"That's a positive move in the industry," Day says.
"We're just at the very early stages of that coming into play.
Another factor that's pretty interesting is that as businesses do
more with real-time customer interactions, with online, with
mobile, they're also generating just massive amounts of data. It's
now possible to analyze that data for really impactful business
insights. But all of that depends on the ability to store massive
amounts of data and do it reliably."