Courtesy of The Register
Is HP pulling the deduplication wool over our eyes by claiming
its dedupe box can run at 100TB/hour while EMC's best rate is
31TB/hour? Should a 4-pool dedupe system realistically be compared
to a single-pool design?
Yesterday at the HP Discover event in Las Vegas, the company announced its Store Once Catalyst
software and B6200 disk array combination could ingest data at up
to 100TB/hour while Data Domain's 990, announced just two weeks
ago, ingests it at 31TB/hour. Ergo, HP is three times faster - and
Data Domain sucks.
But the B6200 is actually formed from couplet building blocks:
two controllers or nodes in a high-availability configuration with
their own storage and a single deduplication index.
It expands to an 8-node system by aggregating four couplets in a
cluster, using a 10GigE interconnect and Fusion Manager to control
the 8 nodes/4 couplets as a single system with a single namespace,
but four separate deduplication indices.
There is no global deduplication across a B6200 cluster.

EMC's Mark Twomey, technical director in the office of the CTO
for Backup Recovery Systems, told us: "I don't get how HP can call
it scale-out when those are four separate dedupe pools. That
[100TB/hour] number is from four 2-node systems, isn't it? Yes they
have one manager, but it's still four systems. If I get a manager
can I compare four Data Domains?"
The B6200/Store Once's speed per deduplication index or realm is
25TB/hour. Take away the Catalyst software, which gets 60TB/hour of
dedupe done on the source servers leaving 40TB/hour for the B6200,
and it is 10TB/hour per couplet and 5TB/hour per node.
A Data Domain 990 runs at 15TB/hour when Boost is taken out of
the equation. Its raw dedupe speed is faster than that of a B6200
couplet and there is a single dedupe index. Ergo, based on a single
dedupe index Data Domain's 990 is 50 per cent faster than a base
B6200 configuration. Ergo, HP sucks.
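
For readers who want to check the sums, here is a rough sketch of the per-index arithmetic using only the figures quoted above. It assumes the 10TB/hour couplet figure is the B6200's rate once Catalyst's 60TB/hour source-side share is removed; the constant and function names are ours, not HP's or EMC's.

```python
# Back-of-envelope check of the per-index ingest rates quoted in the article.
# All figures are in TB/hour; every name here is illustrative only.

HP_CLUSTER_WITH_CATALYST = 100.0  # HP's headline figure for an 8-node B6200
HP_SOURCE_SIDE_SHARE = 60.0       # dedupe work Catalyst does on source servers
HP_COUPLETS = 4                   # one deduplication index per couplet
NODES_PER_COUPLET = 2
DD990_WITHOUT_BOOST = 15.0        # Data Domain 990, single deduplication index


def per_index_rates():
    """Return ingest rates normalised per dedupe index, couplet and node."""
    # Assumption: what is left after the source-side share is the array's own rate.
    hp_native_cluster = HP_CLUSTER_WITH_CATALYST - HP_SOURCE_SIDE_SHARE
    return {
        "hp_index_with_catalyst": HP_CLUSTER_WITH_CATALYST / HP_COUPLETS,
        "hp_couplet_native": hp_native_cluster / HP_COUPLETS,
        "hp_node_native": hp_native_cluster / (HP_COUPLETS * NODES_PER_COUPLET),
        "dd990_native": DD990_WITHOUT_BOOST,
    }


rates = per_index_rates()

# How much faster one DD 990 index is than one B6200 couplet index, in per cent.
dd_advantage_pct = (rates["dd990_native"] / rates["hp_couplet_native"] - 1) * 100

for name, value in rates.items():
    print(f"{name}: {value:g} TB/hour")
print(f"DD 990 advantage per index: {dd_advantage_pct:g} per cent")
```

Normalising by deduplication index rather than by managed system is the whole argument here: change the divisor and either vendor can be made to look like the winner.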
El Reg suspects this difference is because the B6200 uses
an older Intel processor than the DD 990 does.
HP marketing veep Craig Nunes says an 8-node B6200 is a single
system because it is managed as one and has a single namespace. The
single namespace is segmented into four individual namespaces, one
per couplet, and, he says, "next year I could do a firmware update
and change that".
Pooling resources
Will the B6200 get a global deduplication pool next year then?
Nunes declined to comment.
Interestingly, the Sepaton S2100 ES2 deduplicating system that
HP resells is (like the B6200) an 8-node system, supports
Symantec's OST interface, and runs at a 43.2TB/hour ingest rate
into a global deduplication pool.
That global pool probably means that the ES2 dedupes more
effectively than HP Store Once. Also, the ES2 is a bit long in the
tooth and is likely to get a speed bump via a processor
refresh.
This global dedupe capability across an ES2 cluster should
ensure the HP Sepaton reselling relationship remains in place, at
least until the B6200 gets its own global dedupe capability. When
and if that happens then characterising an 8-node B6200 cluster as
a single deduplicating system will be more legitimate.
In the meantime it is justifiable to define the B6200 as a
single system so far as management and the overall namespace are
concerned, but HP is stretching the point to call it a single
deduplicating system when there are four separate deduplication
realms inside. ®