Dispersal: Smart Economics

Why RAID & Replication Fail

RAID schemes are based on parity, and at their root, if more than two drives fail
simultaneously, the data are not recoverable. The statistical likelihood of multiple
drive failures has not been an issue in the past. However, as systems grow to
hundreds of terabytes and petabytes, multiple simultaneous drive failures become
a realistic risk.

As a result, enterprises address the data protection shortcomings of RAID by
using replication, a technique of making additional copies of their data to avoid
unrecoverable errors and lost data. However, those copies add cost: each
additional copy typically requires 133% or more additional raw storage once the
overhead of a typical RAID 6 configuration is included.
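
The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 6 data + 2 parity drive layout is an assumption chosen because it reproduces the roughly 133% overhead quoted above; other RAID 6 layouts give different numbers.

```python
# Sketch: raw-storage multiplier for RAID 6 combined with replication.
# Assumption: each RAID 6 copy needs about 1.333x raw storage per usable
# unit, which is what a hypothetical 6 data + 2 parity layout works out to.

RAID6_OVERHEAD = 8 / 6  # (6 data + 2 parity) / 6 data, about 1.333


def raid_replication_multiplier(copies: int,
                                raid_overhead: float = RAID6_OVERHEAD) -> float:
    """Raw storage needed per unit of usable storage for `copies` full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return copies * raid_overhead


if __name__ == "__main__":
    for copies in (1, 2, 3):
        multiplier = raid_replication_multiplier(copies)
        print(f"{copies} copies: {multiplier:.2f}x raw storage")
```

Under these assumptions, two copies need about 2.67 times the usable capacity in raw storage, and every further copy adds roughly another 1.33 times.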

As storage grows from the terabyte to petabyte range, the number of copies
required to keep the data protection constant increases. This means the storage
system will get more expensive as the amount of data increases.

RAID & Replication Raw Storage Requirements

(Storage System Expansion is the multiplier to calculate the Raw storage required.)

Realizable Cost Savings with Dispersal

When comparing the raw storage requirements, it is apparent that both RAID 5 and RAID 6 require more raw storage per terabyte as the amount of data increases. The beauty of Dispersal is that as storage increases, the cost per unit of storage doesn’t increase while meeting the same reliability target. 

Dispersal Raw Storage Requirements

In this example, the Storage System Expansion remains constant at 1.6, meaning the raw storage is only 1.6 times the usable storage regardless of the total storage.

Cost Comparison Example

To translate this into real-world costs, here’s an example of storing 1 petabyte with six nines of reliability (99.9999%). It assumes the cost per gigabyte is a commodity price, set at $2.75 and the same for both the RAID 6 with replication solution and the Dispersal solution.[i]

1 Petabyte Scenario       RAID & Replication          Dispersal
Usable Capacity           1 petabyte                  1 petabyte
Raw Storage Multiplier    2.67 (replicated 2 times)   1.6
Cost / Gigabyte           $2.75                       $2.75
Total Cost                $7,342,500                  $4,400,000
Cost Savings                                          $2,942,500

 

It quickly becomes apparent that an organization can save millions of dollars in the petabyte range, and that in this example Dispersal is about 40% less expensive.
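
For readers who want to rerun the comparison with their own numbers, the calculation in the 1 petabyte scenario can be reproduced with a short Python sketch. The function and constant names are hypothetical, and the sketch assumes a decimal petabyte (1,000,000 gigabytes), which is what the totals in the example imply.

```python
# Sketch: total storage cost from usable capacity, raw-storage multiplier,
# and price per gigabyte, using the figures from the 1 petabyte scenario.

GB_PER_PB = 1_000_000  # assumption: decimal petabyte


def total_cost(usable_pb: float, raw_multiplier: float, cost_per_gb: float) -> float:
    """Cost of the raw storage needed to provide `usable_pb` of usable capacity."""
    return usable_pb * GB_PER_PB * raw_multiplier * cost_per_gb


raid_cost = total_cost(1, 2.67, 2.75)       # RAID 6, replicated 2 times
dispersal_cost = total_cost(1, 1.6, 2.75)   # Dispersal at 1.6 expansion
savings = raid_cost - dispersal_cost

if __name__ == "__main__":
    print(f"RAID & Replication: ${raid_cost:,.0f}")
    print(f"Dispersal:          ${dispersal_cost:,.0f}")
    print(f"Savings:            ${savings:,.0f} ({savings / raid_cost:.0%})")
```

Changing the price per gigabyte scales both totals equally, so the percentage saved depends only on the two raw storage multipliers.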



[i] This is a hypothetical price per gigabyte. Based on current research, it would be highly competitive in the market. Readers may adjust the math by using a different price per gigabyte as desired.