Capacity Optimization: A Core Storage Technology
Date: Monday, May 01, 2006
The past three years have seen rapid and significant shifts in how storage solutions approach capacity management. New archiving and backup solutions now achieve massive levels of efficiency, far beyond any prior technology. To date, the industry has not settled on a common name for this critical enabling technology; this article will refer to it as Capacity Optimization (CO), and to the storage solutions built on it as Capacity Optimized Storage (COS).
Unlike traditional compression techniques, which may provide 2x benefits, CO techniques can reduce the storage requirements of content by an order of magnitude or more, in some cases delivering in excess of 20x compression. CO technologies are seeing adoption today primarily in the disk-based data storage and enterprise networking industries, where reduced capacity and network utilization deliver the greatest benefit. Capacity Optimization will play a foundational role in storage system architecture throughout this decade, and its principles therefore merit detailed analysis.
What is Capacity Optimization?
Capacity Optimization (CO) is a new technology designed to reduce data down to its raw essentials. Unlike traditional compression, which typically reduces data to half its previous size, capacity optimization can reduce standard business data to a twentieth or less of its original size. It achieves this by breaking the data into a small number of fundamental parts which can be reassembled to rebuild the original data. The technique is being used in both storage and networking devices to build much more cost-effective systems.
For example, a storage device that utilizes CO can store 20x as much data as a standard one at the same cost, and a capacity optimized network can transmit 20x as much data as a non-optimized one, again at the same price. This order-of-magnitude cost reduction of capacity optimized technologies over standard ones ensures that all future system designs will at some point implement the technology.
A Lexicon for Capacity Optimization
In order to discuss compression and capacity optimization, it is useful to define some simple terms that will be used throughout this article:
Object – A data container such as a file or network transmission.
Part – A piece of an object containing a fixed or variable amount of data.
Plan – A description of how to assemble parts into an object.
Optimized Store – A location for storing parts and plans.
CO Versus Compression
Regular compression examines the data in an object for repeating patterns or extended runs of the same data. For example, an object containing a thousand zeros one after the other could be represented by just two numbers, 1000 and 0, rather than a thousand separate elements. Those two numbers can then be stored or transmitted over a network far more efficiently than the thousand individual zeros. Likewise, the pattern 123412341234123412341234 could be stored as the five numbers 6 1 2 3 4 (six times one-two-three-four) rather than the twenty-four original digits, almost a fivefold compression.
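The run-length idea above can be sketched in a few lines of Python. This is a minimal, illustrative sketch; the function names are my own and not drawn from any particular product:

```python
def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse runs of a repeated symbol into (count, symbol) pairs."""
    runs: list[tuple[int, str]] = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)   # extend the current run
        else:
            runs.append((1, ch))               # start a new run
    return runs

def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Rebuild the original data from its (count, symbol) pairs."""
    return "".join(ch * n for n, ch in runs)

# A thousand consecutive zeros collapse to the single pair (1000, "0") --
# two numbers in place of a thousand elements.
print(rle_encode("0" * 1000))  # [(1000, '0')]
```

Note that the repeated-pattern case (6 1 2 3 4) needs a slightly richer scheme that detects repeated substrings rather than single symbols, but the principle is the same.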
Capacity Optimization Explained
At a basic level, capacity optimization works in a manner very similar to regular compression. Objects are broken down into parts of either fixed or variable sizes, and a plan is constructed that shows how to build the parts back into the original object. The parts are then compared to see which are unique; duplicate parts are discarded, resulting in immediate compression. The unique parts, together with the plan for assembling them back into the original object, are then stored.
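The process just described can be sketched as a small Python toy. It uses fixed-size parts and SHA-256 hashes to detect duplicates; real systems often use variable-size parts, and all names here are illustrative rather than drawn from any vendor:

```python
import hashlib

def split_into_parts(obj: bytes, part_size: int = 4) -> list[bytes]:
    """Break an object into fixed-size parts."""
    return [obj[i:i + part_size] for i in range(0, len(obj), part_size)]

class OptimizedStore:
    """Toy optimized store: unique parts keyed by hash, plus a plan per object."""

    def __init__(self) -> None:
        self.parts: dict[str, bytes] = {}      # hash -> part (each unique part stored once)
        self.plans: dict[str, list[str]] = {}  # object name -> ordered part hashes

    def put(self, name: str, obj: bytes) -> None:
        plan = []
        for part in split_into_parts(obj):
            h = hashlib.sha256(part).hexdigest()
            self.parts.setdefault(h, part)     # a duplicate part is simply discarded
            plan.append(h)
        self.plans[name] = plan

    def get(self, name: str) -> bytes:
        """Follow the plan to reassemble the original object."""
        return b"".join(self.parts[h] for h in self.plans[name])
```

Storing a second object that shares parts with the first adds only the parts not already in the store, which is why the efficiency of such a store grows as more objects are added.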
By analogy, consider a house made out of children’s building blocks, as illustrated below.
This analogy demonstrates the first principle by which CO technologies approach the storing of blocks of data: because they maintain maximally granular plans for all objects, they need only one instance of any given part.
CO technologies achieve increasingly higher levels of efficiency as more objects are added, because the optimized store holds only the minimum number of unique parts, plus the plans required to assemble those parts back into their original objects.
Capacity Optimized Storage (COS)
The primary emerging market for CO today is data protection. In this market, corporate data has traditionally been stored on tape rather than disk. Although disk is widely recognized as keeping data safer and more accessible than tape, until now it had been too expensive for longer-term data protection: tape has cost roughly one twentieth as much as disk storage solutions. By utilizing CO, a new generation of solutions provides massive savings over regular disk-based storage, bringing it into price equivalence with more traditional tape solutions.
These savings occur because when data is migrated to a capacity optimized solution, only the changed and unique new data (typically only 5 percent of the data presented) needs to be stored on disk.
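To see roughly how that 5 percent change rate translates into savings, consider a back-of-the-envelope calculation. The figures below are illustrative assumptions, not vendor numbers:

```python
full_backup = 1000   # GB of logical data in each full backup (assumed)
change_rate = 0.05   # ~5% of the data is new or changed each cycle
cycles = 20          # number of retained backup cycles (assumed)

# Plain disk must hold every full backup in its entirety.
raw = full_backup * cycles

# A capacity optimized store holds one full copy plus only the
# changed/unique data from each subsequent backup.
optimized = full_backup + full_backup * change_rate * (cycles - 1)

print(f"{raw / optimized:.1f}x reduction")  # 10.3x reduction over 20 cycles
```

With longer retention the ratio climbs toward the 1/0.05 = 20x ceiling, which is consistent with the price parity with tape described above.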
CO Players Today and Tomorrow
Today, we see CO technologies in a range of emerging companies. Each leverages its own intellectual property, but all achieve single-instance storage or transport through the principles of Capacity Optimization. In the disk backup and archiving realm, the list includes ADIC/Rocksoft, Archivas, Avamar, Data Domain, Diligent, Symantec, and Permabit. Amongst established players, HP's RISS solution also utilizes CO technology for disk archiving. In network transport, we see the technology in use by companies in the WAN optimization space, including Cisco, Expand, Juniper Networks, Orbital Data, Packeteer, and Riverbed. Over the course of the next two years, CO technologies will become a ubiquitous feature in the arsenals of established players in both storage and networking, whether through acquisition of smaller companies or internal development efforts. In short, CO is going to happen. Quite simply, there is no other way for enterprises or individuals to deal with the massive data deluge, and all vendors will have to respond.
Brad O’Neill, Senior Analyst and Consultant, Taneja Group Inc.