A few weeks ago, Amazon announced its new Glacier service. It sure sounds like a great service, so why aren’t we hearing more about it?
If you haven’t heard about it yet, Glacier is a cloud-based archiving service. It shares the underlying design of S3, which is intended to provide 99.999999999% durability. However, it costs just a fraction as much: US$0.01 per gigabyte per month in most geographic regions.
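To get a feel for what those two figures mean in practice, here is a minimal Python sketch. It assumes the durability figure is read as a per-archive, per-year design target, and the function names are my own, not part of any Amazon API.

```python
# Figures quoted above; treat them as design targets, not guarantees.
PRICE_PER_GB_MONTH = 0.01        # US$ per gigabyte per month
ANNUAL_LOSS_PROBABILITY = 1e-11  # 100% - 99.999999999%, assumed per archive per year


def monthly_storage_cost(gigabytes):
    """Storage-only cost in US$ for one month (no retrieval or request fees)."""
    return gigabytes * PRICE_PER_GB_MONTH


def expected_archives_lost_per_year(archive_count):
    """Expected number of archives lost in a year at the quoted durability."""
    return archive_count * ANNUAL_LOSS_PROBABILITY


if __name__ == "__main__":
    print("10 TB per month: US$", monthly_storage_cost(10 * 1024))
    print("Expected losses per year across 10,000 archives:",
          expected_archives_lost_per_year(10_000))
```

The point of the second function is scale: at that durability, even a large collection of archives has an expected loss that is a tiny fraction of one archive per year.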
I am really surprised that I am not hearing more about this!
Let’s distinguish “availability” from “durability”. Availability is the percentage of time you can get your data when you say “I must have it right now,” while with durability you are saying “I need it, but I can wait a little bit.”
Of the CIA information security triad of Confidentiality, Integrity and Availability, we can put solid numbers on the first two. We can calculate the work it would take for an attacker to break a cipher, reading our no-longer-confidential data. We can also calculate the work it would take for an attacker to find a hash function collision, setting up undetectable violations of our data integrity or masquerading as a legitimate server or source of patches.
However, availability (and thus durability) is different. We don’t have cryptographic tools like ciphers and hash functions to analyze, so we don’t have the math, and so we can’t have hard numbers.
Amazon’s number of 99.999999999% durability per archive isn’t a promise, and it isn’t anything they could rigorously prove. It’s the result of carefully analyzing past failure rates of individual storage components along with the design in which they are combined (mirroring across separate facilities, built on top of RAID). Major cloud providers like Google and Amazon have vast collections of media lifetime statistics, so while those numbers are estimates, I don’t know who would be better suited to making the estimates.
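To illustrate how component failure rates and redundancy combine into a durability estimate, here is a toy Python model. It is not Amazon’s method. It assumes replicas fail independently and that data is lost only when every replica fails within the same repair window. The 4% annual failure rate and 7-day repair window are hypothetical inputs chosen for illustration.

```python
import math


def annual_loss_probability(annual_failure_rate, replicas, repair_days):
    """Toy estimate of the chance of losing one object in a year.

    Assumes replicas fail independently and that data is lost only when
    every replica fails within the same repair window.
    """
    windows_per_year = 365.0 / repair_days
    # Chance that one replica fails during a single repair window.
    per_window = -math.expm1(math.log1p(-annual_failure_rate) / windows_per_year)
    # Chance that all replicas fail in the same window.
    all_fail = per_window ** replicas
    # Chance that this happens in at least one window during the year.
    return -math.expm1(windows_per_year * math.log1p(-all_fail))


if __name__ == "__main__":
    for n in (1, 2, 3):
        p = annual_loss_probability(0.04, replicas=n, repair_days=7)
        print(f"{n} replica(s): annual loss probability ~ {p:.3e}")
```

Real systems violate the independence assumption (shared power, shared firmware, shared facilities), which is exactly why the estimate has to lean on measured failure statistics and not on a formula alone.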
How can they offer Glacier at just one tenth the cost of the technically similar S3?
Glacier is intended for archival storage. It can replace backups: you store data there and possibly never retrieve it. It also suits very infrequently used data, the kind where you say “We may need some of this again some day, who knows.”
Retrieval typically takes three to five hours, and you start incurring I/O charges when you retrieve more than 5% of your stored data per month, or when you delete archives within 90 days.
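The two thresholds above are easy to check before you commit to a retrieval or a cleanup. This minimal Python sketch models only the rules as stated here (5% per month, 90 days). It does not attempt to compute the actual fees, and the function names are hypothetical, so check Amazon’s current pricing for the real billing formula.

```python
from datetime import date

FREE_RETRIEVAL_FRACTION = 0.05  # 5% of stored data per month, as stated above
EARLY_DELETION_DAYS = 90        # deleting sooner than this incurs a charge


def free_retrieval_allowance_gb(stored_gb):
    """Gigabytes retrievable in a month before charges start."""
    return stored_gb * FREE_RETRIEVAL_FRACTION


def billable_retrieval_gb(stored_gb, retrieved_gb):
    """Gigabytes of a month's retrieval that exceed the free allowance."""
    return max(0.0, retrieved_gb - free_retrieval_allowance_gb(stored_gb))


def is_early_deletion(uploaded_on, deleted_on):
    """True if the archive is deleted within the 90-day window."""
    return (deleted_on - uploaded_on).days < EARLY_DELETION_DAYS


if __name__ == "__main__":
    print(billable_retrieval_gb(stored_gb=2000, retrieved_gb=150))
    print(is_early_deletion(date(2012, 9, 1), date(2012, 10, 1)))
```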
Companies need to look at Glacier seriously. Investigate your current tape-based backup. Tape isn’t magic! What percentage durability can you really expect with your system?
Learning Tree’s Cloud Security Essentials course discusses using the cloud for archiving your in-house data.
Last week I wrote about the SHA-3 decision, which covered Integrity. This week is about Availability. Next week: Confidentiality.