To only slightly paraphrase the original: Stable clouds are all alike; every unstable cloud is unstable in its own way.
Information assurance considers three major aspects of information security: Confidentiality, Integrity, and Availability. Availability is about keeping the information around. The concept is pretty simple, but dig a little deeper into the details and it splits into two separate questions: durability and availability.
Durability is the real concern for most of us — is our data safe? Can we get it back? This is what Amazon plays up. S3 and Glacier are designed to provide average annual durability of 99.999999999% for an archive of data. This is based on redundant copies of the archive, distributed across multiple facilities and on multiple devices within each facility. Continuing systematic checks ensure that the copies are identical, and the system “self-heals” by rebuilding any individual copy no longer agreeing with the other copies, of which there are at least two at any time. The design certainly seems like it should keep the data safe.
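To get a feel for what eleven nines means, here is a rough sketch of the arithmetic. It assumes, as a simplification of my own, that the quoted figure can be read as an independent annual survival probability for each stored item; the function names are hypothetical and Decimal is used only to avoid floating-point noise.

```python
from decimal import Decimal

# Design figure quoted for S3 and Glacier: 99.999999999% annual durability.
ANNUAL_DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(num_items, durability=ANNUAL_DURABILITY):
    """Expected number of items lost per year.

    Assumption: each item is lost independently with probability
    (1 - durability) in any given year.
    """
    return num_items * (Decimal(1) - durability)


def years_between_losses(num_items, durability=ANNUAL_DURABILITY):
    """Average number of years between single-item losses."""
    return Decimal(1) / expected_annual_losses(num_items, durability)


if __name__ == "__main__":
    items = 10_000
    print("Expected losses per year:", expected_annual_losses(items))
    print("Years between losses:", years_between_losses(items))
```

Under that assumption, 10,000 stored items work out to one expected loss roughly every 10,000,000 years. That is a statement about the design target, not a measurement.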
Availability, on the other hand, is the ability to access that data right now. Glacier is intended for long-term archival storage, and Amazon doesn’t mention availability on its main page. For S3, however, Amazon says it should provide 99.99% availability over a year. That is, while you could expect to almost always get your data eventually, you should expect to be able to access it right now only 99.99% of the time. Doing the math, that leaves about 53 minutes of the year when you can’t.
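The 53-minute figure is easy to check. This is a minimal sketch, assuming a 365-day year; the function name is my own, and Decimal keeps the percentages exact.

```python
from decimal import Decimal

MINUTES_PER_YEAR = Decimal(365 * 24 * 60)  # assumes a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a service may be unreachable at a given availability."""
    unavailable_fraction = Decimal(1) - Decimal(availability_percent) / Decimal(100)
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in ("99.9", "99.99", "99.999"):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

At 99.99% the result is 52.56 minutes, which rounds to the 53 minutes mentioned above.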
Many Amazon cloud outages trace back to EBS instability, as EBS technology seems to underlie both S3 and Glacier storage as well as EC2 compute platforms, database services, and more. As I’ve mentioned before, EBS stability relies on the logic and the numerical tuning parameters of this vast, never-before-seen technological experiment on which we’re running so much of our business and government operations.
Microsoft’s cloud problems tend to be errors and oversights in digital certificate maintenance. There was that civilization-threatening outage that took down Xbox Live Halo 4, Karaoke and ESPN apps in late February. Plenty of other Azure-based cloud services went down too, I’m sure, but we mostly heard about the games and entertainment. What happened in that one was that a digital certificate’s expiration made Azure storage inaccessible.
At least 2013 isn’t one of those pesky leap years. Last year Azure had an end-of-February outage blamed on certificates appearing to be invalid because of “a time calculation that was incorrect for the leap year”.
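I don't know what Microsoft's code actually looked like, so treat the following purely as an illustration of one common class of leap-day bug: computing "one year later" by bumping the year number. The function names and the fall-back-to-28-February policy are my own assumptions.

```python
from datetime import date


def naive_one_year_later(start):
    """Add a year by incrementing the year field.

    Raises ValueError when start is 29 February, because the
    following year has no such day.
    """
    return start.replace(year=start.year + 1)


def safe_one_year_later(start):
    """Add a year, falling back to 28 February for a leap-day start.

    The fallback is an assumed policy; 1 March would be equally defensible.
    """
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_one_year_later(leap_day)
    except ValueError as err:
        print("Naive calculation failed:", err)
    print("Safe calculation gives:", safe_one_year_later(leap_day))
```

A validity period computed the naive way works on 1,460 days out of every 1,461, which is exactly why it is so easy to ship.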
Memo to Microsoft: run cal 2014 on one of those Linux instances you run in the Azure IaaS pool. Highlight the last week of January, annotate it with “REVIEW CERTIFICATES”, and post copies everywhere.
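For those who would rather not rely on a wall calendar, the same reminder can be sketched in a few lines of Python. It takes the notAfter string in the format that the standard library's ssl module reports for a certificate and flags anything expiring within a warning window. The function names, the 30-day window, and the sample date are all hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days until a certificate expires.

    not_after uses the format found in a certificate's 'notAfter'
    field, for example 'Feb 22 12:00:00 2014 GMT'.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_review(not_after, warn_days=30, now=None):
    """True if the certificate expires within warn_days (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    # Hypothetical expiry date, checked as of the last week of January.
    sample_not_after = "Feb 22 12:00:00 2014 GMT"
    check_time = datetime(2014, 1, 27, 12, 0, 0, tzinfo=timezone.utc)
    print("Days left:", days_until_expiry(sample_not_after, check_time))
    print("Needs review:", needs_review(sample_not_after, now=check_time))
```

Run from a scheduled job against every certificate in the inventory, a check like this turns the annotated calendar into something that nags on its own.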
Numerical parameter tuning on the one hand, digital certificate maintenance on the other — it’s always something! In Learning Tree’s Cloud Security Essentials course we discuss how the cryptographic tools used for confidentiality and integrity let us make meaningful estimates of risk, but with availability we have to follow best practice and hope for the best.