When Will People Learn How To Safely Use Amazon Web Services?

Another month, another minor AWS outage, and people still don’t get it.

Starting about 11 AM on October 22, 2012, a small percentage of the EBS (Elastic Block Store) volumes within a single Availability Zone in the US-East-1 region began to suffer degraded performance. The problem then spread, with ripple effects on services that rely on EBS, including ELB (Elastic Load Balancing), RDS (Relational Database Service) and Elastic Beanstalk.

The Internet howled in rage, as “essential services” like reddit, Pinterest, imgur, foursquare, and some of Netflix were suddenly unavailable.

Oh, the humanity!

Civilization somehow managed to survive being unable to share funny pictures or watch movies in the middle of the day for a few hours. Let’s gather our scattered wits and look at this calmly.

“The Cloud” did not fail. Some of the systems within one Availability Zone within one geographic region failed. That’s it.

Some customers still managed to be hurt. To do that, they had to ignore Amazon’s repeated explanations of how to design reliable architectures. They also had to ignore what people throughout the community urge as best practice. “No, let’s just put all of our eggs into one basket”, they said.

Here’s an overview of how to design a much more reliable cloud architecture:

At the very least, use multiple Availability Zones. Many of the major outages have been limited to a single AZ. Better yet, use multiple geographic regions. The Internet backbone has a lot of bandwidth. When one region begins to degrade, fail over to another region and elastically grow your resources there.
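To make the failover idea concrete, here is a minimal sketch in Python of the decision itself: given health reports for each zone, pick the first region in your preference order that still has enough healthy zones. Everything here is an assumption made for illustration: the `ZoneStatus` record, the `pick_active_region` function and the `min_healthy_zones` threshold are hypothetical names, not part of any AWS API. In a real deployment the health data would come from your own monitoring, and the switch would be carried out by DNS or load balancer changes.

```python
from dataclasses import dataclass


@dataclass
class ZoneStatus:
    """Health report for one Availability Zone (hypothetical record type)."""
    region: str
    zone: str
    healthy: bool


def pick_active_region(statuses, preferred_order, min_healthy_zones=1):
    """Return the first region in preferred_order that has at least
    min_healthy_zones healthy zones, or None if no region qualifies."""
    healthy_counts = {}
    for status in statuses:
        if status.healthy:
            healthy_counts[status.region] = healthy_counts.get(status.region, 0) + 1

    for region in preferred_order:
        if healthy_counts.get(region, 0) >= min_healthy_zones:
            return region
    return None


# Illustrative data only: one zone in us-east-1 is degraded.
statuses = [
    ZoneStatus("us-east-1", "us-east-1a", healthy=False),
    ZoneStatus("us-east-1", "us-east-1b", healthy=True),
    ZoneStatus("us-west-2", "us-west-2a", healthy=True),
    ZoneStatus("us-west-2", "us-west-2b", healthy=True),
]

# Requiring a single healthy zone keeps traffic in the preferred region;
# requiring two moves it to the second choice.
print(pick_active_region(statuses, ["us-east-1", "us-west-2"], min_healthy_zones=1))
print(pick_active_region(statuses, ["us-east-1", "us-west-2"], min_healthy_zones=2))
```

The threshold is the design choice that matters. With `min_healthy_zones=1` you ride out a single-AZ failure inside the same region, which is the "at the very least" case. Raising it makes the sketch fail over to another region sooner, at the cost of running spare capacity there.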

See Amazon’s Architecture Center and especially their white paper Architecting for the Cloud: Best Practices for more details.

Notice the publication date on that white paper — January, 2010. This is not new information! Also notice that their detailed descriptions of past outages make the need for geographic diversity clear, including those in US-East in April, 2011 and June, 2012 and in EU-West in August, 2011.

The problem is that so many people are so slow to get the message. “But it costs more to have redundancy”, they say. Yes, that’s true, whether you’re in the cloud or not.

Learning Tree’s Cloud Security Essentials course shows how availability is different from other types of information security, but no less important. And, if you use the cloud carefully, it can likely give many customers far more reliability than in-house solutions.

Bob Cromwell
