Yes, Virginia, You Seem To Have A Problem

The most popular Amazon Web Services region has the most problems. What is going on?

Amazon’s US-East region is nominally referred to as “Northern Virginia”, although that’s just its main or at least most prominent geographic location. This region includes so-called Edge Locations throughout the eastern United States, including three sites in New York City and facilities as far from Virginia as Miami, Florida; South Bend, Indiana; and Saint Louis, Missouri.

Amazon’s EC2 servers and their EBS storage can, and in cautious cloud infrastructure design should, be spread across multiple Availability Zones. (S3 storage is always distributed across multiple Availability Zones within a region, which is what gives it greater resilience.) The Availability Zones remain abstractions: you can’t tell where a given one is, and it is not at all clear how much geographic separation they really have.
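To make the idea of spreading servers across Availability Zones concrete, here is a minimal sketch in plain Python. It does not call any AWS API. The zone names, the server names, and the helper functions `spread_across_zones` and `survivors` are all hypothetical, chosen only for illustration. It assigns servers to zones in round-robin order and then shows what is left running if one zone fails.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_names, zones):
    """Assign each instance to a zone in round-robin order (hypothetical helper)."""
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in instance_names}


def survivors(placement, failed_zone):
    """Return the instances still running if a single zone goes down."""
    return [name for name, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    # Illustrative zone and server names, not taken from any real account.
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    servers = [f"web-{i}" for i in range(6)]

    placement = spread_across_zones(servers, zones)
    print(Counter(placement.values()))          # instances per zone
    print(survivors(placement, "us-east-1a"))   # what remains after one zone fails
```

With everything in a single zone, the second function would return an empty list for that zone's failure, which is the situation the cautious design is meant to avoid.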

One problem may be that the US-East region is just too attractive.

It is always the first in any list of AWS geographic regions. First in the list, easiest to select, it is often the default choice.

US-East has a reputation for having more products, although the chart of products and services by region shows that the only things unique to it now are the Simple Email Service, CloudSearch, and Simple Workflow Service. New products appear first in US-East and then migrate to the other regions, so just give those a few months.

It is a little cheaper, but not much. US-West-2 (Oregon) has pretty much the same pricing. US-West-1 (California) and EU (Dublin) are only slightly higher.

Interestingly, all the major service disruptions except for one have been in US-East, and that exception was a catastrophic weather-induced event. The August, 2011 outage in Dublin was caused by a lightning strike on a nearby utility substation that damaged both the substation and Amazon’s power interface and generators. There’s not much you can do about a direct lightning strike.

One of the US-East disruptions was also caused by weather, and even more catastrophic weather at that. The June, 2012 derecho took out some AWS functionality, something I discussed earlier. As with the Dublin event, I don’t think it’s fair to expect anything to be completely resilient in the face of natural disaster. After all, even 911 emergency services were taken down in northern Virginia.

That leaves us with the non-weather service disruptions. The prominent ones were in April, 2011 and just this past October. The most recent one took the usual form, in which some of the EBS volumes in just one Availability Zone became “stuck”, unable to process further I/O requests. Such events have been caused by an underlying design parameter or coding error, triggered in this case by a DNS configuration error.

I wonder: do these events tend to happen in the first Availability Zone, which in some cases is the default in the pull-down menu? Are the easy defaults overly stressed by the workloads? As I’ve said before, no one has ever built anything like AWS, and Amazon is figuring things out as they go.
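If the easy default really is the crowded one, a cautious deployment script could simply decline to take the first entry in the list. The following is only a sketch of that idea in plain Python, under the unproven assumption that the first-listed zone carries more load. The zone names and the `pick_zone` helper are hypothetical, and nothing here calls an AWS API.

```python
import random


def pick_zone(zones, rng=random):
    """Choose a zone uniformly at random instead of defaulting to the first one.

    `rng` may be the `random` module or a `random.Random` instance,
    which makes the choice reproducible in tests.
    """
    zones = list(zones)
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    return rng.choice(zones)


if __name__ == "__main__":
    # Illustrative names only; a real list would come from your own account.
    zones = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1e"]
    print(pick_zone(zones))
```

A random choice is the simplest way to avoid piling onto one default. It says nothing about which zone is actually least loaded, since Amazon does not publish that.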

Amazon doesn’t provide further geographic detail beyond “one of our five Availability Zones in the US-East Region.” But Oregon looks pretty attractive to me!

Learning Tree’s Cloud Security Essentials course discusses the importance of availability and how to better take advantage of a distributed cloud architecture.

Bob Cromwell
