Fear and Loathing in the Cloud

We were somewhere around Baltimore on Amtrak’s northbound Acela Express when the drugs began to take hold. Mind you, it was only ibuprofen, so the sky was not full of what looked like huge bats, but still… (See here if you don’t get the allusion)

I had broken my shoulder badly and undergone surgery. That arm was immobilized in a padded sling. I was making up for some lost weeks of work by teaching two courses in a row for Learning Tree: one in the Washington area followed by one in New York, hence the train. I could type OK and teaching worked fine, although I had continuing pain and just one useful arm.

I checked the news on my smart phone as the train rolled north. The news and weather app lists the top ten items in each category. There it was, in the ten most important things happening in the U.S. (as selected by the Yahoo! news teams), another Amazon Web Services cloud outage causing a significant disruption.

I made a note to research the event later. If the outage has a big impact it might make a nice example for Learning Tree’s Cloud Security Essentials course. But for now I’ll watch the scenery until it gets dark and then read my book. Once I arrive I’ll have to make my way some three miles south through Manhattan, more than the usual chore by subway with a broken arm.

Things were a little more chaotic than usual at New York’s Pennsylvania Station. It was the typical Friday evening rush plus a little more, with slow-moving lines of people waiting to buy MTA cards. Some people get part way through the line and then wave their arms in frustration and leave.

Ah, there’s the problem. The city-wide network of machines selling cards for the subways and buses is temporarily unable to process credit card transactions. In our increasingly cashless society, in one of the world’s busiest transit systems, everyone suddenly must pay cash. People bailing out of the line have an anagrammed frustration, leaving the MTA line for the ATM line and then returning to the MTA.

About 5.5 million rides every weekday through 468 stations along 209 miles of track, plus who knows how much more on the buses, and while the system isn’t crippled it effectively has its arm in a sling. But… Life goes on. People got cash, bought their transit cards, made their way home. Despite the enormous number of people slightly impacted, this minor outage made no top-ten news lists.

I did that research later. Amazon’s problem was completely unrelated to the MTA credit card problem. But for those impacted, this was yet another problem limited to one Availability Zone in the US-East-1 region. As I have pointed out here and here, Amazon keeps telling us how to take advantage of their resources to design resilient architectures, but people keep ignoring them. And we’re often unrealistic in our demands for cloud availability — used carefully, it’s better than most in-house solutions.


Finally, just how important was this “top ten” urgent news item? Some popular blogs and entertainment sources were unavailable for two and a half hours on a Friday morning. Look at the scenery. Read your book. Life goes on.

Bob Cromwell

