The U.S. Federal Communications Commission has released a report saying that if the telecommunications service providers (the "phone companies" in older terminology) had followed their own industry's accepted best practices, then many of the 911 emergency service outages during last summer's derecho would have been "avoidable".
That's right: 911 emergency service failed for millions of people across a region close to the national capital. Some 3.6 million people had their 911 service disrupted. For some it was cut off completely and for others seriously degraded, in some cases for several days. A total of 77 public-safety answering points (PSAPs, in emergency services terminology) suffered some degradation of service. The FCC reported that seventeen of those PSAPs, mostly in Virginia and West Virginia, lost their 911 service "completely, leaving more than 2 million residents unable to reach emergency services for varying periods of time". FCC Chairman Julius Genachowski issued a statement saying, "These failures are unacceptable and the FCC will do whatever is necessary to ensure the reliability of 911."
The FCC's report is available here if you want to read the full details.
During natural disasters, demand for 911 service is much higher than normal. But the same forces that cause the disaster are also battering the telecommunications network, the power supplies to network nodes and PSAPs, and other critical infrastructure. Of course the FCC is going to call a 911 outage unacceptable, but there has to be a practical limit to what can be achieved.
Meanwhile, the storm also affected Amazon's prominent and heavily used cloud services based in the northern Virginia region. Some Amazon Web Services went down for a few hours, but 911 service in nearby areas was down for days.
Keep in mind that 911 is intended to connect every home and business to a nearby PSAP, so that emergency services are always available a few minutes away.
Cloud computing, on the other hand, is meant to be "out there" in the cloud: distributed, and always reachable from whatever points are connected to the Internet at the moment. The cloud isn't meant to keep every Internet link up. It's meant to be reachable from whichever Internet nodes are up and connected.
Now, as I've said before, Amazon clearly explains in several documents how to build a resilient distributed architecture using its cloud resources. Every time there is an outage, or just a serious degradation (often in that overly popular US-East region), you can almost hear the frustrated sigh in Amazon's explanation of what happened. The cause is usually EBS storage "becoming stuck" because of replication traffic or scaling instability. You can hear it again when Amazon describes how customers who had followed its architectural advice avoided most, if not all, of the trouble.
911 is a safety-of-life service, and there will always be disasters large enough to overwhelm it. Cloud computing is not as critical, but by comparison it seems to be doing pretty well.
Learning Tree’s Cloud Security Essentials course tries to keep our expectations realistic.