The problem, which is explained clearly in IEEE Spectrum, seems to stem from the unique identifying codes (primary keys) assigned to each call to keep track of it through the system. On the night of the failure the system hit a pre-set limit for these codes and would not issue any higher. This sounds like a roll-over bug, in which a counter reaches the highest number it can store. Unfortunately the problem was compounded by the failure of the failover and monitoring systems, which were designed to keep the service running during an outage such as this.
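To make the failure mode concrete, here is a minimal Python sketch of a counter that hands out call identifiers until it reaches a fixed ceiling. It is an illustration only, not the code that ran in the 911 system: the class name `CallIdAllocator`, the limit of 5 and the 90% warning threshold are all hypothetical.

```python
class CallIdAllocator:
    """Hands out sequential call IDs up to a fixed ceiling (hypothetical model)."""

    def __init__(self, limit, warn_fraction=0.9):
        self.limit = limit                  # assumed pre-set maximum ID
        self.warn_fraction = warn_fraction  # assumed monitoring threshold
        self.next_id = 1

    def allocate(self):
        # Once the ceiling is passed, no further IDs are issued,
        # so new calls cannot be tracked or routed.
        if self.next_id > self.limit:
            raise RuntimeError("call ID limit reached")
        call_id = self.next_id
        self.next_id += 1
        return call_id

    def near_limit(self):
        # A monitoring check that could raise an alarm before the ceiling is hit.
        issued = self.next_id - 1
        return issued >= self.warn_fraction * self.limit


allocator = CallIdAllocator(limit=5)
routed = []
rejected = 0
for _ in range(7):  # seven incoming calls, but only five IDs available
    try:
        routed.append(allocator.allocate())
    except RuntimeError:
        rejected += 1

print(routed, rejected, allocator.near_limit())
```

In this toy model the first five calls receive IDs and the remaining two are rejected. A check like `near_limit` is one way a monitoring system might flag the approaching ceiling in advance, assuming someone acts on the alert.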
In what the Federal Communications Commission called a ‘terrifying’ example of software failure, on April 9, 2014, the 911 emergency telephone system in Washington State and Oregon shut down just before midnight, leaving hundreds of callers unable to contact police, ambulance, or rescue services.
A recently released investigation by the Federal Communications Commission eventually revealed that the 911 emergency system failure affected not only Washington State and Oregon, but also 81 emergency dispatch centres in “California, Florida, Minnesota, North Carolina, Pennsylvania and South Carolina”, and that up to 6,600 emergency 911 calls went unanswered during the two-hour-long outage. In total, up to 11 million people were potentially at risk of being unable to contact emergency services.
Luckily, law enforcement believes that nobody died as a result of being unable to contact the emergency services. But as well as being a significant failure in its own right, this incident highlights our reliance on modern telecommunications systems and the dangers that their failures pose. No wonder, then, that telephone networks are often cited as potential targets for cyber-terrorist attacks.
IEEE Spectrum has the full story of the failure.