Fukushima and Amazon – two meltdowns that raise the question: when is Bigger no longer Better?
The underlying problem is that Murphy’s Law reigns supreme. No matter who you are, or how hard you try, things WILL go wrong.
If you’re operating at a reasonable scale, the triggering incident is easily manageable and the problem is quickly contained. Once you get too big, the triggering incident cascades, and a full-scale disaster results.
We saw this at Fukushima; the initial problem was only at one reactor. If it had been on a site by itself, the problem would have been ugly but containable. But because there were six reactors all at the same site – explicitly for economies of scale – the explosion of one reactor caused problems at the adjacent reactors and infrastructure, and now we have a Level 7 nuclear incident.
Although the cause of the Amazon outage is not yet known, I’m sure it will have a similar pattern. The triggering incident was probably minor, and in a smaller scale operation would have caused only a brief outage for a few customers. Because of the scale and interconnectedness of Amazon’s operation, this minor problem cascaded into a major outage for many customers.
There ain’t no such thing as a free lunch. Bigger does bring economies of scale, but those economies come at an inevitable cost: bigger, messier, much more expensive disasters.
What’s the lesson for those of us toiling away in IT?
The Amazon incident should not be taken as an indictment of the entire Cloud Computing concept. It’s not as though you can guarantee 100% uptime by running everything in your own datacentre. Nothing we humans do is 100%.
The trick is finding the right scale for your particular situation. If you’re creating a Web2.0 consumer-facing web service, a true computing-utility vendor like Amazon is definitely the right answer. Just be sure to have your own separate limp-along solution and data backup.
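The "limp-along" advice can be made concrete with a small sketch: prefer the cloud-hosted endpoint, and switch to a self-hosted fallback when a health probe fails. The endpoint URLs and the probe function here are illustrative assumptions, not any vendor's real API.

```python
# Hypothetical endpoints for illustration only.
PRIMARY = "https://app.example.com"      # assumed cloud-hosted service
FALLBACK = "https://limp.example.net"    # assumed self-hosted limp-along copy

def pick_endpoint(is_healthy, primary=PRIMARY, fallback=FALLBACK):
    """Return the primary endpoint if the health probe succeeds,
    otherwise fall back to the limp-along service.

    `is_healthy` is an injected probe (e.g. an HTTP health check);
    a probe that raises is treated the same as an unhealthy primary.
    """
    try:
        if is_healthy(primary):
            return primary
    except Exception:
        pass  # probe failure counts as the primary being down
    return fallback
```

In practice the probe would be a real HTTP health check, and the switch would typically happen at the DNS or load-balancer level rather than in application code; the point is simply that the fallback path exists and is tested before the outage, not during it.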
But if you’re a typical mid-size industrial or financial company looking for greater reliability and more IT flexibility, a private cloud service that can be tailored to match your own disaster recovery plans is the right option.
At Radiant we operate at this mid-scale level. Instead of a huge farm with thousands of machines, we deploy small independent pods of machines and associated storage. Data backup is done in a traditional way, using the same enterprise backup software you probably already use. Because of the smaller scale, we can craft the appropriate solution for each customer’s specific budget and reliability requirements, from a few virtual servers in a single datacentre to dual virtual datacentres in geographically dispersed locations.
When we do have an issue with a pod (despite our best efforts, we are not immune), typically only a few dozen customers are affected. Our techs have the time to restore services quickly and thoroughly, and to respond to each customer directly.