The blogosphere is ablaze with commentary following a major outage at AWS (Amazon’s cloudy elastic computing infrastructure-as-a-service) last week. The impact on enterprises that utilise this service made the New York Times and polarised opinion across the web. Whilst many commentators have highlighted this as an example of the inherent risks of moving critical services ‘to the cloud’, others have argued that such outages are inevitable as cloud services evolve, and that the huge number of companies affected merely shows how successful the cloud has become. But these are simple truisms – use of the cloud represents both risk and opportunity. The trick is figuring out the balance between the two for any particular scenario.
Some of those whose services were down (such as foursquare and hootsuite) are agile, web-based startups that simply don’t have the option of not using the cloud, as building their own bricks-and-mortar datacentres would have been too expensive (i.e. without the risk there was zero opportunity). For large enterprises considering a switch from in-house to cloud services, the opportunity is less easily defined and, as the details of Amazon’s problems start to emerge, it is apparent that the risks are also pretty difficult to pin down.
As this Gartner blog nicely articulates, architecting for the cloud is no different from designing conventional data centre services in that failures can and will occur and must be planned for. However, the key difference seems to be that for large-scale public clouds many of the design parameters and failure scenarios are unknown to the end-user, or at least difficult to understand.
In the New York Times article, the Amazon interruption is compared to the computing equivalent of an airplane crash: “a major episode with widespread damage …but airline travel is still safer than travelling in a car” (with car travel being equivalent to companies running their own data centres). But as any good student of Freakonomics will know, that depends on how you define ‘safer’. The per-hour death rates of cars and planes are about the same – in actual fact, hour for hour, both machines are equally likely to kill you.





