Monday, July 4, 2011

The 10 worst cloud outages (and what we can learn from them)

Sending your IT business to the cloud comes with risk, as those affected by these 10 colossal cloud outages can attest
"I'd like to apologize to you, our customers and partners, for the obvious inconveniences these issues caused," Dave Thompson, corporate vice president for Microsoft Online Services, wrote in a blog post.




"I'd also like to apologize for the obvious inconvenience of having to speak 15 syllables every time you say our service's ridiculous name," he probably should have added.

Colossal cloud outage No. 7: The Salesforce slipup. An hour of downtime may not sound like much, but when your company holds the keys to the customer service operations of tens of thousands of businesses, more than a few of those organizations are bound to view those 60 minutes as a lifetime.

Salesforce.com learned this the hard way when its data center shut down last January. Just four days into the new year, Salesforce.com reported a full-on failure -- meaning services, backups, the whole nine yards were kaput.

Annoying? Absolutely. Surprising? Not entirely.

"The reality is that cloud-based data centers -- guess what? -- they go down, too," says Tim Crawford, chief information officer of All Covered, a division of Konica Minolta. "That has always been the case and will always be the case. We have to be realistic about it."

Crawford says successful cloud computing requires a different mind-set than traditional server setups: It's up to you, he suggests, to decide whether your business's data can endure occasional downtime -- and if not, to make sure your configuration has the resiliency needed to avoid it.

"When you pick a cloud provider, you need to do your homework to understand how they're providing those services and if they're able to build a level of redundancy as good or better than what you're able to do on your own," Crawford says. "If the answer is no, then why are you using them?"

Colossal cloud outage No. 8: Terremark's terrible day. These days, Terremark may be making headlines for its billion-dollar Verizon deal, but in early 2010, an extended outage dominated the cloud provider's coverage.

Terremark's luck turned sour on St. Patrick's Day, March 17, 2010. The company's vCloud Express service took a nosedive that day, with a Miami-based data center going offline for about seven hours. Users were unable to access data stored in the center for the entire period.

Not to get overly redundant, but this brings up the value of redundancy -- having your crucial data available on multiple servers in different data centers or, even better, different regions. You could also take the extra step of spreading it among different providers as a failsafe.

"You can pick a series of vendors to host a workload -- one as a backup or two as a backup, and then another as your primary," suggests Harold Moss, chief technology officer of IBM's Cloud Security Strategy program. "You can then implement your workload there in a secure manner, with the appropriate security, and start to introduce your resiliency capabilities."
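The primary-plus-backups arrangement Moss describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's actual API: the provider names, the `fetch_with_failover` helper, and the two stub functions are all hypothetical stand-ins for real storage clients.

```python
from typing import Callable, List, Tuple

# A provider is a (name, fetch function) pair. Both are hypothetical
# placeholders for whatever client each real vendor would supply.
Provider = Tuple[str, Callable[[str], bytes]]


class AllProvidersFailed(Exception):
    """Raised when the primary and every backup have failed."""


def fetch_with_failover(key: str, providers: List[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order and return the first successful result."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure moves us on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers for illustration only: one simulates an outage,
# the other simulates a healthy backup.
def primary_down(key: str) -> bytes:
    raise ConnectionError("data center offline")


def backup_ok(key: str) -> bytes:
    return b"order-record-for-" + key.encode()


if __name__ == "__main__":
    source, data = fetch_with_failover(
        "invoice-42",
        [("primary", primary_down), ("backup", backup_ok)],
    )
    print(source, data)
```

The order of the list is the whole design: the primary goes first, and each backup is consulted only when everything ahead of it has failed. A real setup would also need to keep the copies in sync, which this sketch leaves out.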

Colossal cloud outage No. 9: The PayPal fall-down. Want a cloud outage with some seriously wide-reaching impact? Try taking PayPal offline for a few hours.

This is no hypothetical exercise: PayPal fell for real in the summer of 2009, leaving millions of merchants around the world with no way to sell their stuff. The service was completely unavailable for about an hour and remained spotty for several more. PayPal said hardware failure was to blame.

It's a rare kind of outage, no doubt -- but with all the sales lost, this unfortunate interruption easily earns a spot in cloud computing's hall of shame.

Colossal cloud outage No. 10: Rackspace's rough year. When you provide cloud services to Web presences like TechCrunch and Justin Timberlake, you'd better believe people are going to notice when your servers stop working.

Rackspace learned that lesson a few times in 2009. The cloud provider suffered four high-profile failures throughout the year, adding up to hours of offline time for the company's customers. One blip was bad enough that Rackspace had to pay out nearly $3 million in service credits to its users.

Rackspace called the incidents "painful and very disappointing" and promised to "execute at a high level for a long time" afterward. Today, the company continues to focus on uptime but also works to help users plan for the inevitable turbulence that comes with life in the cloud.

"If you want to cluster a server or build geographical redundancy, it's easier to do now than it ever was before, but you have to actually take those steps," says Rackspace's Lew Moorman. "The cloud doesn't bring inherent weaknesses that weren't present if you did things in-house before."
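To put rough numbers on what geographical redundancy buys, here is a back-of-the-envelope calculation in Python. The availability figures are made up for illustration and do not come from any provider named in this article. The calculation also assumes that sites fail independently of one another, which real outages often violate.

```python
from typing import Iterable

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def combined_availability(availabilities: Iterable[float]) -> float:
    """Availability of a set of redundant sites, assuming independent failures.

    The service is down only when every site is down at the same time.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # hypothetical figure for one data center
    pair = combined_availability([single, single])
    print(f"one site:  {expected_downtime_hours(single):.2f} hours/year")
    print(f"two sites: {expected_downtime_hours(pair):.2f} hours/year")
```

Under these assumptions, a single 99 percent site is down for about 87.6 hours a year, while two such sites in different regions are both down for a little under an hour a year. The catch is Moorman's: the second site helps only if you actually build it and route to it.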

All considered, the biggest lesson here may be that no single server, center, or service is 100 percent reliable. If you don't build your business with that in mind -- well, my friend, you're just walking around with your head in the cloud.
