What’s your backup system to prevent a Sidekick data disaster?

Boy, am I glad I didn't have a Sidekick last week. But I'm just as glad I know something about data protection.

In case you've missed the news this week, T-Mobile initially notified its customers who have a Sidekick phone that they had "almost certainly" lost all their data, such as contacts, because of the failure of a data center run by a Microsoft-owned company that stored that information.

While the Sidekick is generally aimed at text-happy teens, this post from a business user appeared on a T-Mobile forum:

"I lost everything. Business contacts, personal contacts, international contacts, I understand things happen but Sidekick should have thought of a backup plan before this happened. Losing business contacts could not have happened at a (worse) time ... No sales means no money, no numbers mean no contact."

Microsoft initially blamed a server failure, and later said it would be able to recover "most, if not all" of that data through an "incredibly complex" data recovery process. But the damage, at least from a public perception point of view, was done.

Some pundits, like Barron's Eric Savitz, wrote, "You have to wonder if the high hopes about cloud computing just suffered a mammoth setback." I think that's a bit drastic, Eric. This really was not a setback for cloud computing; it was a setback for people who simply didn't think things through and plan for the worst.

From what I understand, the servers did not fail because they were in the cloud. They failed for an as-yet undetermined reason. The real failure, if you take Microsoft's explanation at face value, is in the design of the backup system, which allowed one server failure to adversely impact access to both the main and backup databases.

Sorry, but that's just bad planning, especially when the data of thousands of people is at stake. Customers may forgive you for a server failure...that stuff happens. But when you've designed a system in which that failure also takes down the backup database that's supposed to get the missing data back quickly ... uh, that points to inadequate planning or a lack of testing around business continuity and disaster recovery (BC/DR) for this application's critical data.
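
To make that concrete, here's a minimal sketch of the sanity check that appears to have been skipped: before you call something a backup, confirm it doesn't share a failure domain (site, storage array, power feed) with the primary. The configuration values below are hypothetical; they're only there to show the idea.

    # Hypothetical inventory of where each copy of the data lives.
    PRIMARY = {"host": "db1", "site": "site-a", "storage": "san-01"}
    BACKUP = {"host": "db2", "site": "site-a", "storage": "san-01"}  # same site, same array

    def shared_failure_domains(a, b):
        """Return the attributes both copies depend on; an empty list is the goal."""
        return [key for key in ("site", "storage") if a[key] == b[key]]

    overlap = shared_failure_domains(PRIMARY, BACKUP)
    if overlap:
        print("WARNING: primary and backup share:", ", ".join(overlap))
    else:
        print("OK: backup does not share a failure domain with the primary")

If that check fails, you don't have a backup; you have a second copy waiting to go down with the first.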

Good planning for BC/DR involves a strategic, holistic approach, and not merely tactical "what ifs." If you're outsourcing your data protection to a third party, you have the right (in fact, the need) to ask the tough questions, like:

  • Where is my data being stored?
  • If you have multiple data centers, how far apart are they?
  • What kind of protection do I have when, not if, the primary site goes down?
  • What systems do you have in place to allow me instantaneous access to my information?
  • Do you have a BC/DR plan in place?
  • More importantly, when was the last time you tested it?
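
That last question is the one I'd push hardest on, because a backup you've never restored is only a hope. Here's a minimal sketch, in Python, of what an automated restore test might look like; the archive path, database file, and table name are assumptions for illustration, not anyone's actual system.

    # A restore test: unpack last night's backup somewhere disposable and
    # prove the data inside is actually usable. All paths are hypothetical.
    import os
    import sqlite3
    import tarfile
    import tempfile

    BACKUP_ARCHIVE = "/backups/contacts-latest.tar.gz"  # assumed nightly archive
    EXPECTED_MIN_ROWS = 500                              # assumed sanity threshold

    def test_restore(archive_path):
        """Extract the backup to a scratch directory and count the records."""
        with tempfile.TemporaryDirectory() as scratch:
            with tarfile.open(archive_path, "r:gz") as tar:
                tar.extractall(scratch)
            conn = sqlite3.connect(os.path.join(scratch, "contacts.db"))
            try:
                (rows,) = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()
            finally:
                conn.close()
            return rows >= EXPECTED_MIN_ROWS

    if __name__ == "__main__":
        print("restore test passed" if test_restore(BACKUP_ARCHIVE)
              else "restore test FAILED -- time to find out why")

Run something like that on a schedule, and the answer to "when was the last time you tested it?" becomes "last night."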

At the same time, though, we, as consumers of these services, have responsibilities as well; we simply can't throw up our hands and assume that the other guy will take care of everything.

As a PDA owner, I regularly back up my data with software that creates a parallel database on my desktop computer ... which I back up to our corporate servers ... which are themselves backed up.
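
That chain doesn't require anything exotic. Here's a minimal sketch of the second hop, assuming the sync software leaves a contacts.db file on the desktop and that a corporate share is reachable; both paths are made up for the example.

    # Copy today's synced database to a second location that doesn't share
    # the desktop's fate. Paths are hypothetical placeholders.
    import shutil
    from datetime import date
    from pathlib import Path

    LOCAL_DB = Path.home() / "pda-sync" / "contacts.db"   # assumed sync output
    OFFSITE_DIR = Path("//corp-server/backups/my-pda")    # assumed network share

    def back_up(local_db, offsite_dir):
        """Copy the local database to the offsite directory with a date stamp."""
        offsite_dir.mkdir(parents=True, exist_ok=True)
        target = offsite_dir / "contacts-{}.db".format(date.today().strftime("%Y%m%d"))
        shutil.copy2(local_db, target)  # copy2 keeps the file's timestamps
        return target

    if __name__ == "__main__":
        print("backed up to", back_up(LOCAL_DB, OFFSITE_DIR))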

In other words, I assume that there will be a failure at some point, and I've taken steps to prepare for it by creating multiple copies of the data I know I'll need, along with ways of getting it back quickly when it happens. Those copies may reside onsite, or in the cloud. I don't care. I know I need my data when I need it. And so do you.

Realistically, the Sidekick fiasco isn't a failure of the cloud; the cloud is merely the delivery mechanism. Rather, it's a hardware failure that points to possible improper planning and a lack of regular validation of a solid BC/DR program surrounding mission-critical data. The lessons from this situation may help you the next time you're counting on your data to be there ... and it isn't.