An Unfortunate Real Lesson in High Availability and Disaster Recovery
Unless you’ve been living under a rock, you should know by now that a magnitude 8.9 earthquake hit northeast Japan less than 12 hours ago. It has caused horrible devastation, not the least of which is the tragic loss of life. It has even triggered tsunami warnings all over the world. My heart goes out to anyone affected, and I hope that those who have loved ones in those areas get good news. There is no joke or light fun that can be made of this.
Having said that, I’m going to use this tragedy to beat the proverbial drum: you can never be too prepared for a disaster in IT. Paramount is obviously the safety of people (you, friends, family, co-workers, etc.) … do not think that I am downplaying that at all. Safety, shelter, food – all at the top of the list. But at the end of the day, when things hopefully get back to normal, whether you’re a CEO, a network guy, or a DBA, you’ve hopefully got a job to go back to. As they say, the show must go on. If there’s one thing we are as humans, it is resilient. At some point, we always pick ourselves up, dust ourselves off, and start all over again.
If your two data centers are, say, 10 miles apart in the same geography and both are affected by a natural disaster, can your company survive the potential loss of both? If not, what is the contingency plan? Do you have access to your backups (tested, of course)? Are your backups moved offsite, or are they now in the steaming pile of rubble that was your data centers? These are the typical questions a business must ask itself. If you’ve got nothing to bring back, there is no business. There is no job. I’m not trying to be all doom and gloom here, but none of this is an issue when there is no problem. All of this can be, and usually is, catastrophic to a business after the problem occurs. Every corner of the globe has its own natural disasters it needs to account for, be it earthquakes, tsunamis, tornadoes, hurricanes, Nor’easters … you name it, someone, somewhere is affected by it.
A business – and by extension, its IT department – has to ask and answer the really difficult “what if” questions. The Q&A may not be pleasant, and the answers may not be ideal. Some challenges (such as ones related to cost, for example) may be impossible to overcome. That’s still better than putting on blinders and assuming you’ll never be in your worst-case scenario. Do you think those who were affected by 9/11 (including those who unfortunately lost their lives) thought that two planes hitting buildings would ever happen? Outside of maybe a few folks, no.
Realize you will never hit every “what if” scenario. There will always be something unforeseen that you didn’t take into account. But if you can cover everything you know about, you’re ten steps ahead of the game.
I will also say this: take technology out of the picture for a moment. Any disaster plan, whether it is for evacuating an area or rebuilding a business, is all about process. Process reduces chaos. People know where they should be and what they should be doing. You may have the technology to restore a backup, but does anyone know what to do and where to get the tapes? I know process is an ugly word to some, but it is utterly necessary not only in these situations, but in most situations you encounter – especially in IT. Without process, your IT environment is probably the equivalent of the good old Wild West.
So let’s send our thoughts and prayers out to those affected by this tragedy in Japan, but also spend some time reflecting on your own situation: would YOU be ready if this happened to you?
UPDATE: Paul Randal (blog | Twitter) just posted a survey on disaster recovery, and I encourage you all to go take it. It’s completely anonymous. I am curious to see the results.
I commend you on your posting. This is a hard thing to say when there are so many lives being affected in so many ways. It’s true though, and it needs to be said: business must continue, and communication systems need to be resilient to outages. Software solutions like Neverfail really help to take the effort out of the technical recovery and allow companies to focus more on taking care of their people in a time of need.
Josh – appreciate the sentiment. What happened is horrible. However, in full disclosure when I approved the comment, I did see you do work for Neverfail. I have not used your product, so I cannot verify it does what you say it does and I take your last sentence at face value.