Even HA Guys Do Dumb Things
Hi again, everyone! It’s been a little while since my last blog post, but I’ve been here, there, and everywhere doing customer work as well as a lot of presenting (including my first multi-day public high availability class in a long time this past week). I can finally take a moment and breathe.
One thing I’m catching up on is a bit of maintenance. As those of you who have seen me know, I have my little powerhouse laptop (Panasonic CF-J10) from Japan. Right now it’s coming up on its birthday. Hard to imagine that I’ve kept a laptop for a year without getting a serious case of upgradeitis, but I have. I have had one nagging problem that I always do – drive space. Even with a 512GB SSD inside, I still carried an external 512GB SSD and another 256GB SSD for a total of 1.25TB of space. This is largely due to the sheer number of virtual machines (VMs) that I use depending on the situation and which hypervisor I decide to use (either Hyper-V or VMware Workstation). The only downside to my J10 is that it doesn’t truly have a 6Gb/s SATA III bus.
I was very excited when OCZ announced their 1TB SSD quite a few months back. I’m no stranger to being an early adopter of hardware – especially SSDs. When I got my Toshiba R600 back in 2009 or so, it had a 512GB SSD. That was at least two years before they were mainstream and “somewhat” reasonable in cost. I believe Toshiba was the only one doing them at the time, and the drive was easily well over half the cost of the laptop (if not more). Same with the SSD (32GB I believe) in my Sony VGN-G1 in 2006 or 2007. The Octane line came out, but the 1TB version didn’t really appear until springtime of this year – and the price was stupidly expensive. I mean, even I flinched. A good 512GB SSD will still usually cost in the $700 – $900 range, so I figured double that, and maybe a little more. My 512GB in the Toshiba was about the same, so that’s what I expected. Boy was I wrong! List price on this thing is about $3,000 US, and it is generally $2,500 (give or take) street price. Ouch. I can get 1.5TB in three drives for that (albeit not as convenient …). I pretty much decided that I’d sit this one out until prices came down to where I thought they should be. I know I’m weak on this stuff. This past week, I saw that a seller on Amazon had the drive at a price I was willing to pay, and I snagged it. The drive showed up on Friday.
I also usually put on pre-release versions of Microsoft OSes once they are fairly stable. This goes back to at least Vista, which I ran for probably 6 months to a year on my old Sony Vaio SZ, and the same could be said of Windows 7. With the Japanese laptops, this is always an interesting proposition, especially since I like to make sure things like the function buttons work. With Sony, most things always worked and once the OS was officially released, everything did. Until now, I haven’t taken the plunge on Windows 8. My disdain for Metro on a non-touchscreen laptop is palpable, but I also know it’s the future. I’ve been using it a bit on Windows Server 2012, and I figured at some point I’d put it on at least one of my Panasonics (I still have the J9 and S9 in addition to the J10 – I really need to sell some!).
Disclaimer: I have this stupid penchant for making non-trivial changes (OS switches, drive swaps) to my laptops before doing something big like my training class this past week. I’ve fought that urge for the past few months because I’ve been on the road and didn’t want to cause myself unnecessary pain and heartache. Keep this in mind. I should have learned a drive change isn’t simple. And boy, this time I almost completely shot myself in the foot.
The original goal was to just swap out the 512GB for the 1TB and leave my current Windows 7/Windows Server 2008 R2 dual boot. I have Paragon Partition Manager, so I was just going to clone the drive and change the partition sizes. Oh, if it had been that easy you wouldn’t be reading this blog post. No matter what I tried, I was getting errors. I ran surface tests – the new drive itself was fine. After about five hours of this, I was deleting the partitions (yet again) on the OCZ. Or so I thought. Unfortunately, I must not have been paying attention, because I basically blew away my partitions on the 512GB SSD. Cue sad trombone.
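For what it’s worth, this kind of slip is exactly what a dumb pre-flight check catches. The following is only a minimal sketch of the idea, not anything Paragon does: before a destructive step, compare the selected disk against what the target is supposed to look like and refuse if they differ. The Disk record, the check_wipe_target function, and the disk IDs and sizes are all hypothetical, made up for illustration; a real tool would read them from the OS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """Hypothetical description of a disk, as a partitioning tool might list it."""
    disk_id: str
    model: str
    size_gb: int


class WrongTargetError(Exception):
    """Raised when the selected disk does not match the intended target."""


def check_wipe_target(selected: Disk, expected_model: str, expected_size_gb: int) -> None:
    """Refuse to continue unless the selected disk matches the intended target.

    Both the model string and the size must match, so picking the wrong
    entry in a list fails loudly instead of silently deleting partitions.
    """
    if expected_model.lower() not in selected.model.lower():
        raise WrongTargetError(
            f"{selected.disk_id} is a '{selected.model}', expected '{expected_model}'"
        )
    if selected.size_gb != expected_size_gb:
        raise WrongTargetError(
            f"{selected.disk_id} is {selected.size_gb}GB, expected {expected_size_gb}GB"
        )


# Illustrative values only: the old 512GB drive and the new 1TB drive.
old_drive = Disk("disk0", "Crucial M4", 512)
new_drive = Disk("disk1", "OCZ Octane", 1024)

check_wipe_target(new_drive, "OCZ", 1024)  # passes quietly, safe to proceed
```

Calling check_wipe_target(old_drive, "OCZ", 1024) raises WrongTargetError, which is the whole point: the wrong selection stops before anything is deleted.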
My first thought when I knew I was in a bad state: crap – when was the last time I made a backup of any of this stuff, and would I need to use it? It was somewhat recent, but not completely up to date. Worst case, I would lose minimal data. My work e-mail is all in Exchange and on a server, so I knew I had that (and since I send the documentation I write to customers and such, a lot of the newer stuff would be there even if it wasn’t in a backup). All was not bleak.
Before you ask, “Hey dummy, why didn’t you make a backup of your data first?” Well, the primary reason is that I was working mainly with the new drive, I wasn’t changing anything on the old one, and I knew I had a lot of the data elsewhere where I could get it or recreate it. So in my risk assessment, I deemed the risk low. Since I had been doing things right for hours, the odds of me screwing up were small. Well, I hit the magic window alright. To be honest, I’ve done this a lot in the past so again, the risk from my perspective was minimal.
I stopped and took stock of where I was: how would I go forward, regardless of the backup situation and what I had? In any disaster recovery drill, it’s one thing you need to do. Be calm, have a level head, and make smart decisions. I had already done something monumentally stupid. The first decision I made was to take the 512GB out, put the 1TB in, and then put an OS on it just to get started (should have done that to begin with in hindsight). I installed Windows 7, and then made “the” decision – since I basically was starting fresh and we’re at RC, I’ll install Windows 8 – Metro UI warts and all. No time like the present. I got my OS up and running, and then installed Windows Server 2008 R2 for dual booting. I even got most of the Panasonic utilities working under Windows 8 – the only thing that isn’t working at the moment is the touchpad, which is a shame because of the way the rounded touchpad should work. But the important stuff – function keys changing brightness, audio volume, selecting display for presentations – all works. Most of the hardware (sound card, display, USB 3) had drivers built in. This was arguably better than the Windows 7 experience. In the roughly 24 hours since installing Windows 8, I still dislike Metro but have found ways to work around most of the stuff I hate and overall, it’s much faster than Windows 7 on this laptop. I’m surprised by it because Windows 7 was pretty zippy. If I didn’t bring up a menu to select between the two OSes (8 and Windows Server 2008 R2), it would boot even faster. This switch has also gotten rid of a nagging issue I had with the Crucial M4 where it would randomly spin up I/O for a minute or two (usually right after logging in, but sometimes before) and I’d have to wait until I could do anything on the laptop.
However, at this point came the difficult part: where the hell was I going to get my files from and what, if anything, would I lose? Paragon has an Undelete Partition feature. Fingers crossed, I ran it on my 512GB drive. The thing felt like it ran for 10 years. When it was done, I was able to get back two of the three partitions – my old Windows 7 one with most of my files (so no data loss there – yay!) and the one with my Hyper-V VMs and various installs I use when working with VMs. What I did lose completely was my Windows Server 2008 R2 partition and any of the recent scripts I had written for Windows Server 2012 (including my TechEd ones and one I was working on to change the binding order of the NICs). I lost a few other files, but in the grand scheme of things, I got very, VERY lucky. Things could have turned out much worse.
Moral of this story: even HA guys have bad days and do stupid things to their own equipment or installs when they feel “invincible”. Had this been a customer environment, I would have definitely made a backup before doing anything, but I decided to be a bit more reckless with my own setup and I paid the price. I was fortunate. Not everyone is in these scenarios. At least I’m honest enough to admit my mistake. Outside of some of the scripts, I may need to recreate some of my VM environments, but I enjoy that so it’s not a big deal. I needed to reconfigure my multi-site cluster demo anyway.
Last night none of this was funny, but now I can look back and chuckle, roll my eyes, and proverbially slap myself for falling into the trap many of us do. Although to be fair, while things were not working, they were not going poorly all night either. I needed to find a way to get from A to B, and my mistake was a completely honest one. I selected the wrong thing once and all hell broke loose.
EDIT: As a friend pointed out on Twitter, I had a D/R plan that worked. To a degree, that’s very true. I knew what I had to do, where I could get things from, etc. Can you say the same thing? If not, you would really be in trouble in this scenario. I’m being very critical of myself because I did make a super dumb mistake.
Going forward and doing my own post-mortem, I’m going to take my own advice and look for a way to not have this happen again. It will probably involve putting things like decks and scripts up on the SQLHA SharePoint site Ben and I use for our business stuff. Lesson learned. I may run a program I’ve used in the past to retrieve files from a pretty messed up SD card to see if I can get the remaining files back from the SSD, but I may leave well enough alone at this point, take my own lumps, and be done with things. I’m up and running with my new 1TB drive … albeit about a day late.
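The “not have this happen again” part can be as boring as a small copy job. As a rough sketch only, and not the SharePoint setup mentioned above: the code below copies scripts and decks from a working folder into a second location and verifies each copy with a SHA-256 hash. The back_up function, the folder layout, and the .ps1/.pptx extensions are assumptions for illustration, and it assumes the destination sits outside the source folder (for example, a folder that gets synced elsewhere).

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination: Path,
            suffixes: tuple[str, ...] = (".ps1", ".pptx")) -> list[Path]:
    """Copy matching files from source to destination and verify each copy.

    The directory layout under source is preserved. Returns the list of
    copied files; raises OSError if any copy fails verification.
    """
    copied = []
    for item in sorted(source.rglob("*")):
        if item.is_file() and item.suffix.lower() in suffixes:
            target = destination / item.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)  # copy2 also keeps timestamps
            if sha256_of(item) != sha256_of(target):
                raise OSError(f"verification failed for {item}")
            copied.append(target)
    return copied
```

Something like back_up(Path("C:/Work"), Path("D:/Backup")) (both paths made up) could run before any drive surgery; the hash check is there so a bad copy gets noticed at backup time rather than at restore time.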
Oh, as an aside, I have both Hyper-V enabled and VMware Workstation installed and both work under Windows 8 (unlike Windows Server 2008 R2). This is good news, but I’ll still need to dual boot since there’s no software routing in Windows 8 (it’s a client OS; Windows Server 2012 has it).