Maintaining Quorum During Windows Patching and Updating
I’ve been working with a customer to deploy their new Windows Server 2008 R2 SP1-based failover cluster that will have both SQL Server 2008 R2 and SQL Server 2012 instances. One thing I always talk about in my classes, as well as in my presentations that focus on patching, is this: you must maintain quorum during the patching and updating cycle. Quorum is one of the most important considerations in general for your clustered deployments (whether you’re implementing traditional failover clustering instances or availability groups), and quite frankly, I find there’s a real lack of knowledge about it out there. It’s so important that it’s most likely getting a whole chapter in my upcoming book Mission Critical SQL Server 2012.
I have talked to customers who have had SQL Server outages because their Windows guys, when patching the underlying nodes of the Windows Server failover cluster (WSFC), rebooted too many at once. That left not enough voters up, which brought the WSFC (and SQL Server) down when it wasn’t expected. This is NOT a scenario you want. While in theory it’s easy to control and keep track of when things reboot, the little gremlin in Figure 1 pops up after a little while to remind you that a restart is needed:
That’s all well and good, but an unskilled administrator who happens to see that message when logging onto the server may think nothing of it and just click Restart now. Yikes! The value of the reminder can be altered as shown in Figure 2, but you still need to worry about the reboot being done.
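Before anyone clicks Restart now, it helps to do the arithmetic. Here is a minimal Python sketch of the quorum math, assuming a plain majority-of-votes model in which every node has one vote and an optional witness (disk or file share) adds one more vote and stays online. The function names are mine, purely for illustration; this is not a WSFC API, and your actual quorum configuration may behave differently.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def max_nodes_offline(node_count: int, has_witness: bool = False) -> int:
    """How many nodes can be down at once while a majority of votes remains.

    Assumption: the witness, if present, stays online the whole time.
    """
    total_votes = node_count + (1 if has_witness else 0)
    return min(node_count, total_votes - votes_needed(total_votes))


def safe_to_reboot(node_count: int, nodes_offline: int,
                   has_witness: bool = False) -> bool:
    """True if taking ONE more node down still leaves a majority."""
    return nodes_offline + 1 <= max_nodes_offline(node_count, has_witness)


if __name__ == "__main__":
    # A few hypothetical cluster shapes: (nodes, witness present?)
    for nodes, witness in [(2, True), (3, False), (4, False), (4, True)]:
        limit = max_nodes_offline(nodes, witness)
        print(f"{nodes} nodes, witness={witness}: "
              f"at most {limit} node(s) offline at once")
```

Under these assumptions, a four-node cluster with no witness tolerates only one node being down, so rebooting a second node while the first is still restarting takes the whole cluster offline. That is exactly the outage described above.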
There’s another dark underbelly I’ve seen these dialog boxes associated with: automatic updating of Windows via Windows Update. That is a much scarier situation because it means your servers could be updating 24×7 on their own, and some of those updates may even force reboots. Check your settings! I still see production servers at client sites that have automatic updating enabled. If yours do, you need to have a serious discussion with your server admins as to why this is being done. Automatic updating is not in line with mission critical. Yes, it’s important to have a patch management strategy that keeps you up-to-date, but doing it smartly is the key. Not blindly.