Failover Cluster Manager Can Be Deceiving
GUIs are good for some things, but they are not always what we need or should be using. Case in point: Failover Cluster Manager in Windows. I use it fairly extensively, but there are times I want to strangle it because what you see in the UI doesn’t match what you see in PowerShell. One or the other is accurate, but which is it?
A great example of this is when you enable the Availability Groups feature with the PowerShell cmdlet Enable-SqlAlwaysOn (see my rant about AlwaysOn). The cmdlet has to restart the SQL Server service for the change to take effect, and with the -Force option it does so without prompting. What it does not do is restart SQL Server Agent (another annoying issue that is a “feature”). After running it, you will see what is in Figure 1 in Failover Cluster Manager.
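Here is a minimal sketch of that sequence, followed by bringing SQL Server Agent back online. The instance name and the cluster resource name are assumptions for illustration (a named instance usually has its instance name in the resource name), so check yours with Get-ClusterResource first.

```powershell
# Assumes the SqlServer and FailoverClusters modules are installed,
# and that this runs on a node of the cluster.
Import-Module SqlServer
Import-Module FailoverClusters

# Hypothetical names; substitute your own.
$instance      = 'LABFCI'             # network name of the clustered instance (assumption)
$agentResource = 'SQL Server Agent'   # cluster resource name for Agent (assumption)

# Enable the Availability Groups feature. -Force suppresses the
# confirmation prompt, so the SQL Server service restarts immediately.
Enable-SqlAlwaysOn -ServerInstance $instance -Force

# SQL Server Agent is left offline afterwards, so start it if needed.
Get-ClusterResource -Name $agentResource |
    Where-Object State -ne 'Online' |
    Start-ClusterResource
```

Once Agent is back online, the role should report Online again in both PowerShell and Failover Cluster Manager.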
You can clearly see that SQL Server Agent is offline and the LabFCI role has a status of Stopped. Anyone who looks at that without also checking the Resources tab may have a heart attack thinking SQL Server is down. It clearly isn’t.
In PowerShell, if you look at the role, you can see that it has a status of PartialOnline. Diving further, if you look at the resources, you’ll see in Figure 2 that SQL Server Agent is offline (just as it shows in FCM). PartialOnline is much friendlier and less threatening than Stopped.
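A minimal sketch of that check, assuming the FailoverClusters module is available and the role is named LabFCI as in the figures:

```powershell
# Run on a cluster node, or add -Cluster <name> to Get-ClusterGroup to target one remotely.
Import-Module FailoverClusters

# State of the role (cluster group) as the cluster itself reports it.
Get-ClusterGroup -Name 'LabFCI' |
    Format-Table Name, OwnerNode, State

# State of every resource inside that role.
Get-ClusterGroup -Name 'LabFCI' |
    Get-ClusterResource |
    Format-Table Name, ResourceType, State
```

The first command shows the role-level state, and the second shows which individual resource is responsible for it.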
The point here is that what you see is not always an accurate picture of what is going on. Yes, one resource in the role is offline, but is the whole role stopped? No!
So as you are putting together your plans to go live in production and to monitor your WSFC and the roles running in it, keep these types of things in mind. You may need to scratch below the surface to find out what’s really going on, so as not to cause yourself undue stress.
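As a starting point for that kind of monitoring, here is a rough sketch that lists every role that is not fully online, along with the resources responsible. The output property names (Role, RoleState, ResourcesNotOnline) are made up for this example, and some groups, such as Available Storage, can legitimately be offline, so treat the result as something to investigate and not as an alarm.

```powershell
# Assumes the FailoverClusters module and a session on a cluster node.
Import-Module FailoverClusters

Get-ClusterGroup |
    Where-Object State -ne 'Online' |
    ForEach-Object {
        $group = $_

        # Resources in this role that are in any state other than Online.
        $notOnline = $group |
            Get-ClusterResource |
            Where-Object State -ne 'Online'

        # One summary object per role; property names are illustrative only.
        [pscustomobject]@{
            Role               = $group.Name
            RoleState          = $group.State
            ResourcesNotOnline = ($notOnline |
                ForEach-Object { "$($_.Name) [$($_.State)]" }) -join ', '
        }
    }
```

A role that shows PartialOnline with only SQL Server Agent listed tells a very different story than one whose SQL Server resource is Failed.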