SQL Server FCI Setup Problems 1: Windows Server 2012 and VCOs Part 1
Happy New Year everyone! I hope the holiday season treated you well, but like everyone else, it’s time for me to roll up my sleeves and get back to things – including blogging.
This is the first of a few SQL Server failover clustering instance blog posts centered around Setup. I’ve been noticing a few things as of late and as I’m writing the new Mission Critical SQL Server 2012 book, I wanted to share some of what I found (and there will be much more detail in the book around some of these).
There’s a great new feature associated with deploying a Windows Server 2012 failover cluster (WSFC). When a WSFC is created, a corresponding Cluster Name Object (CNO) also gets created in Active Directory (AD). Similarly, when a clustered instance of SQL Server (or any other clustered application requiring a name) gets created, it also gets an AD object called a Virtual Computer Object (VCO). For these objects to be created automatically, certain rights must be in place; otherwise they can be “pre-staged,” meaning they are created ahead of time. I’m oversimplifying a bit just to present the general concepts you need to understand what I will be showing later.
The problem for AD admins is that many of them have a customized structure to place objects – like computers – into. For example, all of the SQL Servers may go in an organizational unit (OU) – think of a folder – called SQL Servers. By default, a Windows Server 2008 or 2008 R2 WSFC would stick the CNO and VCO in the Computers OU. The only way to have them placed elsewhere would be to pre-stage them. In Windows Server 2012, this behavior is different. There’s a bit more intelligence behind where CNOs and VCOs get created. If you have another default OU, it will try to use that. However, even if you don’t change the default OU but you move, say, the cluster nodes into another one, it will try to create the CNO where the nodes are, and the subsequent VCOs will go there too because that is where the CNO now lives. It may not be optimal for everyone but it sure is light years ahead of where we were with Windows Server 2008 and 2008 R2.
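To make the placement rule concrete, here is a minimal sketch in Python. It only models the simple case described above (nodes in one OU, CNO created alongside them, VCOs created alongside the CNO); it is not how Windows actually implements the logic. The distinguished names, the example.com domain, and the function names are all made up for illustration.

```python
def parent_container(distinguished_name: str) -> str:
    """Return the DN of the container holding the given AD object.

    Assumption: the leading RDN contains no escaped commas.
    """
    _, _, parent = distinguished_name.partition(",")
    if not parent:
        raise ValueError(f"no parent container in {distinguished_name!r}")
    return parent


def expected_cno_container(node_dn: str) -> str:
    # Per the behavior described above: the CNO lands where the nodes are.
    return parent_container(node_dn)


def expected_vco_container(cno_dn: str) -> str:
    # Subsequent VCOs follow the CNO.
    return parent_container(cno_dn)


# Hypothetical names, for illustration only.
node = "CN=NODE1,OU=SQL Servers,DC=example,DC=com"
cno = "CN=MYCLUSTER," + expected_cno_container(node)
print(expected_vco_container(cno))
```

In this model both the CNO and any later VCO end up in OU=SQL Servers, which is the outcome the new Windows Server 2012 behavior is aiming for.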
So in theory, SQL Server’s Setup – 2008 R2 or 2012 – when installing a clustered instance of SQL Server (FCI) should be able to use this properly, right? It can, but there are some slight differences from the way you did things in the past, which I discovered through testing. The domain controller I used in this testing runs Windows Server 2012 at the Windows Server 2012 functional level.
I have an OU named W2012 Servers (see Figure 1; for all figures in this post, click on them to make them larger). It has two computer objects in it – my nodes DENNIS and TOMMY. The WSFC administration account I will use to create the WSFC also has the requisite Create Computer Objects (CCO) right on the W2012 Servers OU – basically the same as if I had granted it on the Computers OU.
Then I created a WSFC named STYX as shown in Figure 2.
As expected, the CNO got added in that OU because it saw the two nodes in there. The OU now looks like Figure 3. So far, so good.
I then tried to install a SQL Server 2012 FCI named Equinox. Everything was great until Setup tried to bring the network name resource online. Cue sad trombone, as shown in Figure 4.
So what’s wrong? Setup barfed before it could complete (no SQL Server Agent was created) because it couldn’t configure SQL Server itself due to not being able to come online. Remember that there are dependencies when it comes to clustered resources, and in the case of SQL Server, if the name isn’t up, neither is SQL. This can be seen in the SQL Server Setup log as shown in Figure 5.
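If the dependency chain is not obvious, this small Python sketch may help. It is a deliberately simplified model (a real FCI also depends on disks, and the resource names and the DEPENDS_ON map here are my own illustration, not output from the cluster): a resource only comes online if everything it depends on is already online.

```python
from typing import Dict, List, Set

# Hypothetical, simplified dependency map for a clustered SQL Server instance.
# Keys are listed so that each resource appears after its dependencies.
DEPENDS_ON: Dict[str, List[str]] = {
    "IP Address": [],
    "Network Name": ["IP Address"],
    "SQL Server": ["Network Name"],
    "SQL Server Agent": ["SQL Server"],
}


def bring_online(failing: Set[str]) -> Set[str]:
    """Return the resources that end up online, given the ones that fail."""
    online: Set[str] = set()
    for resource, dependencies in DEPENDS_ON.items():
        if resource in failing:
            continue  # this resource itself cannot start
        if all(dep in online for dep in dependencies):
            online.add(resource)
    return online


# The network name fails, so nothing that depends on it can start.
print(sorted(bring_online({"Network Name"})))
```

With the network name failing, only the IP address comes up in this model, which mirrors what happened to Equinox: no name, no SQL Server.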
Now I needed to figure out why this happened. Because of what I know about WSFC, I already had a suspicion it had to do with permissions or something related to the VCO. The first place I looked was the errors in Failover Cluster Manager (FCM), which is a good place to start since it aggregates the cluster-related events for you. The one in Figure 6 stuck out at me like a sore thumb as the root cause or close to it.
This error confirmed my suspicions. What was notable is the wording. Now, I know I had the account used to create the WSFC set up with CCO. In the past, that’s all I’ve ever needed unless I was pre-staging. That first bullet is the telling one. Remember that the STYX object was created just fine, so I had no issues there. But after I created the WSFC, I did not explicitly give the CNO any additional rights on the W2012 Servers OU. Since I didn’t need to do anything like that in the past, I decided to try an experiment. Before trying to reinstall SQL Server and waste time there, I would try to cluster something as a generic application and see if I could reproduce the error. What I was trying to do was rule out a bug in SQL Server’s Setup. Before going any further, I ran the Remove Node functionality of Setup to clean up the failed install of Equinox.
I then tried to create Notepad (yes, that Notepad) as a generic application as shown in Figure 7. The same issue happened – the network name could not come online. The error was the same. So at this point – for now anyway – I was ruling out SQL Server’s Setup. But I had to get it working. Note that in Figure 7, the Wizard calls out the fact that the WSFC detected where the VCO should probably be placed based on the location of the CNO. I did not input that. Pretty cool.
Based on this, I granted the CNO the CCO right on the OU to see if that would fix my issue. I then tried to create Notepad again in the WSFC. This time, it worked as shown in Figure 8. Notepad was created properly in the cluster and its associated VCO was created in the W2012 Servers OU as shown in Figure 9.
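I made the change through the GUI, but if you would rather script it, something along these lines should be close. This is a hedged sketch: the Python helper just builds a dsacls command line for you to review and run yourself, and the domain (EXAMPLE, example.com) is invented since I am not publishing my test domain. The assumption is that "CC;computer" (create child, computer objects only) is the dsacls equivalent of the CCO right; verify against your own environment before using it.

```python
def build_grant_cco_command(ou_dn: str, netbios_domain: str, cno_name: str) -> str:
    """Build (not run) a dsacls command granting the CNO the right to
    create computer objects in the given OU.

    All arguments are supplied by the caller; nothing is looked up in AD.
    """
    # Computer accounts such as the CNO are referenced with a trailing "$".
    trustee = f"{netbios_domain}\\{cno_name}$"
    # Assumption: "CC;computer" = Create Child, restricted to computer objects.
    return f'dsacls "{ou_dn}" /G "{trustee}:CC;computer"'


# Hypothetical domain; STYX and the OU name are from this post.
print(build_grant_cco_command("OU=W2012 Servers,DC=example,DC=com", "EXAMPLE", "STYX"))
```

Building the string rather than executing it is deliberate: AD permission changes are the kind of thing you want to eyeball (and probably hand to your AD admin) before they run.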
Now that I knew Notepad worked, I was going to try SQL Server. I was fairly confident I had solved my issue by granting the CNO the proper rights on the OU. If it didn’t work, there was a problem with Setup itself, which was probably trying to put the VCO in the Computers OU and bypass whatever the Windows behavior is. I was hoping this was not the case. Before installing SQL Server, I deleted Notepad (not from the node, just the WSFC). Fingers crossed, I reinstalled the Equinox FCI. Figures 10 and 11 tell the story.
Whether you are installing SQL Server 2008 R2 or SQL Server 2012 on a Windows Server 2012 node, as long as the permissions are right, the new VCO functionality of Windows Server 2012 works like a charm. No more Computers OU by default if Windows detects things elsewhere. Yay.
So how does this new VCO behavior work with a non-Windows Server 2012 DC? Stay tuned for a future installment of this series.
Before I wrap up this post, I wanted to point out something else I observed. After creating the FCI, FCM was showing the following in Figure 12.
Name resolution not yet available? What? This is something I had never seen before, and it did go away on its own. I’m going to do some digging into this, but what I am assuming is that DNS had not yet propagated, so while the object was in AD, something still wasn’t resolving. This was nonfatal because it went away, but I found it interesting nonetheless.
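If my DNS propagation guess is right, a quick way to watch for the name becoming resolvable is to poll for it. Here is a minimal Python sketch of that idea; the function name, retry counts, and delay are arbitrary choices of mine, and it only tells you whether the machine running it can resolve the name, not what FCM itself is checking.

```python
import socket
import time
from typing import Callable


def wait_for_name(
    hostname: str,
    attempts: int = 5,
    delay_seconds: float = 2.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until hostname resolves; return False if it never does."""
    for attempt in range(attempts):
        try:
            resolve(hostname)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            if attempt < attempts - 1:
                sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # "equinox" is the FCI network name from this post.
    print(wait_for_name("equinox"))
```

The resolve and sleep parameters are only there so the loop can be exercised without a real DNS server or real waiting.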
That’s it for today, kids. Class dismissed!