By: Allan Hirt on August 15, 2019 in SQL Server 2012, SQL Server 2014, SQL Server 2016, SQL Server 2017, SQL Server 2019, Upgrade, Windows Server 2012 R2, Windows Server 2016, Windows Server 2019, Windows Server Failover Cluster
It’s been a crazy summer so far. I haven’t had much time to blog but an interesting scenario came up recently that I wanted to share with everyone. I have also missed writing technical posts, and this is a good one to scratch the itch.
In Windows Server 2016, Microsoft added the ability to do a rolling OS upgrade with a Windows Server Failover Cluster (WSFC). This is Microsoft’s official documentation. The SQL Server side of things is not well documented, especially for Always On Availability Groups (AGs), so I did a few tests. The results are fairly interesting. All start out with SQL Server 2012 SP4. The goal is to get to Windows Server 2016 and some version of SQL Server, whether it is the same or different.
A few things you should know
One of the reasons this upgrade method is important is that especially in virtualized or public cloud-based architectures, you’re no longer tied to hardware for OS uprgades so how do you upgrade clustered configurations of SQL Server?
Strong Warning
The configuration for a cluster rolling upgrade allows for mixed Windows Server versions to coexist in the same WSFC. This is NOT a deployment method. It is an upgrade method. DO NOT use this for normal use. Unfortunately, Microsoft did not put a time limit on how long you can run in this condition, so you could be stupid and do something like have a mixed Windows Server 2012 R2/2016 WSFC. Fire, ready, aim. The WSFC knows about this and you’ll see a warning with an Event ID of 1548.
Also, what is shown below when adding replicas to lower versions of SQL Server is also NOT a normal deployment method. It’s an upgrade tactic only. Don’t be dumb – especially since it’s a one way street once the AG fails over to the newer version.
If it was me, I would do all of this in one weekend if I could. Circumstances may not allow that, but don’t drag this on very long.
One very good reason to not stay this way is that when dealing with the cluster functional level, unless all nodes are at the higher version, you’re stuck at the older functionality/behaviors. A good example of a benefit you could not take advantage of is cloud witness in Windows Server 2016 if you are at a functional level of 8. 9 is Windows Server 2016 and 10 is Windows Server 2019.
The Tests
Detailed steps are not here, just the overall process to see what does/does not work. For a real production move, there are things I would add to this list but that isn’t what I was going for here. I also did not necessarily use the latest/greatest updates. I was testing process here so it was pretty quick, down, and dirty. There are also many more variants. You can try those yourself.
Test 1
2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes
2 x SQL Server 2012 SP4/Windows Server 2016 nodes
High level process:
Result – Does not work. I tried to get around this by manually dealing with possible owners, but the configuration was not having it. It looks like if you want to do a rolling cluster upgrade and stay on all SQL Server 2012, you’re out of luck.
Test 2
2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes
2 x SQL Server 2012 SP4/Windows Server 2016 nodes
High level process:
Result – Success! This mixed combo worked and I was left with both Windows Server and SQL Server upgraded with minimal downtime and little fuss.
Test 3
2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes
2 x SQL Server 2012 SP4/Windows Server 2016 nodes
High level process:
Result -Similar to Step 2, this worked with none of the problems of the all SQL Server 2012 configuration.
Test 4
2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes
2 x SQL Server 2017 CU 15 + GDR/Windows Server 2016 nodes
High level process:
Result – Success! This one was not straightforward, and I do not know why when you join a SQL Server 2017 replica to a lower level AG in this configuration it gives a status of NOT_JOINED. Clearly that did not happen in Test 2 or Test 3. That is a question for the dev team and another day.
Final Thoughts
While three of the configurations worked in the end, one gave me a bit of heartburn, and the fourth just didn’t work. As always, you should plan upgrades and test them before doing them on your production boxes. If you’re looking for help on upgrades or anything else SQL Server or infrastructure-related, reach out to us. We’d love to help you get to your destination!
[…] Allan Hirt shows how you can combine Availability Groups with Windows Server Failover Clusters and u…: […]
This is EXACTLY what I needed Allan!! Incredibly timely blog post – I’m glad you had that itch that needed scratching! 😀
Hi Allan,
i am trying to follow your approach,
to give you some intro
Below nodes are part of same WSFC (windows 2016 and SQL server 2019)
SQL-1
SQL-2
Trying to add 2 new Nodes to the same WSFC (windows 2019 and SQL server 2019)
SQL-3
SQL-4
can create an AG and databases are in Sync .
SQL-1 and SQL-3 are part of same subnet range
SQL-2 and SQL-4 are part of same subnet range.
i can failover between sql-2 and sql-1 , but cannot failover to newly created SQL-3 and SQL-4. gives IP conflict and in use, even though they are not used.
Getting below error :
Failed to bring availability group ‘AWS_AAG2’ online. The operation timed out. If this is a Windows Server Failover Clustering (WSFC) availability group, verify that the local WSFC node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again. (Microsoft SQL Server, Error: 41131)
I would need to understand your networking to figure things out. If you’re using AWS, its networking works a certain way.
None of this has anything to do with patching, and hopefully you tested all of this before you went live.