SQLBits 2020 Training Day Announcement

Posted on December 16, 2019 in Conference, Pre-conference, SQLbits

Happy Monday, everyone. I’ve been heads down with both teaching and customer work, but wanted to pop up to make an exciting announcement: I’m honored to have been chosen once again to deliver a Training Day at SQLBits 2020, which will be held in London.

This year’s Training Day will be SQL Server Cloud Fundamentals, scheduled for Tuesday, March 31. I’ll be covering not only Azure, but also Amazon Web Services (EC2 and RDS), Google Cloud Platform, and hybrid scenarios – yes, even if you want to partially stay on premises, I’ve got you covered.

I’ve been in touch with the organizers, and right now all things are green for the lab that I’ve planned, which is a lot of fun (no, really …). You can’t beat hands-on experience! If you’ve ever taken one of my classes or attended a previous pre-con or Training Day of mine at another conference, you know my labs are designed to reinforce the concepts taught that day. No prior experience with any public cloud is necessary.

So my Training Day will be good not only for those who are new to the cloud, but also for those looking to build on their existing experience and get better at what they do.

If you don’t want to miss out on actual hands-on experience with cloud stuff in addition to the instruction, reserve your spot early. If you sign up, you’ll get an e-mail closer to the date with instructions on what you’ll need to bring with you for the lab.

I look forward to seeing you in London at SQLBits.

Two Big SQL Server Availability Announcements from Microsoft

Posted on November 7, 2019 in Azure, FCI, High Availability, Licensing, Shared Disk, SQL Server, SQL Server 2019

Hello everyone. I have not had much time to blog, but there are a few things that you should be aware of and that I have an opinion about. Both were recently announced – one before Ignite and PASS Summit and the other this week somewhat quietly in a session at Ignite.

Big Deal #1 – Azure Shared Disks for FCIs (and One More Thing …)

If you have ever heard me present, you will know I am not the biggest fan of deploying Always On Failover Cluster Instances (FCIs) in non-physical environments. That includes any of the public clouds. Why? There are quite a few reasons which I will not go into here, but I will discuss the biggest one: shared storage. Configuring shared storage is awful in any of the public clouds. Sure, if you use something like VMware’s solutions in a public cloud you can take advantage of vSAN, but that will not work for most folks. I’ve been VERY vocal about how much the Storage Spaces Direct solution is currently a fairly terrible experience, at least in Azure. SIOS DataKeeper is a good solution, but some do not want to purchase anything else; they want a workable, native solution. Sure, you can use iSCSI, but why would you?

You get the idea. It kinda sucks right now (forget the fact you probably should not be doing it to begin with). I’ve had conversations with both the SQL Server and Windows Server dev teams about this. They know my displeasure.

Maybe they finally heard me. At Ignite, during BRK3253 – Windows Server on Azure overview: Lift-and-shift migrations for enterprise workloads (man, I hate how MS does not capitalize titles ….), Elden Christensen and Rob Hindman from the Windows Server dev team (both of whom I know very well) announced the private preview of Azure Shared Disks (it’s apparently being renamed, and isn’t Shared Azure Disks) and showed a non-SQL Server-based demo. If you want to skip ahead, they talk about it at 22:34. This will not change my opinion on using FCIs up in Azure, but it will sure as heck make them easier to deploy. As soon as I can test, I’ll report back. There is another session, BRK3283 Optimize price-performance and lower TCO for your workloads with the next-gen Azure Disks (not live as of this posting), which will cover more of the Azure disk stuff.

One more recent change which I will be most likely showing and talking about for the first time in a few weeks at my SQL Server LIVE! precon is the ability to use Azure Files for FCIs.

I would recommend you also have a look at the Ignite sessions from John Marlin to see and hear about anything new coming in WSFCs in Windows Server vNext LTSC. The biggest announcement I’m aware of that could affect SQL Server (and specifically FCIs) is the ability to stretch a WSFC with Storage Spaces Direct (S2D). I can’t currently watch THR2155 (network error on the Ignite site), so I haven’t been able to ascertain what else could impact SQL Server. If there’s something else, I’ll either update this post or do a new one if it’s really major.

Big Deal #2 – Changes to Licensing for Availability Scenarios

On October 30th, Microsoft let the cat out of the bag that licensing rules for the availability scenarios were changing as of November 1, 2019 if you have Software Assurance (SA). We always tell our customers that SA is often worth it. You can see the full new rules in the SQL Server 2019 licensing guide. The rules apply to all supported versions of SQL Server.

Let that sink in: supported versions of SQL Server. What does that mean? For example, if you are using SQL Server 2016 with Service Pack 1, whose support ended on July 9, 2019, you would need to move to Service Pack 2 or later. Microsoft is not going to come in and crash your party, but if they audit you or you are going through a true-up for licensing, it could bite you. This means you need to stay current to take advantage of these benefits.

I’m going to discuss what I feel are the biggest game changers. I knew licensing was changing as I had conversations with Microsoft around this months ago. I was not sure what the final result was going to be, but I’m fairly pleased. Is it perfect? No, but it’s much better than it was.

It’s interesting to note that for licensing purposes, Microsoft defines HA as a synchronous replica with automatic failover and D/R as an asynchronous replica with manual failover.
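In AG terms, those two licensing definitions map directly onto replica settings. Here is a sketch of what each looks like in T-SQL (run from the primary via PowerShell’s Invoke-Sqlcmd; the AG name, instance names, and replica names are hypothetical placeholders):

```powershell
# Sketch only: mapping the licensing definitions of HA and D/R
# onto AG replica settings. All names here are hypothetical.
Invoke-Sqlcmd -ServerInstance "PrimaryNode" -Query @"
-- HA per the licensing guide: synchronous with automatic failover.
ALTER AVAILABILITY GROUP [MyAg]
MODIFY REPLICA ON N'HANode' WITH (
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = AUTOMATIC);
-- D/R per the licensing guide: asynchronous with manual failover.
ALTER AVAILABILITY GROUP [MyAg]
MODIFY REPLICA ON N'DRNode' WITH (
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL);
"@
```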

Change #1 – Backups and DBCC on a Secondary Replica Are Now Free

I still hate the use of the words active and passive, but that’s a whole other blog post. In talking with Microsoft, there are other reasons some of these terms are used, but again, irrelevant at the moment. Taking terminology out of the picture, with SA, you can now take full copy-only backups and transaction log backups, and run DBCC, on a secondary replica of an AG without licensing that replica. That of course assumes no other databases are live on that instance. That is a huge change, because through SQL Server 2017, those activities were considered “use” and you had to license the replica. This is one thing that should have been there since day one, but I’m just glad they finally corrected their stance.
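The now-license-free work on a secondary looks something like this (a sketch via Invoke-Sqlcmd; instance, database, and path names are hypothetical, and the secondary must be readable for DBCC to run against it):

```powershell
# Sketch only: the backup and DBCC work that no longer requires
# licensing the secondary. All names and paths are hypothetical.
Invoke-Sqlcmd -ServerInstance "SecondaryNode" -Query @"
-- Full backups taken on a secondary replica must be COPY_ONLY.
BACKUP DATABASE [MyAgDb]
TO DISK = N'D:\Backups\MyAgDb_CopyOnly.bak'
WITH COPY_ONLY, COMPRESSION;
-- Transaction log backups on a secondary do not need COPY_ONLY.
BACKUP LOG [MyAgDb] TO DISK = N'D:\Backups\MyAgDb.trn';
-- Consistency checks can also be offloaded to the secondary.
DBCC CHECKDB (N'MyAgDb') WITH NO_INFOMSGS;
"@
```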

Change #2 – Allowance for Disaster Recovery Testing

This one is buried in the licensing document, but very big. Here’s what it says:

Customer may also run primary and the corresponding disaster recovery replicas simultaneously for brief periods of disaster recovery testing every 90 days.

I can’t emphasize enough how huge this is. For example, there was always a question if you were using something like VMware’s Site Recovery Manager (SRM) as your primary D/R method. Now that is covered in pretty clear language. Thank you, Microsoft.

Change #3 – Disaster Recovery Is Now Covered (To a Point …)

Prior to these changes, you only got one “free” replica. If you had an AG with three replicas, two local for HA and one remote for D/R, you paid for two of those. With this licensing change, both the HA and D/R replicas would be covered if you have SA. This is huge.

Change #4 – Azure Disaster Recovery Replicas Are Covered via the Azure Hybrid Benefit

This is related to #3. I’m sure this one will irk some, and if so, please comment below. I know Microsoft is listening 🙂 If Azure is your cloud provider and you want to stick a replica for your AG up in the cloud that is just for disaster recovery, you are now covered with SA. This was not the case before. This new benefit does not apply if you are using Amazon Web Services or Google Cloud.

With this benefit, you can technically have up to three passive replicas (one on premises for HA, one on premises for D/R, and one up in Azure for D/R). Anything else would still seemingly need to be paid for. So if you have two D/R replicas in different data centers (for example, your main data center is in Boston, and you have D/R replicas in London and Chicago), you are going to pay for one of those. That is still better than paying for two.

Things That Could Use Clarification

Due to the “standardization” of terminology, FCIs, AGs, and log shipping are basically treated the same in this document. That’s confusing. Microsoft says this:

Each of these implementations uses different terminology. The examples that follow use ‘Active’ as the read-write database or instance (also referred to as Primary Replica in an Always On Availability Group) and ‘Passive’ for the write only database or instance (marked to not-read, referred to as Secondary Replica in an Always On Availability Group).

But those are not the terms we use for FCIs in technical documentation. For example, FCIs do not have “write only instances”. You install the binaries on other nodes of a Pacemaker cluster or WSFC and then the FCI can fail over to them. It’s not a “live” instance. I know this is probably not changing, because documents like this go through legal reviews and have to match other similar documents/licensing terms for things like Windows, but it irks me to no end.

Also, I am wondering how all of this affects distributed availability groups and how the forwarder is considered. Is it passive or active from a licensing perspective? I’m hoping they clarify that.

Availability Group Support For Containers Not Shipping with SQL Server 2019 RTM

Posted on September 5, 2019 in Always On, AlwaysOn, Availability Groups, Containers, Kubernetes, SQL Server 2019

This one is going to be a short post. One of the big capabilities of SQL Server 2019 – the ability to deploy AGs using containers – is not shipping when SQL Server 2019 is released. My guess is it will be enabled in a CU sometime after release. From the RC1 release notes:

The Kubernetes operator for Always On Availability Groups is not supported in this release candidate and will not be available at RTM.

I’ll update this blog post when it is lit up in SQL Server 2019.

AGs and WSFC OS Rolling Upgrades: What Works and What Doesn’t

Posted on August 15, 2019 in SQL Server 2012, SQL Server 2014, SQL Server 2016, SQL Server 2017, SQL Server 2019, Upgrade, Windows Server 2012 R2, Windows Server 2016, Windows Server 2019, Windows Server Failover Cluster

It’s been a crazy summer so far. I haven’t had much time to blog but an interesting scenario came up recently that I wanted to share with everyone. I have also missed writing technical posts, and this is a good one to scratch the itch.

In Windows Server 2016, Microsoft added the ability to do a rolling OS upgrade with a Windows Server Failover Cluster (WSFC). This is Microsoft’s official documentation. The SQL Server side of things is not well documented, especially for Always On Availability Groups (AGs), so I ran a few tests. The results are fairly interesting. All tests start with SQL Server 2012 SP4. The goal is to get to Windows Server 2016 and some version of SQL Server, whether it is the same or a different one.

A few things you should know

  • A cluster OS rolling upgrade works/is supported N-1, so you can upgrade Windows Server 2012 R2 to 2016 or Windows Server 2016 to 2019, but not Windows Server 2012 R2 to 2019.
  • Running validation will fail when you add a Windows Server 2016 node to a 2012 R2 WSFC.
  • I strongly recommend only adding the upper version nodes with that version. I saw some weird things when I tried to do it the other way.
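The Windows Server side of the rolling upgrade starts with simply adding the new nodes to the existing WSFC, something like this (a sketch; the cluster name is a hypothetical placeholder, and the node names are the ones used in the tests below):

```powershell
# Sketch only: starting a cluster OS rolling upgrade by adding the
# Windows Server 2016 nodes to the 2012 R2 WSFC. Cluster name is
# hypothetical; run from one of the new nodes.
Add-ClusterNode -Cluster "UpgClus" -Name "2016UPGN1"
Add-ClusterNode -Cluster "UpgClus" -Name "2016UPGN2"
# The WSFC is now in mixed mode (see the 1548 warning below).
Get-Cluster -Name "UpgClus" | Select-Object Name, ClusterFunctionalLevel
```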

One of the reasons this upgrade method is important is that, especially in virtualized or public cloud-based architectures, you’re no longer tied to hardware for OS upgrades. So how do you upgrade clustered configurations of SQL Server?

Strong Warning

The configuration for a cluster rolling upgrade allows for mixed Windows Server versions to coexist in the same WSFC. This is NOT a deployment method. It is an upgrade method. DO NOT use this for normal use. Unfortunately, Microsoft did not put a time limit on how long you can run in this condition, so you could be stupid and run a mixed Windows Server 2012 R2/2016 WSFC indefinitely. Fire, ready, aim. The WSFC knows about this, and you’ll see a warning with an Event ID of 1548.

Figure 1. 1548 warning

Also, what is shown below when adding replicas to lower versions of SQL Server is also NOT a normal deployment method. It’s an upgrade tactic only. Don’t be dumb – especially since it’s a one-way street once the AG fails over to the newer version.

If it were me, I would do all of this in one weekend if I could. Circumstances may not allow that, but don’t drag this out very long.

One very good reason not to stay this way is the cluster functional level: unless all nodes are at the higher version, you’re stuck with the older functionality and behaviors. A good example of a benefit you could not take advantage of is the cloud witness in Windows Server 2016 if you are still at functional level 8 (Windows Server 2012 R2); level 9 is Windows Server 2016 and 10 is Windows Server 2019.

Figure 2. Cluster functional level of 8, or Windows Server 2012 R2
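Checking and raising the functional level is quick; a sketch of what that looks like (the cluster name is a hypothetical placeholder, and the update is only possible – and irreversible – once every node is on the newer OS):

```powershell
# Sketch only: checking and raising the cluster functional level.
# Cluster name is hypothetical. 8 = WS2012 R2, 9 = WS2016, 10 = WS2019.
(Get-Cluster -Name "UpgClus").ClusterFunctionalLevel
# Irreversible: commits the WSFC to the newer OS version. Only run
# after all down-level nodes have been upgraded or evicted.
Get-Cluster -Name "UpgClus" | Update-ClusterFunctionalLevel -Force
(Get-Cluster -Name "UpgClus").ClusterFunctionalLevel
```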

The Tests

Detailed steps are not here, just the overall process to see what does and does not work. For a real production move, there are things I would add to this list, but that isn’t what I was going for here. I also did not necessarily use the latest and greatest updates. I was testing process, so it was quick and dirty. There are also many more variants. You can try those yourself.

Test 1

2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes

2 x SQL Server 2012 SP4/Windows Server 2016 nodes

Figure 3. All 2012 instances

High level process:

  1. Add the Windows Server 2016 nodes to the 2012 R2 WSFC. The WSFC is mixed, and looks similar to the screen shots below.

    Figure 4. Windows Server 2012 R2 Node

    Figure 5. Windows Server 2016 node

  2. I joined the instances on the Windows Server 2016 nodes as replicas to UpgAG. This query output shows that everything has joined and seemingly looks okay. Or does it?

    Figure 6. Joining the WS2016-based SQL Server 2012 instances to the AG

  3. I then set 2012R2UPGN1 and 2016UPGN1 to synchronous data movement since I’m trying to go to WS2016. That works fine.

    Figure 7. Changing how data is synchronized between 2012R2UPGN1 and 2016UPGN1

  4. I should be able to manually initiate a failover from 2012R2UPGN1 to 2016UPGN1, right? Wrong. For whatever reason, SQL Server 2012 in this configuration can’t update the possible owners on the AG resource at the WSFC level.

    Figure 8. Failure to fail over

Result – Does not work. I tried to get around this by manually dealing with possible owners, but the configuration was not having it. It looks like if you want to do a rolling cluster upgrade and stay on all SQL Server 2012, you’re out of luck.
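For reference, steps 2 through 4 above boil down to T-SQL along these lines (a sketch via Invoke-Sqlcmd using the instance names from the test; replica names must match the instances’ network names):

```powershell
# Sketch only: the T-SQL behind joining a replica, going synchronous,
# and attempting the failover. Instance names are from the test.
# On each new replica: join the instance to the AG.
Invoke-Sqlcmd -ServerInstance "2016UPGN1" -Query "ALTER AVAILABILITY GROUP [UpgAG] JOIN;"
# On the current primary: make the intended failover pair synchronous.
Invoke-Sqlcmd -ServerInstance "2012R2UPGN1" -Query @"
ALTER AVAILABILITY GROUP [UpgAG]
MODIFY REPLICA ON N'2016UPGN1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
"@
# On the intended new primary: attempt the manual failover.
# In Test 1, this is the step that fails.
Invoke-Sqlcmd -ServerInstance "2016UPGN1" -Query "ALTER AVAILABILITY GROUP [UpgAG] FAILOVER;"
```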

Test 2

2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes

2 x SQL Server 2012 SP4/Windows Server 2016 nodes

High level process:

  1. Add the Windows Server 2016 nodes to the 2012 R2 WSFC.
  2. Using the configuration from Test 1 that bombed, perform an in-place upgrade of SQL Server to 2016 on the two Windows Server 2016 nodes. Since everything at SQL Server 2012 failed, I wanted to see if this would make a difference.

    Figure 9. Mixed versions of SQL Server

  3. Set 2012R2UPGN1 and 2016UPGN1 to synchronous.
  4. I failed UpgAG over to 2016UPGN1 from 2012R2UPGN1. This was successful, and the SQL Server 2016 replicas can now synchronize with each other, but they cannot synchronize with the 2012 ones. That makes sense, given 2012 is downlevel and upon failing over, the database was upgraded to 2016.

    Figure 10. Replica states after failing over to SQL Server 2016

  5. Remove the SQL Server 2012 replicas from the AG.
  6. Evict the Windows Server 2012 R2 nodes from the WSFC.
  7. Upgrade the WSFC functional level.

    Figure 11. Upgrading and verifying the cluster functional level

  8. Party.

Result – Success! This mixed combo worked and I was left with both Windows Server and SQL Server upgraded with minimal downtime and little fuss.
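The cleanup in steps 5 through 7 can be sketched like this (the cluster name and the second 2012 R2 node name are hypothetical placeholders; the rest are the test’s names):

```powershell
# Sketch only: removing the old replicas, evicting the old nodes,
# and raising the functional level. Cluster name and the second
# 2012 R2 node name are hypothetical.
# On the new primary: drop the SQL Server 2012 replicas from the AG.
Invoke-Sqlcmd -ServerInstance "2016UPGN1" -Query @"
ALTER AVAILABILITY GROUP [UpgAG] REMOVE REPLICA ON N'2012R2UPGN1';
ALTER AVAILABILITY GROUP [UpgAG] REMOVE REPLICA ON N'2012R2UPGN2';
"@
# Evict the old nodes, then raise the WSFC functional level.
Remove-ClusterNode -Cluster "UpgClus" -Name "2012R2UPGN1" -Force
Remove-ClusterNode -Cluster "UpgClus" -Name "2012R2UPGN2" -Force
Get-Cluster -Name "UpgClus" | Update-ClusterFunctionalLevel -Force
```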

Test 3

2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes

2 x SQL Server 2012 SP4/Windows Server 2016 nodes

High level process:

  1. Add the Windows Server 2016 nodes to the 2012 R2 WSFC
  2. Join the instances on the Windows Server 2016 nodes as replicas to UpgAG.
  3. This time upgrade all four instances in place to SQL Server 2016.

    Figure 10. All SQL Server 2016 instances

  4. Set both 2012R2UPGN1 and 2016UPGN1 to synchronous.
  5. Manually fail UpgAG from 2012R2UPGN1 to 2016UPGN1. This succeeds.
  6. Remove the 2012 replicas from the AG.
  7. Evict the 2012 R2 nodes from the WSFC.
  8. Upgrade the WSFC functional level.
  9. Drink a beverage of your choice.

Result – Similar to Test 2, this worked with none of the problems of the all-SQL Server 2012 configuration.

Test 4

2 x SQL Server 2012 SP4/Windows Server 2012 R2 nodes

2 x SQL Server 2017 CU 15 + GDR/Windows Server 2016 nodes

Figure 12. SQL Server 2012 and 2017 instances

High level process:

  1. Add the Windows Server 2016 nodes to the 2012 R2 WSFC
  2. When joining the SQL Server 2017 instances on the Windows Server 2016 nodes as replicas to UpgAG, there is a different behavior. Note the join state highlighted. That is not good. Can that be changed?

    Figure 13. SQL Server 2017 replicas with a state of NOT_JOINED

  3. Setting things to synchronous does not seem to help.

    Figure 14. SQL Server 2017 replicas still do not have a join state that is expected

  4. Changing ownership of UpgAG to 2016UPGN1 does not move the needle much … but it moves it a little. Note that 2016UPGN2 now has a status of JOINED_STANDALONE.

    Figure 15. Failing the AG over changes the secondary replica running SQL Server 2017, but not the primary

  5. Removing the 2012 replicas from the AG has no change on 2016UPGN1.

    Figure 16. Same status for 2016UPGN1

  6. Failing UpgAG over to 2016UPGN2, however, gets everything where it should be for a join state.

    Figure 17. 2016UPGN1 now has a join state that is expected

  7. Evict the 2012 R2 nodes from the WSFC.
  8. Upgrade the WSFC functional level.

Result – Success! This one was not straightforward, and I do not know why joining a SQL Server 2017 replica to a lower-level AG in this configuration gives a status of NOT_JOINED. Clearly that did not happen in Test 2 or Test 3. That is a question for the dev team and another day.
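If you want to watch the join state yourself during an upgrade like this, the figures above come down to a query against the replica cluster-state DMV, something like (a sketch; run against whichever replica you’re checking):

```powershell
# Sketch only: checking each replica's join state via the DMV.
Invoke-Sqlcmd -ServerInstance "2016UPGN1" -Query @"
SELECT replica_server_name, join_state_desc
FROM sys.dm_hadr_availability_replica_cluster_states;
"@
```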

Final Thoughts

While three of the configurations worked in the end (one gave me a bit of heartburn along the way), the fourth just didn’t work. As always, you should plan upgrades and test them before doing them on your production boxes. If you’re looking for help on upgrades or anything else SQL Server or infrastructure-related, reach out to us. We’d love to help you get to your destination!




Independence Day Training Sale

Posted on July 1, 2019 in SQLHAU, Training

Hi everyone. Quick note. From today (July 1) until 11:59 PM Eastern on July 8, we are having a limited-time sale on all SQLHAU training. Use discount code JULY4 to get 30% off your entire order.

The sale includes all SQLHAU training scheduled on our events page, including the two upcoming in-person classes in Chicago (the Modernizing Your SQL Server Infrastructure Boot Camp and the Always On Availability Group Boot Camp) and the two live online classes – Almost On: A Guide to Troubleshooting Availability Groups and FCIs in a few weeks, and the Always On Availability Groups Boot Camp in December.