Demystifying Microsoft Distributed Transaction Coordinator and SQL Server HA Configurations
One topic that seems to vex people when it comes to highly available configurations of SQL Server is the use (or lack thereof …) of (the) Microsoft Distributed Transaction Coordinator. It seems to be something I’ve addressed at least five times in the past few weeks, so I decided to do a blog post. Sometimes you will see the feature abbreviated as any one of three ways: DTC, MS DTC, and MSDTC. As an aside, I’ve written a few posts already on DTC – including two (this and this) on how to cluster DTC, so I’m no stranger to DTC land.
Back in the dinosaur era when we only had log shipping, SQL Server failover clustering (when Windows Server failover clusters [WSFCs] were called server clusters or something other than WSFC which it is today, and there was no naming conflict with the words failover and cluster together), and (heaven help us) replication, DTC was around, too. DTC is a component of the OS (i.e. Windows Server), not SQL Server.
Going back to those bygone days, the story was this: you had to cluster DTC if you were deploying a clustered SQL Server installation (this is before availability groups [AGs], kiddies). It was not optional. It only became optional for failover cluster instances (FCIs) starting with SQL Server 2008 (but you got and to this day, still get a warning on FCI installs if you do not; you can ignore that for reasons I’m not getting into deeply in this blog post). Clustered instances of SQL Server were the only way to get high availability support for distributed transactions until SQL Server 2016 (more on that in a minute). That means every feature – log shipping, database mirroring (DBM), replication, and yes, even AGs did not support distributed transactions.
What exactly is a distributed transaction? It’s one where the work needs to be completed in more than one database so data is kept in sync everywhere. For example, if you need to update data in Database A and in Database B, and they need to be kept in sync, that’s a distributed transaction. Database A and Database B can be in the same SQL Server instance, or they could be in different instances … or even in different data sources, such as Oracle or DB2. This whole shebang is often referred to as a cross-database transaction. DTC is based on the principle of a two phase commit – for the whole thing to get done, all the little bits need to be committed everywhere before claiming complete success. Otherwise stuff needs to be rolled back so that things stay in sync and all is right in the world.
Why is all of this important? With the HA features of SQL Server, it all boils down to failing over or switching to another server. It has nothing to do with when things are up and running and there are no issues. DTC plays well with everything in those cases. What happens when data commits in A but not B and you have a failover of one of those and the commit did not happen everytwhere? You are out of sync. Now you need to manually deal with the data issue – if you even know you have it. That is the problem with all of the features besides FCIs not supporting DTC, and one reason why FCIs are still popular today (among other reasons).
When I first started talking to customers about AGs even before SQL Server 2012 was released, many assumed that because an AG could be configured with more than one database that it supported distributed transactions. Bzzt! Thanks for playing, we’ll send you home with a lovely parting gift. The story remained unchanged until fairly recently.
At PASS Summit in I believe in 2015, the HA feature PM at the time announced that DTC support was coming for AGs and there was much rejoicing in the land … until it was realized that what was done only covered half the story. What was implemented in SQL Server 2016 was the case where if you had a database in Instance A and another in Instance B (or in another data source), not if you had two databases in the same instance (whether or not they were in the same AG).
Microsoft had originally intended on the full functionality being shipped in a CU or SP for SQL Server 2016, but it turned up in SQL Server 2017 as I blogged about recently. Given the short dev cycles and the amount of effort it took to do, I’m not surprised. To get DTC support for AGs was a cross effort between the Windows and SQL Server teams – and a lot of work. I’m OK with it being in 2017+.
DTC is a huge red flag/lightning rod/insert your euphemism here for some (IT DOESN’T WORK WITH AGs! MS SUCKS!). Do you actually need it? Do you even know if any of your applications use it? Most of my customers do not. I ask the question all the time when I’m helping them architect and implement HA solutions. And if you want help with your HA stuff for SQL Server, contact us. Max and I have decades of experience in this arena – literally.
Anyway, anecdotal evidence: in the near 30 years into my career with SQL Server going back to SQL Solutions in Burlington, MA pre-Sybase, the number of applications I’ve encountered that actually use DTC is probably under 25. The use and importance of DTC is vastly overrated in my opinion, but it’s nice for folks who want to bash on SQL Server and Windows dev to say it’s not supported.
Having said that, and for those of my customers over the years that rely on distributed transactions in their applications, it’s super important to them and their HA story has been not so great. I can see why it is a dealbreaker for those people, but this one is more like a 90/10 or 95/5 rule – the no DTC use case covers most, but for those who it doesn’t work for, if they hate FCIs, and, say are using VMware and really hate RDMs, well … you get the picture.
Bottom line: Do not make a mountain out of a molehill as I see many do with this topic. There’s so much FUD out there with DTC. Here’s a quick bullet list of what you really need to know:
- Know if your app uses distributed transactions, and if you need anything special like XA transactions which is a more detailed configuration of DTC.
- If you deploy apps using log shipping, replication, database mirroring, and AGs (pre-2016), DTC won’t kill you until you have a bad failover and your data is out of sync. So it will work … until it doesn’t. Graceful failovers won’t kill you if you coorindate things.
- Always test your apps in a failover situation with DTC – supported or not. Shame on you if you don’t and assume it will all be honky dory.
- If you require DTC for a highly available configuration of SQL Server, you have two options: FCIs or AGs.
- For AGs, whether you go with SQL Server 2016 or 2017 depends on what flavor of DTC you require.
- DTC support is not enabled by default on an AG in 2016+.
- DTC support for AGs can only happen when an AG is created, and is currently T-SQL only. Want to add it later? Delete and reconfigure the AG.
- DTC support on an AG is enabled by an option (DTC_SUPPORT = PER_DB), and it’s on or off. There’s nothing else you have to configure.
- DTC support for AGs requires Windows Server 2012 R2 or later. Windows Server 2012 R2 requires a hotfix.
- There is no support for DTC at all for SQL Server on Linux. If you need DTC, you’ll need to deploy SQL Server on Windows Server.
- Clustering DTC is not required for AGs or FCIs. Period. There are advantages for doing so with some FCIs, but not getting into that here. That’s a story you’ll have to read my book to get (and it’s coming along nicely, thankyouverymuch) or attend one of my classes.
Hope that clears things up.