Once More With Feeling: Stop Using Active/Passive and Active/Active
Happy 2012, everyone! I figured I might as well kick off the new year by beating a dead horse (not literally – don’t call PETA; I’m not an animal abuser). If you’ve ever heard me speak, read my books or whitepapers, seen a tweet, a blog post, or heck, know of me, you know there’s nothing that gets my dander up more than the use of active/passive (A/P) and active/active (A/A) as it relates to SQL Server failover clusters. This will not be my first time ranting, nor will it be my last. I ranted in this forum post over at SQL Server Central last May as part of a response to something I felt added to the confusion of why people still cling to this silly terminology today. Expect yet another diatribe in my upcoming SQL Server 2012 book, too (although if I like what I write here, I’ll just link this blog post). Anyway …
You may be asking yourself, “Why does he care?” The answer is about more than the terminology simply being wrong.
A Bit of History
Let’s hop in the wayback machine for a moment. July of this year will mark 10 years since the publication of the SQL Server 2000 failover clustering whitepaper I wrote while at Microsoft. That paper has a lot to do with my career today, and I am forever indebted to MS for publishing it. That paper also marks the first use of terminology other than A/P and A/A. The terminology is really simple and SQL Server specific, and that was a main driver in coining it. You’ll see why as you read on.
SQL Server 2000 marked the biggest conceptual change in failover clustering with the introduction of instances. I think we take it for granted now, but 10+ years ago, it was a pretty big deal to have more than one SQL Server on a single box or a cluster. There were very practical issues which made having multiple instances of SQL Server 2000 difficult, namely how scalable both Windows and its underlying hardware were. Single core processors, poor hyperthreading support, not much memory (8GB was a luxury for most), needing AWE to go beyond 4GB of memory, 32-bit only (discounting the very limited Itanium variant, which was released near the end of its lifecycle), DAS, no mount points – shall I go on?
Now let’s back up to the previous release – SQL Server 7.0. SQL Server 7.0 allowed you to cluster up to two separate installations (not instances) on two nodes of a Windows failover cluster (I won’t touch that terminology in this post … people still use MSCS, which is wrong, too). Very few of you reading this post may have ever used SQL Server 7.0, let alone clustered it, but let me tell you it was a miserable experience, and anyone complaining about things today (and even I do) should put things in perspective. Yes, failover clustering today still has a few things I’d love to see addressed (mainly in Setup), but I’ll take what we have now any day of the week over the misery known as SQL Server 7.0. Why?
- SQL Server 7.0 forced you to put the binaries for SQL Server on the shared disk – meaning you only had one copy and you could only run the management tools from the node which did the installation.
- To apply a SQL Server service pack, you had to uncluster the SQL Server installation, run the update (such as SP2), and then re-cluster the install. Let’s just say that didn’t always work.
- The clustering implementation was a bit of a hack prior to 2000. As part of the clustering process for SQL, MS replaced some DLLs to get SQL to work in a cluster. This is why specific versions of MDAC and such were required, and if you updated them on your own, you had the potential to break your cluster with 7.0. Fun stuff that I’m glad went away with the rearchitecture of failover clustering in SQL Server 2000. The MDAC issue had a lot to do with bad perceptions of SQL Server (in general – not just clustering) as well. Those issues were fixed a long time ago – notice you never hear about MDAC and issues related to it much, if at all, any more 🙂
So you can see why the ability to easily have more than one clustered installation of SQL Server on the same Windows failover cluster was so appealing when Microsoft fixed this in SQL Server 2000. SQL Server 7.0’s failover clustering implementation burned a lot of people, and unfortunately helped create part of the perception that failover clustering is bad. I know a lot of folks had a hand in that fix, including Richard Waymire (Blog | Twitter), who was the PM for the feature at the time.
SQL Server 2000 didn’t do clustering any favors in the perception department either, if I want to be a bit honest all these years on. With the aforementioned limitations (especially memory), you were really limited as to how many instances you could configure on a single Windows failover cluster. People had some really bad experiences with failovers. That’s why to this day I believe you still see a lot of single instance deployments on two-node Windows failover clusters – somewhere along the line they had something bad happen when more than one instance tried to run on a single node. This leaves us in a fun conundrum with other wrong perceptions, such as so-called “wasted resources,” but that’s a topic for another time, too. The failover condition wasn’t the only thing that contributed to the poor perception of clustering with 2000 (the other, bigger issue tended to be the occasional file not being laid down during an update or on a remote install – not all customers encountered it, which was what made it frustrating), but the entire experience was greatly improved in later releases.
In a long-winded way, this is how both “single instance (SQL Server) failover cluster” and “multiple instance (SQL Server) failover cluster” came about. This is more important than ever, since Windows Server 2008 and beyond also uses the term failover cluster, which used to be a SQL Server-only one.
Looking back at SQL Server 7.0, A/P and A/A made sense – even I’ll admit that. You could only have up to two nodes with Windows NT 4.0 Server (the predominant OS used around that time; SQL Server 7.0 was not developed with Windows 2000 Server in mind), and up to two installations. One installation? A/P. Two? A/A. In that context, I have no real issue. But once the concept of instances was introduced with SQL Server 2000, all of that should have gone out the window. Now, sure, you could have two nodes with two instances and have it “technically” meet the A/P and A/A criteria. I get it. But it’s wrong now, and has been for 10+ years. I can’t be any more clear on this.
Let’s look at scenarios to drive the point home further.
Two Instances, Two Nodes
OK, so this is the “classic”. In a perfect world, you’ve got two nodes, each with its own instance. Since both nodes are running an instance, they are both actively hosting a SQL Server instance. In that case, it’s easy to buy into A/A if you wanted to. What happens – assuming capacity is not an issue – if there is a failover? Technically you are no longer in an A/A configuration. One node is hosting both instances. The other is down or not running any SQL Server instances. Is that now AA/P? No. This is where using the right terminology – a multiple instance SQL Server failover cluster – is much more descriptive and actually makes sense.
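If you want to watch that drift happen, ask each instance which node it is currently running on. Here is a minimal T-SQL sketch using standard SERVERPROPERTY values (nothing in it is specific to any particular cluster):

```sql
-- Run against each clustered instance to see where it is right now.
-- SERVERPROPERTY returns NULL for unrecognized property names.
SELECT
    SERVERPROPERTY('ServerName')                  AS instance_name,
    SERVERPROPERTY('IsClustered')                 AS is_clustered,      -- 1 = failover clustered instance
    SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_host_node; -- node hosting the instance right now
```

Run it before and after a failover and you’ll see the point: “A/A” describes a moment in time, not the configuration.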
Three Instances, Two Nodes
This one is a riff on the previous scenario, and it’s a simple math problem: is it AA/A? A/AA? AAA/P? It certainly isn’t A/A/A, nor A/A/A/P. Again, the multiple instance SQL Server failover cluster terminology works shockingly well here. (“I’ve got a three instance SQL Server failover cluster.”)
Three Instances, Three Nodes
This one is similar to the Two Instances, Two Nodes scenario. If you have one instance per cluster node (perfect world: no failovers), to some that would be A/A/A. But how many slashes do you get before things get out of control? And what happens when a failover occurs? (See the previous scenarios for how that will go.)
Two Instances, Three Nodes
Here’s another one that I see done wrong, so I can kill two birds with one stone here. When Windows gained the ability to have more than two nodes in a failover cluster, the idea of a dedicated failover node – especially when you had multiple instances – was introduced. Because the limit was at most four nodes with Windows 2000 Datacenter Server, this was often known as the N+1 scenario. Once you really could have more than four nodes and more than one dedicated failover node (not getting into specific configurations and things like preferred or possible owners in this post, either), this became N+i, where i is the number of dedicated failover nodes.
From a SQL Server point of view, how would you denote the N+1 scenario? A/A/P? That’s just silly. Again, saying something like “I have a two instance SQL Server failover cluster with a dedicated failover node” makes a whole lot more sense.
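If you want to see the N+i picture from inside SQL Server, here is a minimal T-SQL sketch – assuming SQL Server 2005 or later, where the sys.dm_os_cluster_nodes DMV lists the nodes a clustered instance can run on (it returns no rows on a non-clustered instance):

```sql
-- List the Windows failover cluster nodes available to this instance,
-- and flag the one that currently owns it.
SELECT
    NodeName,
    CASE WHEN NodeName = CAST(SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS sysname)
         THEN 'current owner'
         ELSE 'possible owner'
    END AS node_role
FROM sys.dm_os_cluster_nodes;
```

A node that shows up in every instance’s list but never owns anything during normal operations is your dedicated failover node – and “a two instance SQL Server failover cluster with a dedicated failover node” tells you that; A/A/P does not.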
Many Instances, More Than Two Nodes
Here’s where I throw up my hands, and if I could curse, I probably would. If you plan on having, say, six instances on however many nodes (let’s pick four for the sake of argument), it isn’t A/A/A/A/A/A (denoting the number of instances), nor is it really A/A/A/A (denoting the number of nodes). It sure as heck isn’t A/P or A/A. Capiche?
Doesn’t Microsoft Still Use the A/P and A/A Terms?
Unfortunately, they have, and some still do to this day. I’ve seen it in more current documentation and blog posts from them. Sigh. It doesn’t mean they are right, though. We tried with the release of SQL Server 2000 and all of the corresponding documentation (including my SQL Server 2000 High Availability book) to make things consistent. The same could be said for 2005. But A/P and A/A seem to be sticking around like cockroaches will long after humans are gone in a nuclear blast. Not everyone studies their history, so I guess we’re doomed to repeat it.
In Conclusion
Stop the insanity!
http://www.youtube.com/watch?feature=player_detailpage&v=p063wg78Yss#t=11s
OK, maybe that’s a bit overblown but I wanted to work that reference/kitsch in somewhere. Who wasn’t entertained by that infomercial back in the day?
Like I said in the forum post I link above, I won’t publicly shame anyone for using A/P or A/A, and I may not even correct or embarrass you (I may … my apologies if I do), but I hope I’ve explained to you why they’re wrong. Honestly, I don’t get why people are not embracing the right terms, as they accurately describe the configuration. I would even venture to guess that there would be less confusion around more complex clustered SQL Server configurations if people talked about them properly.
Comments

Nice blog (mixed case would have made it easier to read).

Since I have an A/A cluster, I will now start calling it a multi-instance two node cluster. Knowing some of the history helps to put the terminology in perspective.
Bill –
The caps thing is apparently only happening in some older browsers. It is not in all caps.
Allan
To me, the “Active” / “Passive” still makes perfect sense when talking about 2 or 3 node clusters. Maybe because it’s ingrained in my mind that A/A, or A/A/P would automatically denote two (or more) instances, and A/P would denote 1 or more instances.
It’s a matter of perspective, I guess. “Active” / “Passive” does not indicate the number of instances running on a cluster, but it does tell you how many nodes are in the cluster, and if any are “passive,” which means they do not run any instances during normal operations. It’s a great designation at the OS layer, but not the greatest at the application layer. Shorthand is, at times, necessarily vague.
Active passive does not make sense, especially in a multi-instance configuration. And it is incorrect for SQL – it’s NOT a matter of perspective.
Great rant.
Since writing that article, you will be happy to hear I personally have actively stopped using this terminology. 🙂 It’s also great to have the background on how terminologies like these were created and took shape in the world.
However, during my time using the correct terminology in both formal presentations and informal discussions, people still want to refer back to the “active/passive” terms to gain an understanding.
I realise now that people use this to understand what the hardware is doing, and not the services the “correct” terminology describes. As Jason alludes to in his post, this makes “sense” to him, but it is factually incorrect.
There is a silver lining, though: my last webinar ran with not one person questioning the use of the “correct” multiple instance SQL Server failover cluster N+i terminology.
Maybe all that’s required is some patience & time for the new to replace the old…
Thanks for the link back 🙂 and good luck with the new book :).
The takeaway from this post is that Microsoft turned a notorious feature into a huge win.
The problem is, in regards to the subject, I’m not buying what you are selling.
I prefer A/A/P. So in an email to my boss, you are saying I should describe this cluster as “a multiple instance SQL Server failover cluster with two instances and three nodes”? That doesn’t even describe the preferred configuration.
I can do it in 5 characters, you can’t even do it in five words. Here is to the performance of email systems around the world, don’t be afraid of A/A/P.
Dustin, I think Allan’s point is that there are many more moving parts at this point in Clustering’s life, so saying A/A/P actually doesn’t give a complete picture of the setup–at least for an outsider. And considering Allan’s perspective as a consultant…his job is to be an outsider.
Being verbose for the sake of being verbose is never a good thing. But being verbose to be descriptive is a good thing. I think it’s also a good thing to differentiate between formal & colloquial terminology. “Multi-instance Windows Server Failover Cluster with two SQL instances, and three nodes, one of which is normally passive” might be appropriate for a formal document, but certainly won’t work when you’re sending an email to your boss.
Shorthand: you can still tell your boss that the WSFC is 3 node, 2 instance (or 3N-2I) and save your keystrokes. In fact, if you move to a 4-node, 3-instance cluster, you can save some keystrokes!! A/A/A/P ==> 4N-3I
I wholeheartedly agree that A/A (etc.) is terrible and incorrect when describing a cluster, as someone who has worked extensively on 4-node, multi-instance clusters (sometimes also running clustered non-SQL services alongside those instances). What if the “passive” node has some OTHER service? It’s passive for SQL, but not passive overall. Now there’s another layer of confusion.
Active and Passive have limited usefulness, particularly if you want to speak in a way that is clear to the DBA, sysadmin, and outsiders. For example, in a 4-server, 3-instance cluster, it can be helpful to colloquially describe the instanceless server as passive, as in “The D node is currently passive. We can reboot it during business hours.”
My problem with ‘active-passive’ with a WFC is that it’s too changeable to provide an accurate description of a cluster’s architecture. Say you have a two-node cluster with two instances that you call active-active. One instance fails over. OK, now you have an ‘active-passive’ cluster. It fails back later due to your failback policy. OK, now it’s ‘active-active’ again. As such, it’s difficult to provide an answer other than ‘it depends’ to the common question from developers and business users of whether we’re running an active-active or active-passive cluster.
The other issue I’ve seen is that developers and business users are misled by the ‘active-active’ terminology into thinking that an ‘active-active’ WFC means there are multiple read-write copies of the data with some type of coordinator, similar to my understanding of MySQL NDB Cluster or Oracle RAC. If you tell them their cluster is active-passive, they want to know how much it costs to upgrade to active-active so that they can ‘use all the nodes’ to make their application perform better.
There are other problems mentioned by lots of other people, but I haven’t seen these two get much attention. It is true that it is a semantic argument, but I think there is a good case for preferring ‘n-node, y-instance’ terminology when a-p vs a-a leads to so many misunderstandings.
For me, A/P means having a standby node, and A/A means no standby node.

Passive means sitting there not doing anything. When we have 3 instances and 4 nodes, this is an A/P cluster. If, however, we have 4 instances and 4 nodes, the normal configuration of preferred nodes will dictate it as an A/A cluster, as there is no standby node.

“Multi-instance cluster” cannot describe the situation clearly, in that from it we cannot know whether we have more nodes than instances or the other way around.
Hey Al,
Just to be very late to the game.
I’ve always been content with the A/A or A/P terminology. For me, it represented a licensing distinction.
If one or more nodes were intended to be unused most of the time, then it was an Active/Passive situation. The normally passive nodes didn’t need to be licensed (as they were covered under the DR provision).

If it was expected that all nodes would be running SQL instances evenly distributed over them – though if one went down, the others would sometimes run multiple instances – that was Active/Active.

It didn’t matter how many nodes, or how many instances. It just came down to whether you expected one or more nodes to be idle most of the time.
Of course in an Architecture diagram/discussion, you would need to be more precise, perhaps using the terms you’ve outlined above.
😉
A/A and A/P are crucial legal terms for licensing. That’s where it ends for me as they completely fail from an architecture discussion. You’re not necessarily counting widgets there 🙂
Hope all is well with you! 🙂