Once More With Feeling: Stop Using Active/Passive and Active/Active
Happy 2012, everyone! I figured I might as well kick of the new year by beating a dead horse (not literally – don’t call PETA; I’m not an animal abuser). If you’ve ever heard me speak, read my books or whitepapers, seen a tweet, a blog post, or heck, know of me, you know there’s nothing more that gets my dander up than the use of active/passive (A/P) and active/active (A/A) as it relates to SQL Server failover clusters. This will not be my first time ranting, nor will it be my last. I ranted in this forum post over at SQL Server Central last May as part of a response to something I felt added to the confusion of why people still cling to this silly terminology today. Expect yet another diatribe in my upcoming SQL Server 2012 book, too (although if I like what I write here, I’ll just link this blog post). Anyway …
You may be asking yourself, “Why does he care?” The answer is not as simple as just using wrong terminology.
A Bit of History
Let’s take a way back machine for a moment. July of this year will mark 10 years since the publication of the SQL Server 2000 failover clustering whitepaper I wrote while at Microsoft. That paper has a lot to do with my career today, and I am forever indebted to MS for publishing it. That paper also features the first use of non-A/P and A/A terminology. The terminology is really simple and SQL Server specific. That was a main driver in doing it. You’ll see why as you read on.
SQL Server 2000 marked the biggest change in concepts with failover clustering with the introduction of instances. I think we take it for granted now, but 10+ years ago, it was a pretty big deal to have more than one SQL on a single box or a cluster. There were very practical issues which made having multiple instances of SQL Server 2000 difficult, namely how scalable both Windows and its underlying hardware were. Single core processors, poor hyperthreading support, not much memory (8GB was a luxury for most), needing AWE to go beyond 4GB of memory, 32-bit (discounting the very limited Itanium variant which was released near the end of its lifecycle), DAS, no mount points – shall I go on?
Now let’s back up to the previous release – SQL Server 7.0. SQL Server 7.0 allowed you to cluster up to 2 separate installations (not instances) on two nodes of a Windows failover cluster (I won’t touch that terminology in this post … people still uses MSCS which is wrong, too). Very few of you reading this post may have ever used SQL Server 7.0, let alone clustered it, but let me tell you it was a miserable experience and anyone complaining about things today (and even I do) should put things in perspective. Yes, failover clustering today still has a few things I’d love to see addressed (mainly in Setup), but I’ll take what we have now any day of the week that was the misery known as SQL Server 7.0. Why?
- SQL Server 7.0 forced you to put the binaries for SQL Server on the shared disk – meaning you only had one copy and you could only run the management tools from the node which did the installation.
- To apply a SQL Server service pack, you had to uncluster the SQL Server installation, run the update (such as SP2), and then re-cluster the install. Let’s just say that didn’t always work.
- The clustering implementation was a bit of a hack prior to 2000. As part of the clustering process for SQL, MS replaced some DLLs to get SQL to work in a cluster. This is why specific versions of MDAC and such were required, and if you updated them on your own, you had the potential to break your cluster with 7.0. Fun stuff that I’m glad went away with the rearchitecture of failover clustering in SQL Server 2000. The MDAC issue had a lot to do with bad perceptions of SQL Server (in general – not just clustering) as well. Those issues were fixed a long time ago – notice you never hear about MDAC and issues related to it much, if at all, any more
So you can see why the ability to easily have more than one clustered installation of SQL Server on the same Windows failover cluster was so appealing when Microsoft fixed this in SQL Server 2000. SQL Server 7.0’s failover clustering implementation burned a lot of people, and unfortunately helped create part of the perception that failover clustering is bad. I know a lot of folks had a hand in that, including Richard Waymire (Blog | Twitter) who was the PM for the feature at the time.
SQL Server 2000 didn’t do clustering any favors in helping out in the perception department, either if I want to be a bit honest all these years on. With the aforementioned limitations (especially memory), you were really limited as to how many instances you could configure on a single Windows failover cluster. People had some really bad experiences with failovers. That’s why to this day I believe you still see a lot of single instance deployments on two-node Windows failover clusters – somewhere along the line they had something bad happen when more than one instance tried to run on a single node. This leaves us in a fun conundrum with other wrong perceptions, such as so-called “wasted resources” but that’s a topic for another time, too. The failover condition wasn’t the only thing that contributed to the poor perception of clustering with 2000 (the other bigger issue tended to be around the occasional file not being laid down during an update or on a remote install – not all customers encountered it which was what made it frustrating), but the entire experience was greatly improved.
In a long-winded way, this is how both “single instance (SQL Server) failover cluster” or “multiple instance (SQL Server) failover cluster” came about. This is more important than ever since Windows Server 2008 and beyond also uses the terminology failover cluster, which used to be a SQL Server only one.
Looking back at SQL Server 7.0, A/P and A/A made sense – even I’ll admit that. You could only have up to two nodes with Windows NT 4.0 Server (the predominant OS used around that time; SQL 7.0 was not developed with Windows 2000 Server in mind), and up to two installations. One instance? A/P, two instances? A/A. In that context, I have no real issue. But once the concept of instances was introduced with SQL Server 2000, all of that should have gone out the window. Now, sure, technically you could have two nodes with two instances and have it “technically” meet the A/P and A/A criteria. I get it. But it’s wrong now, and has been for 10+ years. I can’t be any more clear on this.
Let’s look at scenarios to drive the point home further.
Two Instances, Two Nodes
OK, so this is the “classic”. In a perfect world, you’ve got two nodes, each with its own instance. Since both nodes are running an instance, they are both actively hosting a SQL Server instance. In that case, it’s easy to buy into A/A if you wanted to. What happens – assuming capacity is not an issue – if there is a failover? Technically you are no longer in an A/A configuration. One node is hosting both instances. The other is down or not running any SQL Server instances. Is that now AA/P? No. This where using the right terminology – a multiple instance SQL Server failover cluster – is much more descriptive and actually makes sense.
Three Instances, Two Nodes
This one is a riff on the previous scenario. This one is a simple math problem: is it AA/A? A/AA? AAA/P? It certainly isn’t A/A/A, nor A/A/A/P. Again, the multiple instance SQL Server failover cluster terminology works shockingly well here. (“I’ve got a three instance SQL Server failover cluster.”)
Three Instances, Three Nodes
This one is similar to the Two Instances, Two Nodes scenario. If you have one instance (perfect world: no failovers) per cluster node, to some that would be A/A/A. But how many slashes do you get before things get out of control? And then what happens if failover happens (see previous scenarios for how that will go).
Two Instances, Three Nodes
Here’s another one that I see wrong, so I can kill two birds with one stone here. When Windows gained the ability to have more than two nodes in a failover cluster, the idea of a dedicated failover node – especially when you had multiple instances – was introduced. Because the limitation was at most four nodes with the of Windows 2000 Server Datacenter edition, this was often known as the N+1 scenario. When you really could have more than four nodes and more than one dedicated failover (not getting into specific configurations and things like preferred or possible owners in this post, either), this is N+i, where i would be the number of nodes that are failover ones.
From a SQL Server point of view, how would you denote the N+1 scenario? A/A/P? That’s just silly. Again, saying something like “I have a two instance SQL Server failover cluster with a dedicated failover node” makes a whole lot more sense.
Many Instances, More Than Two Nodes
Here’s where I throw up my hands and if I could curse, I probably would. If you plan on having, say, six instances on however many nodes (let’s pick four for the sake of argument), A/A/A/A/A/A (denoting the number of instances), nor is it really A/A/A/A (denoting the number of nodes). It sure as heck isn’t A/P or A/A. Capiche?
Doesn’t Microsoft Use Both the A/P and A/A Terms Still?
Unfortunately, they have and some still do to this day. I’ve seen it in more current documentation and blog posts from them. Sigh. It doesn’t mean they are right, though. We tried with the release of SQL Server 2000 and all of the corresponding documentation (including my SQL Server 2000 High Availability book) to make things consistent. The same could be said for 2005. But A/P and A/A seem to be sticking around like a cockroach will long after humans after a nuclear blast. Not everyone studies their history, so I guess we’re doomed to repeat it.
Stop the insanity!
OK, maybe that’s a bit overblown but I wanted to work that reference/kitsch in somewhere. Who wasn’t entertained by that infomercial back in the day?
Like I said in the forum post I link above, I won’t publicly shame anyone for using A/P or A/A, and I may not even correct or embarass you (I may … my apologies if I do), but I hope I’ve explained to you why they’re wrong. Honestly, I don’t get why people are not embracing the right terminology as they accurately describe the configuration. I would even venture to guess that there would be less confusion around more complex clustered SQL Server configurations if people talked about them properly.