Dear Microsoft: I Love You But You’re Driving Me Batty
If Microsoft and I had a relationship status, it would be “it’s complicated”. As a Microsoft MVP and someone who specializes in all things Windows and SQL Server (within reason … there’s way too much for one person to literally know everything), I make my bread and butter on their platforms. Sure, I’ve used Unix and other RDBMSes over the years, too – but not like SQL Server. I speak VMware very well (not just Hyper-V). While I don’t want to bite the hand that feeds me, anyone who knows me is aware that I’m outspoken. Some people love it; other people have an issue with it.
I try to be honest but not mean (despite what people may think – and you’d know when I’m being mean, trust me), but the reality is some people just can’t handle the truth so you pussyfoot around it. I come from a place where facts rule the day and I try not to let emotion cloud my judgement, but some of Microsoft’s recent decisions have me a bit puzzled, and quite frankly annoyed. What MS decides directly affects not only how I do my day job as a consultant and educator, but anyone using Windows and/or SQL Server. In this post, I’m going to focus on two issues because I think Redmond needs some tough love and a dose of reality from someone who cares and is passionate about both Windows and SQL Server.
Issue One: Always On
Yes, you read that right – Always <space> On. It’s almost a joke now with me and AlwaysOn, but it ceased to be funny a long time ago. It’s funny in the “haha” sense, and you can see by my Twitter feed people troll me all the time. I’m used to it.
You may have seen my previous blog post “AlwaysOn Is the New Active/Passive and Active/Active“. If not, take a few minutes and read it because it sets this up. If you want the TL;DR approach here it is: AlwaysOn is not the AG feature. If that was the issue, I wouldn’t be writing a new blog post.
As noted in that blog post, the words “always” and “on” have a long history with SQL Server. Always On (with space) was the initial designation for the highly available storage for use with SQL Server. So availability has always been linked; that is not in question. To quote this Hitachi whitepaper:
The Microsoft SQL Server Storage Solution Review Program is a specific SQL Server program that enables storage solution providers to highlight those storage solutions and configurations, via the SQL Server “Always On” labeling, that they have successfully reviewed against core functional Microsoft SQL Server storage requirements. The core requirements defined herein must be met for reliable, highly available SQL Server storage systems.
Here are some other links from that era (there are more, but I think you’ll get the idea):
- EMC Celerra Solutions for Microsoft SQL Server 2005 Always On Technical Note
- IBM System Storage Solutions for Microsoft SQL Server IT Environments
- Always On White Paper for Fujitsu ETERNUS Storage System
As I wrote about in my book Pro SQL Server 2005 High Availability (Apress, 2007) in Chapter 4 on page 122:
At TechEd 2006, Microsoft unveiled the SQL Server 2005 Always On program (http://www.microsoft.com/sql/alwayson/default.mspx) [note: this link does not work now; it did back then, so don’t try it], which is an umbrella that brings all the SQL Server availability technologies under one banner. Part of the Always On program is a partner program for storage solution partners. This specific portion of Always On is known SQL Server Storage Solution Review Program.
I write a bit more in that, but this was the first morphing of the Always On name, which now included clustered instances as well as database mirroring, log shipping, backup and restore, etc. By SQL Server 2008, the storage certification portion of Always On was killed and it was just the HA-related features (FCIs, DBM, and log shipping clearly mentioned) as evidenced in this SQL Server 2008 OLTP marketing PDF.
Enter the feature introduced in SQL Server 2012: availability groups (AGs). Yes, AlwaysOn was a name for the feature for about five minutes in its development cycle as noted in my other blog post, but ultimately marketing decided to name the feature AGs and now had a new umbrella moniker for only FCIs and AGs: AlwaysOn. So the official names for those features are/were (we’ll get to were in a minute) AlwaysOn Failover Cluster Instances and AlwaysOn Availability Groups. So AlwaysOn has no space in 2012. This can be seen here and a screen grab is below in Figure 1. So out are DBM, log shipping, etc. from the AlwaysOn moniker. And for heaven’s sake, it’s not AOAG. It’s just AG. We don’t say AOFCI – it’s just FCI (which is clearly seen at the link and in the screen grab; click to make it bigger). Precedent, people. Follow it.
My frustration with people using AlwaysOn improperly as the AG feature is well documented in the other blog post, and that issue has been occurring now for the better part of five years at this point. I’m not going to lie – it’s hard getting people to refer to the feature properly. Microsoft and its employees are some of the worst offenders truth be told.
Fast forward to this year’s MVP Summit which was right after PASS Summit. I was tipped off by a few birdies there was a change coming to AlwaysOn I would not be happy about – Always On (with a space). At first I thought they were trolling me and having a laugh – but no, this was real. SQL Server marketing in their “infinite wisdom” has decided to go back to the original spelling from here on out, which means for SQL Server 2016 (and possibly everything in 2012 and 2014), forget the no space. Ugh.
The new and improved Always On (with space) made its first official appearance in the December 15th blog post “Enhanced Always On Availability Groups in SQL Server 2016“. Now that it’s out there, I can talk about it. I’ve been sitting on this one for over a month. As one Microsoft PM once said to me, “You can report the news, but you can’t make it.”
I’ll cut right to the chase: as if we didn’t already have an AlwaysOn problem, marketing just made it worse. I mean this in the nicest possible way: I really think their heads are up their posterior. That won’t win me any favors, but it has to be said. What the hell are/were they thinking here? This change neither clarifies things nor makes them better. It makes them WORSE. We’re taking a time hop back nearly 10 years.
When a bunch of us who fight this constantly found this fact out last month, we were all pretty livid. Marketing apparently lives in a bubble where they don’t give a hoot that people like me have to live with these asinine and uninformed decisions. No, I don’t expect them to consult me personally when they do something related to the SQL Server availability feature set, but it’s like they asked no one who even uses this stuff or thought about the downstream effect. I have a great working relationship with SQL dev, but I’m calling this ugly spade what it is: ridiculous.
What they should have probably done for SQL Server 2016 is something I would actually advocate (no, really): call the feature currently known as AGs – what most do already – AlwaysOn (no space). Drop the umbrella banner and just have HA features. FCIs will just be FCIs. I have been adhering to the rules marketing set up for 2012 and 2014, and I get penalized for it because they do a piss poor job enforcing it. PMs on TAP calls and in presentations regularly call AGs AlwaysOn with nary a word said. MS should train their own employees properly to be honest; it would solve many of my issues. Now marketing pulls this rabbit out of the hat? Unbelievable. Well, it is believable; I just have a hard time believing they consciously thought this was a winner of an idea. This almost is the equivalent of Bizarro and Bizarro World in the Superman universe.
A good friend and fellow MVP, Mike Steineke (Twitter | Blog), said two things today with regards to this which made me laugh but are true:
- AlwaysOn naming is as random and complicated as SQL licensing
- “Lucas thought Jar Jar Binks was a good idea too… Meesa like AlwaysOn” – something one of his employees said and appropriate given the opening of Star Wars Episode VII The Force Awakens this week
I think those two quotes sum up my feelings pretty well.
EDIT: As has been pointed out by two comments, the real goal here is consistency. Failure to be consistent leads to confusion and problems (read the comments for more).
Issue Two: Services/Applications/Roles/Resource Groups and Windows Server Failover Clusters
SQL Server is not alone in its penchant to do things that are detrimental to customers and understanding the platform. Back in the day pre-Windows Server 2008, at the Windows level, we had the concept of resource groups. Simply put, they were basically containers for clustered resources. For example, if you installed a clustered instance of SQL Server 2005, its resources would be in its own resource group. In Cluster Administrator (the GUI-based management tool), they were referred to as groups. The command line cluster.exe also used the group nomenclature. So far, so good. Figure 2 shows what that looks like with a clustered default instance of SQL Server 2005 named BATMOBILE.
In Windows Server 2008, we got a new GUI administration tool – Failover Cluster Manager (FCM). Lo and behold, they changed the name for things in the GUI to be Services and Applications, not groups. But cluster.exe and, as of Windows Server 2008 R2, the PowerShell cmdlets (such as Get-ClusterGroup) still refer to groups. For people who don’t know Windows Server Failover Clusters (WSFCs), this can be confusing (more on that in a minute), and if you come from 2003, it’s a problem as well. The reality is that underneath the covers it’s still a resource group. The concept never changed. There are other issues I won’t get into in this blog post. The groups –> Services and applications thing can be seen in Figure 3 which shows another default FCI named AJA.
In Windows Server 2012, this was then changed in FCM to be Role(s), but again, any command line still refers to groups. Figure 4 shows a named clustered instance of SQL Server named MINI\DISC.
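To keep that progression straight, here is a minimal Python sketch of the mapping described above. It is purely illustrative: the names `GUI_TERM`, `CLI_TERM`, and `describe` are made up for this example, and it only covers the Windows Server versions named in this post.

```python
# GUI label for a cluster resource group, by Windows Server release,
# as described in this post. Names here are hypothetical, for illustration only.
GUI_TERM = {
    "2003": ("Cluster Administrator", "Groups"),
    "2008": ("Failover Cluster Manager", "Services and Applications"),
    "2008 R2": ("Failover Cluster Manager", "Services and Applications"),
    "2012": ("Failover Cluster Manager", "Roles"),
}

# The command line (cluster.exe, and PowerShell cmdlets such as
# Get-ClusterGroup from 2008 R2 on) kept the same term throughout.
CLI_TERM = "group"


def describe(version: str) -> str:
    """Return a one-line summary of GUI vs. command-line terminology."""
    tool, label = GUI_TERM[version]
    return (
        f"Windows Server {version}: {tool} shows '{label}'; "
        f"the command line says '{CLI_TERM}'"
    )


if __name__ == "__main__":
    for version in GUI_TERM:
        print(describe(version))
```

The point of the sketch is that only the GUI column changes from version to version; the command-line term stays the same, because underneath it is still a resource group.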
Why are the UI and the command line out of sync? There honestly isn’t a good reason other than it is what it is. As someone who trains and speaks all the time, I need to make that connection for people between what you do in PowerShell and why it is different than the UI management tool. That also usually means I have to show Windows Server 2003 to people – something I’m sure they do NOT want me doing. I’m sure we’ve all seen the 1,000,001 WINDOWS SERVER 2003 IS GOING OUT OF SUPPORT – UPGRADE NOW e-mails, ads, Tweets, etc. from various folks including Microsoft.
I’ve been talking to the Cluster PMs about this since Windows Server 2012. Since it still wasn’t done in Windows Server 2016, I formally filed a bug, which was recently closed as will not fix in this version – maybe in the future. Look, I get it. It’s basically a visual thing which does not affect functionality and it’s not broken, and I’m sure there are things that need time allocated to fix which are more important. However, it leads to confusion when showing PowerShell (“What’s a group?”). It’s bad enough dealing with the 2008 R2 to 2012 questions of the Service/Application change to Role. Three versions into this new Role cycle, I’m calling enough is enough. Usability is a concern.
So if they want me to continue to show Windows Server 2003, don’t fix this. Until they do, Windows Server 2003 will be shown in some of my presentations and classes because many people learn visually and need to see that progression. Either that or change the UI back to Group or Resource Group. I’d be OK with that, too.
The Bottom Line
As you can tell, I’m not a very happy camper. Windows and SQL Server have given themselves enough rope to hang themselves on these two issues. SQL Server’s issue is completely self inflicted and causing a lot of confusion out in the world, and has for the better part of five years. I can only hope they come to their senses before SQL Server 2016 RTMs. As for Windows, like I said, I get things that are truly broken need to be fixed. But at this point when you want to put 2003 to rest, you can’t do it half assed. There’s still time before Windows Server 2016 RTMs. I’m not asking for this change to roll back to 2012 or 2012 R2.
I can’t reiterate enough that I am writing this from a standpoint of passion and caring, but the truth hurts sometimes. No one likes hearing their baby (in this case, their work) is ugly. Unlike animals and humans, these problems I discussed here are not life and death, and can be fixed. The ball is in your court, Microsoft. Do the right thing. I would love to see comments below because I think Microsoft needs to hear it from more than just me. Many voices effect change.
Bravo Allan!
Thanks for saying what I’ve wanted to say for a long time!
I’m not shy about saying this stuff – I say it all the time. But this new incarnation pushed me over the edge. Enough is enough.
Honestly, the marketing team has to be trolling you. It’s the only explanation.
Both the SQL & WSFC terminology drive me nuts. We have WSFCs on 2008 through 2012R2. Talking someone through a “how to” is frustrating when you have to remember what terminology is used in what version. Not to mention the outlandish things people come up with because “Always[ ]On Availability Group” is too verbose (and “AG” is too simple?).
Consistent branding & terminology is important.
If it were only that easy of an explanation (marketing trolling me).
I totally agree that consistent branding and use of terminology is key. In any given situation, I could have 10 people talk about the same thing 10 different ways because of this. I spend a lot of time in classes, presentations, and on engagements making sure we’re all on the same page or people walk away thinking something they shouldn’t. And therein lies the rub and goes back to the consistency comment. Minor changes to terminology can be a major headache from version to version for everyone.
Thanks Allan for taking the time to explain it in detail. Consistency in naming things is super important. If things are named randomly for marketing reasons, mistakes will be made, which when it comes to HA/DR can be fatal.
Consistency is key as I mentioned in my response to Andy. SQL Server marketing doesn’t seem to grasp that.
I agree, I don’t like change for change’s sake. If they really don’t like it, is it so hard to just stick it out without a space for the next decade until the next shiny thing comes along and this slips into obscurity?
But to be fair I haven’t had customers talk much about AlwaysOn, they still refer generically to HA or “clusters”. For the most part they still do not know Availability Groups exist, though we often convince them to move over, but coordinating management between Infrastructure and DBAs is extremely complicated no thanks to Microsoft.
“Please manage our traditional clusters. But if it’s a super magic AOAG cluster which looks almost exactly the same, then you shouldn’t. Well we need you to help when it’s broken, but you can’t fail it over in the cluster manager when you’re patching because Microsoft says it’s unsafe (even though there is probably a trivial way to make it safe if Microsoft would only document it), and you can’t alter the failover properties in cluster manager either because Microsoft says that’s unsafe too (although I’ve never seen documentation on how to ever detect if that has happened by accident and then how to rectify it).”
I wish those problems got more attention.
AOAG – ugh (see post) 😉 All kidding aside, you should only ever fail an AG over to another node using SQL Server, not FCM or PS. Failovers in a WSFC are easy enough to detect, but many customers do not have monitoring to do it or refuse to do it. Double edged sword.
AlwaysOn needs to go, it’s misleading. Execs just see this and go “well, SQL is always online” and get all confused when there are a few seconds of downtime between AG failovers. Should be MostlyOnOffForAShortWhileBackOnAgain
There is also that aspect. The irony for me with using AlwaysOn is that you need to stop and start SQL Server to enable it … definitely not “always on”.
Excellent post, thank you for expressing the frustration of many people so eloquently and concisely Allan.
You have to ask yourself, do Microsoft actually pay people to come up with these crackpot ideas?
A similar example of this marketing magic, the re-use of the MCSE certification moniker.
Cue endless confusion between “Microsoft Certified System Engineer” and “Microsoft Certified Solutions Expert”.
DAC anyone? 🙂
DAC? MCSE? Acronym overload? Not surprising. This is a problem that was identified back in the late 1980s and has, as yet, not been addressed. See http://catb.org/jargon/html/T/TLA.html
😉
As for the “AlwaysOn” + Microsoft personnel being the worst for the misuse… This. What’s worse is that they have no idea that it’s wrong, or why it’s wrong.
My guess, MS creates these confusions to help their sub-market, i.e. the consultants setting it all up. Since most of them are certified, they must hope these consultants have worked through the confusion phase. This is a misconception which only gets worse launching a new set of terminology every 2 (or 4) years.
Just wondering: How will any non-god consumer or dba handle this all?
Well said, my friend. There is no reasonable justification for the lack of consistency. Back to the Twitter thread, you did this post with obvious kindness.
Now I understand the point you tried to make this afternoon and where you are coming from. I do recall now the beginning of all this from TechEd 2006. I’m curious why SQL Sentry would label the AG feature AlwaysOn. They do refer to AG in other UI sections except the main tab label. Anyway, thanks for the history lesson! 😉
I can’t stop laughing reading this.
Honestly, you are right that MS should train their people.
Cheers
~ kunto ~