Windows Server Failover Cluster Archives

March 29, 2018

Revisiting Storage Spaces Direct and SQL Server FCIs

Let me be up front here: I really do like Storage Spaces Direct (S2D). I’ve been talking about it a lot in presentations (including just recently at SQLBits – click here to see the video), and first mentioned it in more detail back in 2016 around a benchmark with SQL Server. It makes implementing failover cluster instances (FCIs) *so* much easier (especially under virtualization) and it adds a lot of IO scalability when done right. It’s a great architecture if you have the need. We’ve helped customers plan it, and if you need we can help you, too. Just contact us.

For those of you unfamiliar with S2D, it’s a way to create storage for FCIs using storage that is local to the nodes themselves – not via SAN, iSCSI, etc. All of the official Microsoft documentation can be found here (no real mention of SQL Server), and if you want to know more, see my presentation linked above. I also teach S2D in some of my classes, where you may even get hands-on experience with it.

That said, there are a few gotchas, some of which unfortunately affect SQL Server architectures – one of which could be a temporary showstopper for some, at least in its current form in Windows Server 2016. Let me explain.

Arguably, the biggest thing about S2D is that the solutions currently have to be certified (see this bit of documentation from MS for more detail). This obviously doesn’t really affect, say, virtualized versions or ones up in the public cloud such as in Azure in a meaningful way, but it’s still technically a requirement much like logoed hardware for Windows Server supportability. Anyone want to point me to the logo stamped on your VMs? Didn’t think so. Now, from a pure FCI perspective none of this is an issue. The way a Windows Server failover cluster (WSFC) is currently designed, it expects that all nodes participating in the WSFC are also using S2D. Why am I mentioning this? Disaster recovery.

By its nature as needing shared storage, FCIs are more often than not a “local” HA solution and you add something else to it to make it business continuity friendly like an AG, log shipping, underlying storage-based replication which is transparent to the FCI, etc. For an FCI using traditional storage methods such as a SAN, none of this is a problem. It all just works and is supported just fine. Storage Replica in Windows Server 2016 (also Datacenter Edition like S2D) is also a great option for FCIs.

However, Storage Replica and Storage Spaces Direct do not currently work together in Windows Server 2016, and I am hoping they will in the future. SR cannot stretch an S2D WSFC right now. That means the D/R aspect of S2D with FCIs has to come in some other way. Here’s the Uservoice item for stretching an S2D WSFC to another site. It should be noted that as of the writing of this blog post, Windows Server 2019 was recently announced but nothing I’ve seen publicly points to SR and S2D working together in that upcoming release, nor would I bank on that.

Let’s add AGs to the party. In a traditional Windows Server-based AG, an instance participating as a replica can be standalone or an FCI. FCI + AG is a fully supported architecture from a pure WSFC and SQL Server standpoint right now. S2D complicates this, though. Why?

S2D WSFC configurations currently do not expect a non-S2D node. So if you have an FCI across n WSFC nodes with S2D for storage, and then want to extend that by an AG, it’s technically not an architecture that is currently supported in the traditional sense. Does it work? Yes, I’ve configured it. Will Microsoft give you best effort support if you’ve done this and run into an issue? Absolutely but at some point if it’s a serious issue, they could say “This isn’t a supported architecture, implement something supported.”

S2D across multiple sites is not an architecture that is currently supported, either. FCI + AG would be stretching the S2D WSFC, albeit in a non-traditional way. Since stretched S2D clusters are not supported anyway … it’s “complicated”. Yes, from a pure SQL Server point of view (and my perspective, quite frankly), the FCI + AG technically isn’t a stretched S2D WSFC, but in a way it is because of the mixing of S2D and non-S2D nodes in the same S2D WSFC. I can see both sides of the coin here.

If you try to add a non-S2D node to a WSFC configured to use S2D for storage, here’s what you see during the validation process in Windows Server 2016:

Running validation for a new node in an S2D WSFC

How do you achieve D/R right now with S2D-based FCIs? Outside of platform-based methods for things like virtualization and IaaS, at the OS/SQL Server layer, log shipping is the easiest way. No clustered anything needs to be involved. SQL Server 2017 has the NONE-type AG which does not require any clustering whatsoever, but I’d avoid that for reasons I’ve discussed and talked about elsewhere. A Distributed AG would also technically work since it would span two different WSFCs; it is not a stretched single WSFC with S2D. Only the nodes participating in the S2D-based FCI would be in the same WSFC. Distributed AGs are an Enterprise Edition-only feature, which rules it out for many people.
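For the Distributed AG option, here is a rough T-SQL sketch using the documented CREATE AVAILABILITY GROUP ... WITH (DISTRIBUTED) form. Every name in it is hypothetical: AG_SITE_A is assumed to be an AG whose replica is the S2D-based FCI in one WSFC, AG_SITE_B is assumed to be an AG in a completely separate WSFC at the D/R site, and the listener URLs are placeholders. The port in each URL is assumed to be that AG's database mirroring endpoint port. Treat it as a starting point under those assumptions, not a tested deployment script.

```sql
-- Run on the primary replica of the first AG (hypothetical name: AG_SITE_A).
-- Both underlying AGs, their listeners, and their endpoints must already exist.
CREATE AVAILABILITY GROUP [DistAG_DR]
    WITH (DISTRIBUTED)
    AVAILABILITY GROUP ON
        N'AG_SITE_A' WITH (
            LISTENER_URL = N'tcp://ag-site-a-listener.example.com:5022',  -- placeholder
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'AG_SITE_B' WITH (
            LISTENER_URL = N'tcp://ag-site-b-listener.example.com:5022',  -- placeholder
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        );
GO

-- Run on the primary replica of the second AG (hypothetical name: AG_SITE_B)
-- to join it to the Distributed AG.
ALTER AVAILABILITY GROUP [DistAG_DR]
    JOIN
    AVAILABILITY GROUP ON
        N'AG_SITE_A' WITH (
            LISTENER_URL = N'tcp://ag-site-a-listener.example.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'AG_SITE_B' WITH (
            LISTENER_URL = N'tcp://ag-site-b-listener.example.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        );
GO
```

The point of the shape is that each AG lives in its own WSFC, so no non-S2D node ever has to join the S2D WSFC.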

Bottom line: right now, if you want to combine FCIs and AGs, the safest and currently the only fully supported way is to use traditional storage methods for your FCIs that are not S2D-based. I really hope that Microsoft officially supports the FCI + AG scenario where the FCI uses S2D in the near future. Stay tuned!

March 26, 2018

Mission Critical Moment #2 Is Here

I just got done uploading the second in our new, free video series and it’s now live. This time around I’m tackling a question I get all the time as well as a problem I see – FCIs when people virtualize as well as in the public cloud. Click here to see it.

June 14, 2017

Supporting Our Troops – Win a Seat in the Upcoming Classes In Chicago

Nearly every year since we’ve been offering public training, we’ve given away one seat in a class for a good cause or to people who would not ordinarily be able to attend. As an example, last year was WIT. Max and I really believe that giving back is important. Sure, we’re taking money out of our own pockets but it’s the right thing to do. SQLHA really does put our proverbial money where our mouth is; we do not take giving back lightly.
We’re happy to announce that this year, we’ll be giving away one seat for each of the upcoming Chicago classes: SQL Server Availability Fundamentals on August 7 and the Always On Availability Groups Boot Camp from August 8 – 10. This year, we are also focusing on a specific group like last year with WIT. This time around it’s active duty or retired military members (gender does NOT matter). As long as you did or currently do serve in a branch of the military, you are eligible.

Good luck if you enter!

The Rules

You have to be able to prove that you are or were in one of the branches of the military.
Send an e-mail to sales at sqlha dot com with the subject Supporting Our Troops and tell us which class you are interested in (it’s possible for one person to get both, but if you put both, specify a preference) along with why you think you deserve the seat and how it would impact you. You really have to demonstrate that you truly would benefit from attending.
You do not have to send a tome, but one or two lines won’t cut it either. The grammar police won’t hold it against you if your e-mail is not up to snuff; we prefer heartfelt over perfect. Having said that, see #10 of The Fine Print. There is one exception.
Entries must be in by Friday, June 30, at 5PM Eastern. A winner will be chosen and notified by Wednesday, July 5.
Do not make or send a video, write a Word document, etc.; that will disqualify you. This should be e-mail only.

The Fine Print

1. One entry per person.
2. Winners will not be eligible for a free seat in a future class and are ineligible for winning any other free SQLHA LLC giveaway for 12 months after winning the seat in the class (excluding any giveaways in the class). If you cannot attend the class where you are chosen as a winner, you forfeit the prize.
3. Do not enter if you cannot attend; it is not fair to those who can and a waste of everyone’s time.
4. You (or your company) are responsible for all travel and expenses including, but not limited to: airfare, taxis, food, hotel, and so on. If you cannot meet this obligation for the class you are thinking of entering, please save it for one you can. We’re just providing the class, not the whole shebang.
5. Entries without the proper subject will be disqualified. Sorry.
6. While we do not have delicate sensibilities, keep your entries clean.
7. You are responsible for any taxes you may need to pay as a result of winning this contest.
8. You must be eligible to win. For example, some who work in certain jobs or roles would be ineligible. Know if you can before you enter. I apologize in advance if what you do rules you out, but we don’t want to waste anyone’s time or cause issues for you OR us.
9. All entries must be in English.
10. While we understand that writing is not everyone’s forte, anyone who uses text speak such as ur will be disqualified as well. We have to have some standards, you know.

Not in the military?

We have a few seats left in each class, so don’t miss your opportunity. Use the discount code HOORA20 to get 20% off (which is bigger than the current built-in discount) before you miss out.

Why SQLHA for your training needs?

Show your boss this. We offer the best in-person training, which includes labs. Because we keep our class sizes small, our classes have great interaction. Between the labs and being in the room with one of the world’s recognized experts for availability on SQL Server, it doesn’t get much better.

February 22, 2017

Always On Availability Groups with No Underlying Cluster in SQL Server v.Next

UPDATED 2/22/17 in the afternoon

With a lot of the focus seemingly going to the Linux version of SQL Server v.Next (including the inclusion of Always On Availability Groups [AGs] in the recently released CTP 1.3), I don’t think that a lot of love is being showered on the Windows side. There are a few enhancements for SQL Server availability, and this is the first of some upcoming blog posts.

A quick history lesson: through SQL Server 2016, we have three main variants of AGs:

“Regular” AGs (i.e. the ones deployed using an underlying Windows Server failover cluster [WSFC] requiring Active Directory [AD]; SQL Server 2012+)
AGs that can be deployed without AD, but using a WSFC and certificates (SQL Server 2016+ with Windows Server 2016+)
Distributed AGs (SQL Server 2016+)

SQL Server v.Next (download the bits here) adds another variant which is, to a degree, a side effect of how things can be deployed in Linux: AGs with no underlying cluster. In the case of a Windows Server-based install, this means that there could be no WSFC, and for Linux, currently no Pacemaker. Go ahead – let that sink in. Clusterless AGs, which is the dream for many folks (but as expected, there’s no free lunch which I will discuss later). I’ve known about this feature since November 2016, but for obvious reasons, couldn’t say anything. Now that it’s officially in CTP 1.3, I can talk about it publicly.

Shall we have a look?

I set up two standalone Windows Server 2016 servers (vNextN1, vNextN2). Neither is connected to a domain. Figure 1 shows the info for vNextN1 (for all pictures, click to make bigger).

Figure 1. Standalone server not domain joined

Using Configuration Manager, I enabled the AG feature. Prior to v.Next, you could not continue at this point since there was no WSFC; you would be blocked and get an error. However, in v.Next, you will get what is seen in Figure 2. It still indicates that there is no WSFC, but it happily allows you to enable the AG feature.

Figure 2. Enabling the AG feature in v.Next

After enabling and restarting the instance, you can clearly see in Figure 3 that the AG feature is enabled. We’re not in Kansas anymore, Toto.

Figure 3. AG feature is enabled, but no WSFC
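If you would rather check this from a query window than from the GUI, something like the following should do it. SERVERPROPERTY and sys.dm_hadr_cluster are standard; what the DMV returns on a server with no WSFC at all is an assumption worth verifying on your own build, so no expected output is shown.

```sql
-- IsHadrEnabled: 1 means the Always On availability groups feature is enabled.
-- IsClustered:   1 means this instance is a failover cluster instance (FCI).
SELECT SERVERPROPERTY('IsHadrEnabled') AS is_hadr_enabled,
       SERVERPROPERTY('IsClustered')   AS is_fci;

-- Cluster information as SQL Server sees it. On a standalone, non-clustered
-- server there is no real WSFC name or quorum for this to report.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;
```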

Just to prove that there is no magic, if you look in Windows Server, the underlying feature needed for a WSFC is not enabled which means a WSFC cannot be configured. This is seen in Figure 4.

Figure 4. Failover clustering feature is not installed – no WSFC!

In SQL Server, configuring this is done via T-SQL and is similar to how it is done for AD-less AGs with Workgroup WSFCs in SQL Server 2016/Windows Server 2016. In other words, you’re using certificates. In addition to certificates, there is a new clause/option (CLUSTER_TYPE) that exists in the CREATE and ALTER AVAILABILITY GROUP T-SQL. Unlike the Linux example in documentation which shows how to use the CLUSTER_TYPE syntax, I altered the syntax I’ve been using for the AD-less AGs with certificates since it is basically the same and I did not use seeding (you can if you want); I manually restored the database AGDB1. I created an AG called AGNOCLUSTER. This can be seen in Figures 5 and 6.

Figure 5. vNextN1 as the primary replica of AGNOCLUSTER

Figure 6. vNextN2 as the secondary replica of AGNOCLUSTER
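For reference, here is a minimal sketch of what the AG portion of that T-SQL can look like. It follows the CLUSTER_TYPE = NONE syntax as documented for the released SQL Server 2017, so the CTP may differ slightly, and it is not the exact script used for the screenshots. It assumes certificate-secured database mirroring endpoints already exist on both instances, that port 5022 is used (an assumption), that the server names resolve from each node, and that AGDB1 was restored manually on the secondary. The availability mode shown is just one valid choice.

```sql
-- On vNextN1 (the primary). Endpoints and certificates are assumed to be
-- configured already; port 5022 is a placeholder.
CREATE AVAILABILITY GROUP [AGNOCLUSTER]
    WITH (CLUSTER_TYPE = NONE)
    FOR DATABASE [AGDB1]
    REPLICA ON
        N'vNextN1' WITH (
            ENDPOINT_URL = N'TCP://vNextN1:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,              -- no cluster, so no automatic failover
            SEEDING_MODE = MANUAL,               -- database is restored by hand
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
        ),
        N'vNextN2' WITH (
            ENDPOINT_URL = N'TCP://vNextN2:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
        );
GO

-- On vNextN2 (the secondary), after restoring AGDB1 WITH NORECOVERY.
ALTER AVAILABILITY GROUP [AGNOCLUSTER] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER DATABASE [AGDB1] SET HADR AVAILABILITY GROUP = [AGNOCLUSTER];
GO
```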

To support this new functionality, there are new columns in the DMV sys.availability_groups – cluster_type and cluster_type_desc. Both can be seen in Figure 7. You will also get an entry in sys.availability_groups_cluster with this new cluster_type (also a new column there).

Figure 7. New columns in sys.availability_groups
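A quick query along these lines shows the new columns without the screenshot. The column names come straight from the text above; the WHERE clause just narrows it to the AG created in this post.

```sql
-- cluster_type / cluster_type_desc are the columns added in v.Next.
SELECT name,
       cluster_type,
       cluster_type_desc
FROM sys.availability_groups
WHERE name = N'AGNOCLUSTER';
```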

So what are the major restrictions and gotchas?

This new AG variant is NOT considered a valid high availability or disaster recovery configuration without an underlying cluster (WSFC for Windows Server or currently Pacemaker on Linux). It is meant more for read only scenarios, which means it’s meant more for Enterprise Edition than Standard Edition. I cannot stress enough this is NOT a real HA configuration.
UPDATE – A major reason this is not a real HA configuration is that there is no way to guarantee zero data loss without you first pausing the primary and ensuring that the secondary replica is in sync (or replicas, as the case may be).
Having said #1, you can do a manual failover from a primary to a secondary. This would be true even in the case of an underlying server failure, hence this not really being a true availability configuration: there’s no cluster and therefore no mechanism (sp_server_diagnostics) to detect and handle the failure.
Since there’s no underlying cluster, you can’t have a Listener. This should be painfully obvious. This also means that you will connect directly to any secondary replica for reading. This makes it possibly less interesting for read only scenarios, but again, did you expect you’d get everything?
Since there is no Standard Edition version of the CTP, it is unknown if this will work with Standard Edition in SQL Server v.Next. I would assume it will, but we’ll see when v.Next is released.
UPDATE – This also can most likely be used for migration scenarios (arguably it will be the #1 use), which I will talk about in a Windows/Linux cross platform blog post soon.
UPDATE – This is not the replacement for database mirroring (DBM). That is/was putting AGs in Standard Edition in SQL Server 2016 even though it requires a WSFC. You get so much more that you really should stop using DBM. It’s been deprecated since SQL Server 2012 and they could pull it at any time (and I’m hoping it’s gone in v.Next).
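To make the manual failover and zero data loss points above concrete, here is a hedged sketch of the sequence as I understand it from the procedure documented for SQL Server 2017 AGs with CLUSTER_TYPE = NONE. The AG and server names are the ones used in this post; the ordering and the need to stop application writes first are the important parts. Verify it against the current documentation before relying on it.

```sql
-- Step 1 (on the current primary, vNextN1): make both replicas synchronous.
ALTER AVAILABILITY GROUP [AGNOCLUSTER]
    MODIFY REPLICA ON N'vNextN1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
ALTER AVAILABILITY GROUP [AGNOCLUSTER]
    MODIFY REPLICA ON N'vNextN2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
GO

-- Step 2 (on the primary): with application writes stopped, confirm that
-- every database on every replica reports SYNCHRONIZED before going on.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;
GO

-- Step 3 (on the old primary, vNextN1): demote it so it stops accepting writes.
ALTER AVAILABILITY GROUP [AGNOCLUSTER] SET (ROLE = SECONDARY);
GO

-- Step 4 (on the target secondary, vNextN2): promote it. The command name is
-- alarming; steps 1 to 3 are what keep this from actually losing data.
ALTER AVAILABILITY GROUP [AGNOCLUSTER] FORCE_FAILOVER_ALLOW_DATA_LOSS;
GO
```

Nothing here is automatic, which is exactly why this is not a real HA configuration. Depending on the build, you may also need to resume data movement on the new secondary afterwards.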

Keep in mind this is how things are now, but I don’t see much, if anything, changing whenever RTM occurs.

I won’t really be talking about this configuration this weekend at SQL Saturday Boston or at SQL Saturday Chicago in March, but I will be talking about it a little bit at both the SQL Server User Group in London on March 29th and at the SQL Server User Group in Dublin on April 4. I will be covering this configuration in more detail as part of my Training Day at SQL Bits 16 – “Modern SQL Server Availability and Storage Solutions” (sign up for it today – seats going fast!). It will also be part of my upcoming new SQLHAU course Always On Availability Groups Boot Camp coming up in August, and incorporated into the 4-day Mission Critical SQL Server class (next scheduled delivery is in December).

December 16, 2015

Dear Microsoft: I Love You But You’re Driving Me Batty

If Microsoft and I had a relationship status, it would be “it’s complicated”. As a Microsoft MVP and someone who specializes in all things Windows and SQL Server (within reason … there’s way too much for one person to literally know everything), I make my bread and butter on their platforms. Sure, I’ve used Unix and other RDBMSes over the years, too – but not like SQL Server. I speak VMware very well (not just Hyper-V). While I don’t want to bite the hand that feeds me, anyone who knows me is aware that I’m outspoken. Some people love it, other people have an issue with it.

I try to be honest but not mean (despite what people may think – and you’d know when I’m being mean, trust me), but the reality is some people just can’t handle the truth so you pussyfoot around it. I come from a place where facts rule the day and I try not to let emotion cloud my judgement, but some of Microsoft’s recent decisions have me a bit puzzled, and quite frankly annoyed. What MS decides directly affects not only how I do my day job as a consultant and educator, but anyone using Windows and/or SQL Server. In this post, I’m going to focus on two issues because I think Redmond needs some tough love and a dose of reality from someone who cares and is passionate about both Windows and SQL Server.

Issue One: Always On

Yes, you read that right – Always <space> On. It’s almost a joke now with me and AlwaysOn, but it ceased to be funny a long time ago. It’s funny in the “haha” sense, and you can see by my Twitter feed people troll me all the time. I’m used to it.

You may have seen my previous blog post “AlwaysOn Is the New Active/Passive and Active/Active“. If not, take a few minutes and read it because it sets this up. If you want the TL;DR approach here it is: AlwaysOn is not the AG feature. If that was the issue, I wouldn’t be writing a new blog post.

As noted in that blog post, the words “always” and “on” have a long history with SQL Server. Always On (with space) was the initial designation for the highly available storage for use with SQL Server. So availability has always been linked; that is not in question. To quote this Hitachi whitepaper:

The Microsoft SQL Server Storage Solution Review Program is a specific SQL Server program that enables storage solution providers to highlight those storage solutions and configurations, via the SQL Server “Always On” labeling, that they have successfully reviewed against core functional Microsoft SQL Server storage requirements. The core requirements defined herein must be met for reliable, highly available SQL Server storage systems.

Here are some other links from that era (there are more, but I think you’ll get the idea):

As I wrote about in my book Pro SQL Server 2005 High Availability (Apress, 2007) in Chapter 4 on page 122:

At TechEd 2006, Microsoft unveiled the SQL Server 2005 Always On program (http://www.microsoft.com/sql/alwayson/default.mspx) [note: this link does not work now; it did back then, so don’t try it], which is an umbrella that brings all the SQL Server availability technologies under one banner. Part of the Always On program is a partner program for storage solution partners. This specific portion of Always On is known SQL Server Storage Solution Review Program.

I write a bit more in that, but this was the first morphing of the Always On name, which now included clustered instances as well as database mirroring, log shipping, backup and restore, etc. By SQL Server 2008, the storage certification portion of Always On was killed and it was just the HA-related features (FCIs, DBM, and log shipping clearly mentioned) as evidenced in this SQL Server 2008 OLTP marketing PDF.

Enter the feature introduced in SQL Server 2012: availability groups (AGs). Yes, AlwaysOn was a name for the feature for about five minutes in its development cycle as noted in my other blog post, but ultimately marketing decided to name the feature AGs and now had a new umbrella moniker for only FCIs and AGs: AlwaysOn. So the official names for those features are/were (we’ll get to were in a minute) AlwaysOn Failover Cluster Instances and AlwaysOn Availability Groups. So AlwaysOn has no space in 2012. This can be seen here and a screen grab is below in Figure 1. So out are DBM, log shipping, etc. from the AlwaysOn moniker. And for heaven’s sake, it’s not AOAG. It’s just AG. We don’t say AOFCI – it’s just FCI (which is clearly seen at the link and in the screen grab; click to make it bigger). Precedent, people. Follow it.

Figure 1. AlwaysOn defined in SQL Server 2012 Books Online

My frustration with people using AlwaysOn improperly as the AG feature is well documented in the other blog post, and that issue has been occurring now for the better part of five years at this point. I’m not going to lie – it’s hard getting people to refer to the feature properly. Microsoft and its employees are some of the worst offenders truth be told.

Fast forward to this year’s MVP Summit which was right after PASS Summit. I was tipped off by a few birdies there was a change coming to AlwaysOn I would not be happy about – Always On (with a space). At first I thought they were trolling me and having a laugh – but no, this was real. SQL Server marketing in their “infinite wisdom” has decided to go back to the original spelling from here on out, which means for SQL Server 2016 (and possibly everything in 2012 and 2014), forget the no space. Ugh.

The new and improved Always On (with space) made its first official appearance in the December 15th blog post “Enhanced Always On Availability Groups in SQL Server 2016“. Now that it’s out there, I can talk about it. I’ve been sitting on this one for over a month. As one Microsoft PM once said to me, “You can report the news, but you can’t make it.”

I’ll cut right to the chase: as if we didn’t already have an AlwaysOn problem, marketing just made it worse. I mean this in the nicest possible way: I really think their heads are up their posterior. That won’t win me any favors, but it has to be said. What the hell are/were they thinking here? This change neither clarifies things nor makes them better. It makes them WORSE. We’re taking a time hop back nearly 10 years.

When a bunch of us who fight this constantly found this fact out last month, we were all pretty livid. Marketing apparently lives in a bubble where they don’t give a hoot that people like me have to live with these asinine and uninformed decisions. No, I don’t expect them to consult me personally when they do something related to the SQL Server availability feature set, but it’s like they asked no one who even uses this stuff or thought about the downstream effect. I have a great working relationship with SQL dev, but I’m calling this ugly spade what it is: ridiculous.

What they should have probably done for SQL Server 2016 is something I would actually advocate (no, really): call the feature currently known as AGs – what most do already – AlwaysOn (no space). Drop the umbrella banner and just have HA features. FCIs will just be FCIs. I have been adhering to the rules marketing set up for 2012 and 2014, and I get penalized for it because they do a piss poor job enforcing it. PMs on TAP calls and in presentations regularly call AGs AlwaysOn with nary a word said. MS should train their own employees properly to be honest; it would solve many of my issues. Now marketing pulls this rabbit out of the hat? Unbelievable. Well, it is believable; I just have a hard time believing they consciously thought this was a winner of an idea. This almost is the equivalent of Bizarro and Bizarro World in the Superman universe.

A good friend and fellow MVP, Mike Steineke (Twitter | Blog), said two things today with regards to this which made me laugh but are true:

AlwaysOn naming is as random and complicated as SQL licensing
“Lucas thought Jar Jar Binks was a good idea too… Meesa like AlwaysOn” – something one of his employees said and appropriate given the opening of Star Wars Episode VII The Force Awakens this week

I think those two quotes sum up my feelings pretty well.

EDIT: As has been pointed out by two comments, the real goal here is consistency. Failure to be consistent leads to confusion and problems (read the comments for more).

Issue Two: Services/Applications/Roles/Resource Groups and Windows Server Failover Clusters

SQL Server is not alone in its penchant to do things that are detrimental to customers and understanding the platform. Back in the day pre-Windows Server 2008, at the Windows level, we had the concept of resource groups. Simply put, they were basically containers for clustered resources. For example, if you installed a clustered instance of SQL Server 2005, its resources would be in its own resource group. In Cluster Administrator (the GUI-based management tool), they were referred to as groups. The command line cluster.exe also used the group nomenclature. So far, so good. Figure 2 shows what that looks like with a clustered default instance of SQL Server 2005 named BATMOBILE.

Figure 2. Cluster Administrator in Windows Server 2003

In Windows Server 2008, we got a new GUI administration tool – Failover Cluster Manager (FCM). Lo and behold, they changed the name for things in the GUI to be Services and Applications, not groups. But cluster.exe and in Windows Server 2008 R2, the PowerShell cmdlets (such as Get-ClusterGroup), refer to groups. For people who don’t know Windows Server Failover Clusters (WSFCs), this can be confusing (more on that in a minute), and if you come from 2003, it’s a problem as well. The reality is that underneath the covers it’s still a resource group. The concept never changed. There are other issues I won’t get into in this blog post. The groups –> Services and applications thing can be seen in Figure 3 which shows another default FCI named AJA.

Figure 3. Failover Cluster Manager in Windows Server 2008 R2

In Windows Server 2012, this was then changed in FCM to be Role(s), but again, any command line still refers to groups. Figure 4 shows a named clustered instance of SQL Server named MINI\DISC.

Figure 4. Failover Cluster Manager in Windows Server 2012

Why are the UI and the command line out of sync? There honestly isn’t a good reason other than it is what it is. As someone who trains and speaks all the time, I need to make that connection for people between what you do in PowerShell and why it is different than the UI management tool. That also usually means I have to show Windows Server 2003 to people – something I’m sure they do NOT want me doing. I’m sure we’ve all seen the 1,000,001 WINDOWS SERVER 2003 IS GOING OUT OF SUPPORT – UPGRADE NOW e-mails, ads, Tweets, etc. from various folks including Microsoft.

I’ve been talking to the Cluster PMs about this since Windows Server 2012, and since it still wasn’t done in Windows Server 2016, I formally filed a bug, which was recently closed as “will not fix” in this version – maybe in the future. Look, I get it. It’s basically a visual thing which does not affect functionality and it’s not broken, and I’m sure there are things that need time allocated to fix which are more important. However, it leads to confusion when showing PowerShell (“What’s a group?”). It’s bad enough dealing with the 2008 R2 to 2012 questions of the Service/Application change to Role. Three versions into this new Role cycle, I’m calling enough is enough. Usability is a concern.

So if they want me to continue to show Windows Server 2003, don’t fix this. Until they do, Windows Server 2003 will be shown in some of my presentations and classes because many people learn visually and need to see that progression. Either that or change the UI back to Group or Resource Group. I’d be OK with that, too.

The Bottom Line

As you can tell, I’m not a very happy camper. Windows and SQL Server have given themselves enough rope to hang themselves on these two issues. SQL Server’s issue is completely self inflicted and causing a lot of confusion out in the world, and has for the better part of five years. I can only hope they come to their senses before SQL Server 2016 RTMs. As for Windows, like I said, I get things that are truly broken need to be fixed. But at this point when you want to put 2003 to rest, you can’t do it half assed. There’s still time before Windows Server 2016 RTMs. I’m not asking for this change to roll back to 2012 or 2012 R2.

I can’t reiterate enough that I am writing this from a standpoint of passion and caring, but the truth hurts sometimes. No one likes hearing their baby (in this case, their work) is ugly. Unlike animals and humans, these problems I discussed here are not life and death, and can be fixed. The ball is in your court, Microsoft. Do the right thing. I would love to see comments below because I think Microsoft needs to hear it from more than just me. Many voices effect change.
