By: Allan Hirt on September 13, 2018 in Disaster Recovery, High Availability
With Hurricane Florence heading towards the Carolinas here in the US and dominating a lot of the news, it puts many things into the limelight – not the least of which is this question: is your business ready if such an event happens in your area?
Every part of the world has challenges with Mother Nature in some way. Here in the Northeastern US, we generally only see Nor’easters which can knock out power, the West Coast certainly gets earthquakes and there is a potential tsunami threat in places, and so on. You get the idea. There is only so much you can do in a physical data center to counteract all of this. Man-made events, including hacking, also fall under this disaster recovery category, but there are defenses you can generally put in place for most (but not all) scenarios. You cannot stop a hurricane coming at 140 miles per hour; you can have redundant links to prevent a network outage if your telco cuts a trunk.
The reality is you can never protect against every single scenario, planned or unplanned, but you can do your best to ensure that once the event is gone, people are able to get into work, and life starts to get back to relative normal, you have a business to come back to. The famous shots in newscasts and in pictures of people boarding up houses and businesses are one way to do this; the goal is to minimize physical damage. But do you:
- Have a plan to properly shut down your systems and bring them back up?
- Have a way to restore or rebuild your physical systems/servers in the event they are destroyed?
If the answer to both of these is not a resounding “Yes!”, that’s a problem. For over 20 years I’ve been in the availability business. I’ve helped customers of all sizes from small shops to large enterprise companies. FCIs, AGs, log shipping, etc. – all are great features. But when floods take out your data center, what do you have? For the most part, dead servers which most likely need to be replaced (and possibly the data center, too). You need to start with backups and/or the software to rebuild those systems.
The good news is that with the rise of the public cloud (Amazon’s various offerings, Microsoft’s Azure, GCP), disaster recovery is no longer out of reach. Even just storing backups in “cold” storage up in the cloud makes them available in a way that simply was not an option not too long ago. Most companies, such as Microsoft, can get you the software you’re licensed to use via websites, so you are no longer relying on DVDs. Heck, you can build systems up in the public cloud with IaaS VMs that extend your on-premises solutions and that you can flip to in the event of your main data center not being online.
In a worst case scenario, assuming your SQL Server databases are not hundreds of terabytes or petabytes, back that stuff up to an external disk and take it with you. Of course, you should protect it properly (i.e. encrypted backups, password protected, etc.) since you do not want sensitive data falling into the wrong hands, but for heaven’s sake DO SOMETHING!
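As a sketch of that minimal safety net, the SqlServer PowerShell module can drive an encrypted, compressed backup to a portable drive. This is only an illustration, not a full plan; the instance name, database, certificate, and drive path below are hypothetical placeholders, and the certificate (or asymmetric key) must already exist in master.

```powershell
# Sketch: encrypted, compressed backup to an external drive you can take with you.
# Instance, database, certificate, and E:\ path are hypothetical placeholders.
Import-Module SqlServer

# Backup encryption needs a server certificate (or asymmetric key) in master.
$encryption = New-SqlBackupEncryptionOption -Algorithm Aes256 `
    -EncryptorType ServerCertificate -EncryptorName "BackupCert"

Backup-SqlDatabase -ServerInstance "MYSERVER\PROD" -Database "CriticalDB" `
    -BackupFile "E:\EvacBackups\CriticalDB_Full.bak" `
    -EncryptionOption $encryption -CompressionOption On -Initialize
```

Remember that an encrypted backup is useless without the certificate, so back that up (with its private key) separately and store it somewhere other than the same external disk.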
Do all of these things have costs, especially cloud-based solutions? You bet! In no way am I claiming any of this is free, but what is the cost associated with bringing your business back online? What is the cost if it cannot come back online? The cost of “cold” storage in the cloud is much less than never coming back.
If you’re in the path of Florence, I truly hope you are safe and that its effects are not devastating. If you want to ensure your business is resilient, it’s not too late to start thinking about how to protect yourself in a worst case scenario. Contact us today to devise the right disaster recovery strategy for business continuity. We can help you minimize and possibly eliminate your downtime.
By: Allan Hirt on July 20, 2018 in SQLCareer
My friend Steve Jones (blog | Twitter) asked SQL Server professionals to write four blog posts, each describing one of their workdays. His request can be found here. People are using the #SQLCareer hashtag if you want to see what other folks are posting.
As some of you know, Max Myrick and I are both managing partners at SQLHA. We don’t have other employees, so we do everything. I’m not a traditional DBA, or IT admin, or anything for that matter. When working with customers, I do a lot of different things. If I’m not onsite, my schedule can vary depending on what is going on with the business, our customers, and so on. When onsite I have more standard hours like anyone going into an office would. I’m adaptable.
11:00 AM (July 19) – Wake Up
Yes, I live on the East Coast. This isn’t some alternate time zone. But I also generally work until very late at night and am naturally nocturnal. If I don’t have any early meetings or stuff going on (doctor’s appointment, et al.), I tend not to be up early. If I’m up before 10, it’s generally because I’m a) still up (which means I’ll need some kind of power nap in the afternoon) or b) see previous sentence – some kind of commitment. It’s that simple. When most people are getting up at 4 or 5 AM, I’m either still up or just thinking about maybe going to bed. It’s not rare for me to go to sleep somewhere between 6 and 7 AM. I think I passed out around 6:30 on the 19th.
11:15 – 12:00 PM – E-mails and various tasks
I get a lot of e-mail over the course of the day, but not a crushing amount. I can generally handle it when it comes in. I do think handling a lot of it early clears your plate to actually get stuff done.
12:00 – 1:00 – NDA Call #1
Sorry, if I discussed what I learned on this call, you’d have to disappear … or I may disappear. Neither is a pleasant thought, so dream up your own scenario of what was discussed.
1:00 PM – 1:15 – NDA Call #2
This one was not so successful. Audio was a mess, so I dropped off.
1:15 – 1:30 Grab a quick bite
I generally don’t eat much, if at all, during the day, to be honest. I also don’t drink coffee or tea. I don’t understand why people do but hey, everyone ticks differently. I don’t need much sleep and stay up all night. I just poured myself a bowl of Puffed Rice with cashew milk and also had an apple + cherry That’s It bar.
1:30 – 2:00 – Test new microphone
I have some recording to do soon and got another microphone (the Countryman B2D) to add to the arsenal. This one I’m trying to see if it will eliminate a certain problem I’m having. I learned the hard way (RTFM, Allan!) that my interface (the Audient iD14) needs to have AC power to give the mic phantom power; it can’t come via the USB bus. Boo!
2:00 – 5:00 – Continue Work on Review and Content for an Upcoming Whitepaper
I’m helping review an update to an existing paper and contribute some content to it at the same time. Can’t talk about it just yet, but it should be out soon. I have to update some other content, but that’s after this is done. Before you ask: this work was paid for by the ones publishing this paper.
5:15 – 5:45 – Talk with a vendor about some stuff we’re doing for SQLHA
You’ll all find out soon enough …
5:45 – 6:45 – Catch up with Max
We sync up nearly every day. And yes, this was using the phone and our voices. Some of us prefer to actually talk on it.
7:15 – 7:45 – Break/Dinner
I made myself a chicken salad sandwich with some chips and washed it down with some cherry juice and some Hot Tamales for dessert. Fun fact: you can get kosher canned chicken from Amazon.
8:00 – 12:30 AM (July 20) – Finished Review and Content for Said Upcoming Whitepaper
Sent out my edits so that’s off of my plate.
12:30 – 1:00 – Get music in order for tonight’s gig
I’m playing a gig as part of a quintet tonight so I extracted the individual songs and made set-based PDFs so things will go smoothly. I use my older Surface Pro 3 with a bluetooth pedal as my sheet music viewer. I got a new stand that is specifically made for tablets but is still a normal music stand (it’s been a bit of a journey there … I’ve tried a ton of different things), and am excited to give it a whirl tonight.
1:15 – 2:00 – Work on this blog post
2:00 – 2:15 – Ice cream break
2:15 – 4:30 – Work on presentation for the South Jersey SQL Server User Group for Monday
I was going to just tweak an existing presentation, but shocker, I’m basically building it from scratch.
4:30 – 6:00(?) – Watch most of Windows Weekly until I pass out (I think it was 6ish)
Windows Weekly can sometimes be entertaining.
10:00 – Wake up
No alarm, body just did its thing.
10:05 – 10:45 – Do some e-mails which include some scheduling/logistics of things you guys will see soon
This is related to, but not the same as, some other stuff
10:45 – 11:00 – Pay some bills and search for more information (other than the Amazon pre-order) on the Batman: The Animated Series Blu-ray coming in September.
I think the SDCC panel for BTAS is tomorrow, so we’ll know more. Other than the kinda lame cover, I’m pretty excited. I’ve got one cel from my favorite episode as well as a pencil drawing from it (one is in my office, one in another room).
11:00 – 11:20 – Finalize and post this blog
I’m not going to post the rest of the day but I can tell you how it’s going to go:
- I have to run to MicroCenter in Cambridge to pick up a longer power cord for something
- I plan on finishing the SJSSUG presentation
- I will eat at some point
- I need to head to the gig around 6:15 PM to ensure I’m there and set up for downbeat at 7:30
- We’ll finish somewhere between 9 and 10, after which I’ll get home around 11
I’ll do the other two days sometime soon …
By: Allan Hirt on April 25, 2018 in SQL Server, Windows Server 2019
Yesterday I blogged about enhancements to availability groups in recent updates to SQL Server 2016 and 2017. Today it’s Windows Server’s turn.
Announced last month, Windows Server 2019 will be released later this year. I know there has been a lot of buzz about SQL Server on Linux since SQL Server 2017 was released, but most deployments I see are still on good ol’ Windows Server. There has been no announcement yet about which versions of SQL Server will be supported on Windows Server 2019. That said, every version of Windows Server pushes things forward a bit for clustered configurations, and there are three specific improvements that I feel are good for WSFCs and SQL Server in general. All of the features discussed here are in Build 17650 or later, which was released on April 24 (yesterday).
There is one new feature right now that I’d love to say works with SQL Server but it doesn’t – cluster sets. I hope they eventually will, but I doubt it will be in Windows Server 2019.
This one is a general Windows Server improvement that may benefit SQL Server deployments both clustered and non-clustered. I was poking around in setting a server up and formatting some disks. Lo and behold I saw this:
Figure 1. Disk allocations greater than 64K
Your eyes are not deceiving you – disk allocation sizes bigger than 64K! Interesting, no?
This one is specifically for Always On Failover Cluster Instances (FCIs) that are not using Storage Spaces Direct, but more traditional shared storage. Windows Server 2016 introduced Storage Replica (SR), but it was only in Datacenter Edition. SR is just Microsoft’s implementation of disk-based replication that works with WSFCs. In Windows Server 2019, SR is coming to Standard Edition. It was introduced in Build 17639. However, there are currently some restrictions which will most likely rule it out for many, namely:
- SR replicates a single volume, not an unlimited number of volumes. So if you have one disk associated with your FCI, it’s great. If you have more than one, not so much.
- You can only have one partnership.
- The volume you are replicating can only be up to 2TB. That is probably the biggest dealbreaker for many SQL Server deployments.
As they note “We will continue to listen to your feedback and evaluate these settings through our telemetry during Insider previews of Windows Server 2019. These limitations may change several times during the preview phase and at RTM.” Make your voices heard – if you want to see this actually useful for SQL Server deployments, provide Microsoft with feedback. Follow the feedback info in the 17639 post I link above. Or just bother Ned Pyle on Twitter. He’ll love it 🙂
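If you do fit within those limits, setting up an SR partnership is a single PowerShell call. Here is a minimal sketch, assuming hypothetical node, replication group, and volume names; note that SR requires a separate log volume on each side in addition to the data volume being replicated.

```powershell
# Sketch: one-volume Storage Replica partnership between two servers.
# Node names, replication group names, and drive letters are hypothetical.
# Each side needs the data volume (F:) plus a separate log volume (G:).
New-SRPartnership -SourceComputerName "SQLNODE1" -SourceRGName "RG01" `
    -SourceVolumeName "F:" -SourceLogVolumeName "G:" `
    -DestinationComputerName "SQLNODE2" -DestinationRGName "RG02" `
    -DestinationVolumeName "F:" -DestinationLogVolumeName "G:"
```

Run `Test-SRTopology` first to validate that your volumes, network, and log sizing are suitable before committing to the partnership.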
This improvement is for Always On Availability Groups (AGs). Starting with SQL Server 2016, you could deploy an AG on a Windows Server Failover Cluster (WSFC) without a domain. That particular configuration of a WSFC is called a Workgroup Cluster, but its fatal flaw is that for a witness resource, you were stuck with either a disk (which invalidates the whole non-shared disk deployment model) or cloud witness (which not everyone can use). Well, Windows Server 2019 fixes that problem.
Announced on April 16, you can now use a file share witness (FSW) with a Workgroup Cluster. FSW prior to this specific implementation required a domain, hence the problem in Windows Server 2016. It’s all done with local ACLs on the file share, and when you create the FSW, you provide the credentials. This first screen grab is what you see when you first input the command to create the FSW.
Figure 2. Adding a FSW in a Workgroup Cluster
This next one is when it is complete.
Figure 3. FSW successfully configured
All the gory details are in the blog post I link, but this is exciting stuff.
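For reference, the command shown in those screen grabs is just `Set-ClusterQuorum` with a credential. Here is a sketch with a hypothetical file server, share, and local account; the `-Credential` parameter on this cmdlet is the new Windows Server 2019 piece.

```powershell
# Sketch: file share witness for a Workgroup Cluster (Windows Server 2019).
# File server, share name, and local account are hypothetical placeholders.
# The credential is a LOCAL account on the file server; no domain is involved.
$cred = Get-Credential -UserName "FILESRV1\WitnessUser" -Message "FSW account"
Set-ClusterQuorum -FileShareWitness "\\FILESRV1\ClusterWitness" -Credential $cred
```

The share itself only needs that local account granted full control via its ACLs; the cluster stores the credential and uses it to access the witness.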
WSFCs no longer use NTLM; they now use Kerberos and certificates for authentication. This means that if you’re deploying Windows Server 2019 and your security folks want to disable NTLM, go right ahead.
Last, and certainly not least, is the one I’m probably the most excited about. One of the questions I’m asked the most is “How can we change the domain for our FCI or AG configuration?” The answer has been to unconfigure/reconfigure. Going back to the early days of clustering Windows Server, you could not change the domain. That all changes in Windows Server 2019. We finally have the ability to move a WSFC from one domain to another and not have to reconfigure anything! This is big, folks. How to do it from a pure Windows Server perspective is documented here.
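Based on that documentation, the move boils down to a handful of PowerShell steps using two new Windows Server 2019 cmdlets. A sketch with hypothetical cluster and domain names:

```powershell
# Sketch of a WSFC cross-domain move (cluster and domain names are hypothetical).
# Step 1: remove the cluster's AD computer objects while the cluster keeps running.
Remove-ClusterNameAccount -Cluster "SQLCLUSTER" -DeleteComputerObjects

# Step 2: on EACH node, leave the old domain and join the new one
# (Remove-Computer / Add-Computer, with the reboots those require).

# Step 3: recreate the cluster name account in the new domain.
New-ClusterNameAccount -Name "SQLCLUSTER" -Domain "newdomain.com"
```

For SQL Server itself you would still need to sort out logins, service accounts, and any AG/FCI virtual network name dependencies in the new domain, but the WSFC no longer has to be torn down.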
By: Allan Hirt on April 24, 2018 in Availability Groups, SQL Server 2016, SQL Server 2017
Hi everyone. If you’re using Always On Availability Groups (AGs), Microsoft has put a few improvements/fixes in recent patches that you should be aware of.
First and foremost, SQL Server 2016 Service Pack 2 was just released today. There are two major improvements in it for AGs:
1. SQL Server 2016 now has full Microsoft Distributed Transaction Coordinator (DTC) support for AGs. Initially, SQL Server 2016 supported only one of the two scenarios (cross-instance/cross-platform transactions), not transactions between databases on the same instance. SQL Server 2017 supported both, and that capability has now been backported, so SQL Server 2016 supports all DTC scenarios with AGs. This is great news.
2. A fix for Service Broker so that AGs close open connections on failover.
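One nice detail of the SP2 DTC work is that you can now enable per-database DTC support on an existing AG rather than only at creation time. A hedged sketch driven through `Invoke-Sqlcmd`; the instance and AG names are made up.

```powershell
# Sketch: enable per-database DTC support on an existing AG (SQL Server 2016 SP2+).
# Instance and AG names are hypothetical placeholders.
Import-Module SqlServer
Invoke-Sqlcmd -ServerInstance "MYSERVER\PROD" -Query @"
ALTER AVAILABILITY GROUP [MyAG] SET (DTC_SUPPORT = PER_DB);
"@
```

Previously, changing DTC support meant dropping and recreating the AG, so being able to flip it with a single ALTER is a quality-of-life win.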
In SQL Server 2017 CU6, the big change is that the distribution database in replication can now be made highly available using AGs. There are a bunch of caveats, but this is a pretty big deal. See this link for all the info. As with full DTC support, they are going to port this back to a CU for SQL Server 2016 SP2, so SQL Server 2016 will also be getting this capability.
By: Allan Hirt on March 29, 2018 in Availability Groups, FCI, Storage Spaces Direct, Windows Server 2016, Windows Server Failover Cluster
Let me be up front here: I really do like Storage Spaces Direct (S2D). I’ve been talking about it a lot in presentations (including just recently at SQLBits – click here to see the video), and first mentioned it in more detail back in 2016 around a benchmark with SQL Server. It makes implementing failover cluster instances (FCIs) *so* much easier (especially under virtualization) and it adds a lot of IO scalability when done right. It’s a great architecture if you have the need. We’ve helped customers plan it, and if you need we can help you, too. Just contact us.
For those of you unfamiliar with S2D, it’s a way to create storage for FCIs using storage that is local to the nodes themselves – not via SAN, iSCSI, etc. All of the official Microsoft documentation can be found here (no real mention of SQL Server), and if you want to know more, see my presentation linked above. I also teach S2D in some of my classes, where you may even get hands on experience with it.
That said, there are a few gotchas, some of which unfortunately affect SQL Server architectures – one of which could be a temporary showstopper for some, at least in its current form in Windows Server 2016. Let me explain.
Arguably, the biggest thing about S2D is that the solutions currently have to be certified (see this bit of documentation from MS for more detail). This obviously doesn’t really affect, say, virtualized versions or ones up in the public cloud such as in Azure in a meaningful way, but it’s still technically a requirement much like logoed hardware for Windows Server supportability. Anyone want to point me to the logo stamped on your VMs? Didn’t think so. Now, from a pure FCI perspective none of this is an issue. The way a Windows Server failover cluster (WSFC) is currently designed, it is expecting that all nodes participating in the WSFC are also using/need S2D. Why am I mentioning this? Disaster recovery.
By its nature as needing shared storage, FCIs are more often than not a “local” HA solution and you add something else to it to make it business continuity friendly like an AG, log shipping, underlying storage-based replication which is transparent to the FCI, etc. For an FCI using traditional storage methods such as a SAN, none of this is a problem. It all just works and is supported just fine. Storage Replica in Windows Server 2016 (also Datacenter Edition like S2D) is also a great option for FCIs.
However, Storage Replica and Storage Spaces Direct cannot and do not currently work together in Windows Server 2016, and I am hoping they do in the future. SR cannot stretch an S2D WSFC right now. That means the D/R aspect of S2D with FCIs has to come in some other way. Here’s the Uservoice item for stretching an S2D WSFC to another site. It should be noted that as of the writing of this blog post, Windows Server 2019 was recently announced, but nothing I’ve seen publicly points to SR and S2D working together in that upcoming release, nor would I bank on that.
Let’s add AGs to the party. In a traditional Windows Server-based AG, an instance participating as a replica can be standalone or an FCI. FCI + AG is a fully supported architecture from a pure WSFC and SQL Server standpoint right now. S2D complicates this, though. Why?
S2D WSFC configurations currently do not expect a non-S2D node. So if you have an FCI across n WSFC nodes with S2D for storage, and then want to extend that with an AG, it’s technically not an architecture that is currently supported in the traditional sense. Does it work? Yes, I’ve configured it. Will Microsoft give you best effort support if you’ve done this and run into an issue? Absolutely, but at some point, if it’s a serious issue, they could say “This isn’t a supported architecture, implement something supported.”
S2D across multiple sites is not an architecture that is currently supported, either. FCI + AG would be stretching the S2D WSFC, albeit in a non-traditional way. Since stretched S2D clusters are not supported anyway … it’s “complicated”. Yes, from a pure SQL Server point of view (and my perspective, quite frankly), the FCI + AG technically isn’t a stretched S2D WSFC, but in a way it is because of the mixing of S2D and non-S2D nodes in the same S2D WSFC. I can see both sides of the coin here.
If you try to add a non-S2D node to a WSFC configured to use S2D for storage, here’s what you see during the validation process in Windows Server 2016:
Running validation for a new node in an S2D WSFC
How do you achieve D/R right now with S2D-based FCIs? Outside of platform-based methods for things like virtualization and IaaS, at the OS/SQL Server layer, log shipping is the easiest way. No clustered anything needs to be involved. SQL Server 2017 has the NONE-type AG which does not require any clustering whatsoever, but I’d avoid that for reasons I’ve discussed elsewhere. A distributed AG would also technically work since it would span two different WSFCs; it is not a stretched single WSFC with S2D. Only the nodes participating in the S2D-based FCI would be in the same WSFC. Distributed AGs are an Enterprise Edition-only feature, which rules them out for many people.
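To give a flavor of the distributed AG option, here is a hedged sketch of the T-SQL you would run on the primary replica of the first WSFC’s AG; the AG names and listener URLs are placeholders, and a matching JOIN statement is run on the other side.

```powershell
# Sketch: distributed AG spanning two separate WSFCs (names/URLs are hypothetical).
# Run on the global primary; the second AG then joins from its own primary.
Import-Module SqlServer
Invoke-Sqlcmd -ServerInstance "SQLNODE1" -Query @"
CREATE AVAILABILITY GROUP [DistAG]
   WITH (DISTRIBUTED)
   AVAILABILITY GROUP ON
      'AG1' WITH (LISTENER_URL = 'tcp://ag1-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC),
      'AG2' WITH (LISTENER_URL = 'tcp://ag2-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC);
"@
```

Because each side is its own WSFC, the S2D-based FCI stays self-contained within its cluster, which is exactly why this pattern sidesteps the stretched-S2D problem.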
Bottom line: right now, if you want to combine FCIs and AGs, the safest and currently the only fully supported way is to use traditional storage methods for your FCIs that are not S2D-based. I really hope that Microsoft officially supports the FCI + AG scenario where the FCI uses S2D in the near future. Stay tuned!