By: Allan Hirt on February 6, 2018 in Mission Critical Moment | 1 Comment
I’m proud to announce that the Mission Critical Moment is now live on SQLHA. You’re probably asking yourself, “So Allan, what exactly is the Mission Critical Moment?”
Max and I have been wanting to add some form of video content for quite some time, but wanted to think about the best way to put it out there. Throwing up content for the sake of it doesn’t work for us. I’ve done my share of recorded videos for other folks in the past, so I’m definitely no stranger to pre-recorded, non-live stuff.
Max and I came up with some guiding principles:
- The videos must be short (under 15 minutes, ideally 5 – 7), focused, lively, and easily consumable. In other words, they are bite-sized morsels/nuggets where you don’t have to carve out long lengths of time to watch. They also need to be, where appropriate, a bit lighthearted. Not every Mission Critical Moment will be super serious and tackle a specific tip, trick, or bit of information that is the difference between up and down.
- The video content has to be free with no strings attached. You do not need to sign up for our newsletter to see the Mission Critical Moment, nor do you have to create a login to see these behind a gated wall. If you want to sign up for our newsletter, feel free to do so – we’d love that, but you shouldn’t have to be “part of the club” to see the Mission Critical Moment.
- The videos shouldn’t require action (within reason) on your part to do anything other than watch. Something like asking you to download it was out of the question for us.
- The videos should be easy to find. No digging around on YouTube or anything like that, which meant hosting them on our site directly.
- There will be at least one per month.
What are you waiting for? Click here to see #1. The first Mission Critical Moment was a lot of fun to do and its topic is something I am truly passionate about.
The Mission Critical Moment is the first in a line of exciting things SQLHA is rolling out in 2018.
By: Allan Hirt on January 30, 2018 in High Availability, Security | No Comments
Sometimes when I speak or in some of my writings, I discuss the cost of downtime and how knowing that number can help you devise a better solution. That number is often company-, and sometimes industry-specific. For example, if processing credit cards, a company may have a financial hit from the customer if it cannot process a transaction either fast enough or, worst case, not at all. That adds up when you have even a five or ten minute outage. Processing a credit card transaction is not the same as loss of life in a hospital, hence needing to account for a system and its solution individually.
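To make the arithmetic concrete, here is a minimal sketch in Python. The transaction rate and the per-transaction penalty are made-up numbers purely for illustration, and `downtime_cost` is a hypothetical helper, not a standard formula; your real number will depend on your contracts and your industry.

```python
def downtime_cost(minutes_down, transactions_per_minute, penalty_per_transaction):
    """Rough direct cost of an outage: every transaction that could not
    be processed during the outage incurs the same penalty."""
    missed_transactions = minutes_down * transactions_per_minute
    return missed_transactions * penalty_per_transaction


if __name__ == "__main__":
    # Assumed figures: 1,200 card transactions per minute, $0.50 hit for each miss.
    for minutes in (5, 10):
        cost = downtime_cost(minutes, transactions_per_minute=1200,
                             penalty_per_transaction=0.50)
        print(f"{minutes}-minute outage: ${cost:,.2f}")
```

Even this simplistic model shows how quickly a short outage adds up, and it ignores indirect costs such as reputation or regulatory fines.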
However, as of this week, if you have a company or work in the UK, things just got a whole lot more interesting. The UK government officially released a statement on January 28 which affects “critical industries”. Long story short: if you fall under the classification, which seems to be limited right now to energy, transport, water, and health firms, you could be fined up to £17 million ($24 million in US Dollars at today’s exchange rate) in the event of a cyber attack taking you down. It was the WannaCry outages that precipitated the response (as an example, FedEx says WannaCry cost them about $300 million US Dollars). Remember this doozy from British Airways? That kind of outage is also covered under this new Network and Information Systems (NIS) Directive; it’s not just about security, but includes other things like power outages, hardware failure, and environmental hazards.
The NIS Directive is effective as of May 10, 2018, and is essentially based on this Consultation on the Security of Network and Information Systems Directive from August of 2017, and the outcome/latest is the document Security of Network and Information Systems: Analysis of responses to public consultation which was just published. I wouldn’t be surprised to see other places around the world adopt a similar stance. For some this may proverbially add insult to injury since everyone is already dealing with GDPR which goes into effect May of 2018 as well.
I’ve always talked about how security is a key component of availability. The UK government is literally putting their money where their mouth is. The NIS Directive isn’t meant to start with fines. The press release states the following:
Fines would be a last resort and will not apply to operators which have assessed the risks adequately, taken appropriate security measures and engaged with regulators but still suffered an attack.
That is actually good news – it’s not shoot, aim, fire. However, what that means is that you need to take the right steps to be prepared and avoid a fine if possible. Patching servers, and having a strategy to do so in a timely manner, is going to matter. Things like the recent Spectre/Meltdown chip flaws (which I put everything you need to know as it relates to SQL Server in one place here) will not be a “kick the can down the road” exercise. To that point, I’m still seeing people saying they don’t need to worry about patching for Spectre and Meltdown. YOU DO. Yes, it sucks you may see a performance hit, but would you rather be down instead? I do not think so. I’d rather be slower and up than down and out.
It is always better to be proactive than reactive, and SQLHA can certainly help you assess where you are. We can help address and mitigate issues related to availability and disaster recovery (which would help with things like accounting for power outages and hardware failure), but also devise realistic patching strategies that work. Max and I have done these types of things for some of the largest systems in the world over the course of our careers. It doesn’t matter if you are a small company or one of the biggest in the world – we’re happy to help! Just reach out.
By: Allan Hirt on January 24, 2018 in AWS, Azure, GCP, Private Cloud, Public Cloud, Virtualization | No Comments
Is anyone else bothered by the word “serverless” when it comes to computing – especially in the cloud? The workload you are running, website you are surfing, or bauble you are buying is being served up somewhere on a backend. That backend is comprised of servers even if they are not in your own data center. There’s no magic compute dust at work.
Having said that, infrastructure as a service, or IaaS, is largely based on you accessing servers you configure and control on a backend. If you’re using Azure, AWS, GCP, or any of the other cloud platforms, it’s a virtual machine (VM) running on a hypervisor. So if your company is running ESXi, Hyper-V, Xen, or another hypervisor on premises and you have been running VMs, what you would be using in the cloud is the same … just more abstracted from you.
The problem as we saw with on premises virtualization is sizing. When you want to start doing IaaS-y things in the cloud, you actually need to know the capacity to rightsize. Why? If you don’t, you will either overspend (costing you money), or undersize and have poor performance, which means you’ll need to spend more money to fix the problem. When you own the servers and the platform on premises, it is usually easier to correct this problem. This is not always true. Virtualization was not a panacea. Over the years, both Max and I as part of working with customers have seen virtualized SQL Server environments that were not rightsized, and it caused quite a bit of agita.
The whole premise of virtualization and IaaS in the cloud (I’ll touch on other cloud-y things in a minute) is that you can give things the resources they want. When we went through the waves of consolidation in the mid-2000s which opened the door to virtualization later, a lot more care was put into those consolidations. Early virtualization efforts were often done via physical to virtual (P2V) conversions whereby if you had a server that had P processors and M amount of memory, that’s what the VM was assigned. That’s not rightsizing; that’s lift and shift. You may have been able to sunset the physical hardware, but that’s about it.
To properly rightsize an environment, you need to baseline and benchmark your servers and applications to accurately know what resources they are using. That also allows you to understand how they are growing so you can plan for the future and have the capacity for that, too. Without that information, you might as well lick your finger, stick it in the air, and try to see which way the wind is blowing, because you certainly won’t know what to get as you transition to the public cloud providers. Using Azure, AWS, or GCP is a much more viable option for many folks, but as stated above, if you don’t know what size IaaS VM or storage to select, you will run into the same problems that sank many of the early SQL Server virtualization attempts. We help out customers all the time with capacity management; it’s very important for long term health of your deployments.
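As a rough illustration of turning a baseline into a sizing number, here is a minimal Python sketch. It assumes you have already collected samples of how many CPU cores a server keeps busy, and it sizes to the 95th percentile plus projected growth and headroom. The helper names, the 95th percentile target, the 2% monthly growth, and the 20% headroom are all assumptions for the example, not recommendations; use what your own baselines and benchmarks tell you.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def rightsize_vcpus(busy_core_samples, monthly_growth=0.02, months=12, headroom=0.20):
    """Suggest a vCPU count from baseline samples of busy cores.

    Sizes to the 95th percentile (not the average, not the single worst
    spike), compounds the assumed monthly growth, then adds headroom.
    """
    sustained_peak = percentile(busy_core_samples, 95)
    projected = sustained_peak * (1 + monthly_growth) ** months
    return math.ceil(projected * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical baseline: mostly 2 busy cores, with regular bursts to 6.
    baseline = [2.0] * 90 + [6.0] * 10
    print("Suggested vCPUs:", rightsize_vcpus(baseline))
```

The same approach applies to memory and storage IOPS; the point is that the input is measured data, not the size of the old physical box.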
One thing the cloud providers do that we often see many on premises customers skip is quality of service, or QoS. QoS is a very important concept. In a nutshell, QoS means you’re guaranteed something. For example, if cloud provider X says you’ll get 10,000 IOPS with said storage, you’ll get 10,000 IOPS. On premises virtualization has the same concepts, and if you’re seeing spike-y performance with your VMs, it’s definitely one place to look.
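If you already collect IOPS samples, a quick check along these lines can show whether you are getting what was promised and how spike-y things really are. This is only a sketch: `qos_report` is a hypothetical helper, the sample values are invented, and using the coefficient of variation as a "spikiness" measure is my assumption, not something a provider defines.

```python
from statistics import mean, pstdev


def qos_report(iops_samples, guaranteed_iops):
    """Compare measured IOPS samples against a guaranteed floor."""
    below = [s for s in iops_samples if s < guaranteed_iops]
    return {
        # Share of samples that fell short of the guarantee.
        "pct_below_guarantee": 100 * len(below) / len(iops_samples),
        # Standard deviation relative to the mean; higher means spikier.
        "coefficient_of_variation": pstdev(iops_samples) / mean(iops_samples),
    }


if __name__ == "__main__":
    # Invented samples measured under load against a 10,000 IOPS guarantee.
    samples = [10000, 5000, 10000, 5000]
    print(qos_report(samples, guaranteed_iops=10000))
```

Keep in mind that samples taken while the workload is idle will always look "below guarantee", so measure while the system is actually being pushed.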
If you’re using Amazon’s RDS or Azure SQL Database, that’s not IaaS; some may call it software as a service (SaaS), but more accurately, it’s database as a service (DBaaS). Amazon and Microsoft are giving you a database that is based in the cloud. You do not manage the instance, nor do you worry about things like performance. Those immortal words “it just works” apply here. Microsoft will soon offer managed instances of SQL Server in Azure so you can have a whole instance that is yours, but without any of the things that come along with IaaS.
For all of these, you still need to measure performance, and if you’re just starting on your journey to the public cloud, you really need to know your numbers prior to making the leap or you might wind up like Icarus and get your wings clipped the hard way. Don’t be that person. One of the things we do for our customers is to help them transition to their next generation platforms and architectures, be it new versions of SQL Server or Windows, Linux, on premises (physical or virtual), hybrid solutions of on premises and the cloud, or going whole hog up into Azure, AWS, or GCP. If you want some help figuring all of this out, from baselining and benchmarking to designing the whole thing or anything in between, contact us today and we will ensure your transition to the future keeps you soaring high, not falling to the ground.
By: Allan Hirt on January 4, 2018 in Linux, Security, SQL Server, Windows Server | 2 Comments
UPDATED JANUARY 18
If you haven’t been paying attention, a serious security flaw in nearly every processor made in the last ten years was discovered. Initially it was thought to be just Intel, but it appears it’s everyone. Official responses:
- AMD (downplaying the issue)
- ARM (great response)
- Intel (oy)
There are two bugs which are known as Meltdown and Spectre. The Register has a great summarized writeup here – no need for me to regurgitate. This is a hardware issue – nothing short of new chips will eradicate it. That said, pretty much everyone who has written an OS, hypervisor, or software has (or will have) patches to hopefully eliminate this flaw. This blog post covers physical, virtualized, and cloud-based deployments of Windows, Linux, and SQL Server.
The fact every vendor is dealing with this swiftly is a good thing. The problem? Performance will most likely be impacted. No one knows the extent, especially with SQL Server workloads. You’re going to have to test and reset any expectations/performance SLAs. You’ll need new baselines and benchmarks. There is some irony here that it seems virtualized workloads will most likely take the biggest hit versus ones on physical deployments. Time will tell – no one knows yet.
What do you need to do? Don’t dawdle or bury your head in the sand thinking you don’t need to do anything and you are safe. If you have deployed anything in the past 10 – 15 years, it probably needs to be patched. Period. PATCH ALL THE THINGS! However, keep in mind that besides this massive scope, there’s pretty much a guarantee – even on Linux – you will have downtime associated with patching.
Below is a summarized list of the biggest players for SQL Server-related deployments covering physical, virtualized, and cloud. Finding all these links took some time, so I figured I should put them all in one convenient place for everyone. Each vendor and product has its own guidance and response, and there may be updates to what I’ve posted but this should get you started. What I did not list is all the hardware vendors. Check with Dell, HP, Hitachi, etc. to see if there are firmware/BIOS/UEFI updates as well.
If you want help with new baselines and benchmarks, or just assistance in sorting this out and coming up with a plan, contact us. If you are on an older, unsupported version of one of the things below that will not be patched, you should strongly consider accelerating your upgrade/migration plans. This is also something we can help with.
If you’re running workloads using Amazon Web Services, their response can be found here. It appears that their stuff has been patched, but if you’re running IaaS VMs with EC2, you’re going to have to patch your OSes and software in them.
Microsoft’s response for Azure customers can be found here. They also did a KB article (4073235) which can be found here. Like AWS, they’ve patched the underlying stuff. If you are running IaaS VMs, you’ll need to make sure they are patched properly unless you have automatic patching and are running Windows Server (see below).
If you’re using the Google Cloud for your workloads, their response is here. As with AWS and Azure, they took care of the base, but you’re responsible for your IaaS VMs/workloads.
Red Hat Enterprise Linux
Red Hat’s response can be found here which talks more about the impact and the performance. To understand the patching side of things, refer to this. SQL Server is supported on 7.3 or later, and those builds have patches available (although I didn’t see 7.4 listed as of the writing of this post, just 7.3). CentOS had its patches released on January 5th.
Microsoft did a great KB (4073225) article summarizing your options which you can read here. Microsoft is patching SQL Server 2008 and later. The reality is that because SQL Server 2005 can technically run on Windows Server 2008 and 2008 R2, it would be affected, but it’s out of support and I don’t see Microsoft doing anything for it. This would be a good time to consider when you are planning to upgrade or migrate. As of January 18th, patches are available for 2008, 2008 R2, 2012, 2014, 2016, and 2017.
Microsoft lists five scenarios in the KB. Please read them carefully and make the right choice(s), but the absolute wrong choice is to patch nothing.
If you’re using SLES for your SQL Server deployment, their information can be found here and here (KB). It appears they’ve patched 11 SP3-LTSS through 12 SP3. Although not officially supported for SQL Server, the OpenSUSE info can be found here.
Here is Ubuntu’s high level response. Here is the link to where to get the patches. 16.04 is covered, which is important for SQL Server.
VMware posted a security announcement with regard to this issue as well as a blog post. So if you’re using ESXi as your hypervisor, you need to read it. As of the writing of this blog post, it looks like they patched ESXi 5.5, 6.0, and 6.5. It does not look like they are patching anything older than 5.5. There are two vulnerability alerts: VMSA-2018-0002.1 and VMSA-2018-0004.2. VMware patched CVE-2017-5715 and CVE-2017-5753. VMware is not affected by CVE-2017-5754, so no patch exists for that.
If you are not on ESXi 5.5 or later, I strongly encourage you to upgrade as soon as possible, and you want that anyway since 6.0 is the first version of ESXi to support vMotion of clustered configurations of SQL Server.
Similar to SQL Server, Microsoft wrote a KB article (4072698) for this issue that can be found here. As of the writing of this blog post, Microsoft has released patches for Windows Server 2008 R2, 2012 R2, 2016, and RS3 (AKA 1709). Hopefully 2008 and 2012 will get patches soon (still the case as of 1/18). If you have automatic updating enabled, the fixes should be picked up by Windows Update. If not, apply them manually. If you’re still running Windows Server 2003/R2 or earlier, I don’t see Microsoft going back and patching. You’re on your own there. The mitigation would be to upgrade ASAP to something that is patched. If you’re running 2008 or 2012 and MS does not release a patch, I strongly urge you to consider upgrading/migrating your deployments to something that is patched.
More information about the January 3rd patch can be found in KB 4072699. Note that due to some anti-virus vendors, unless the registry is changed, you may not automatically see the patch.
If you’re using Xen as your hypervisor, they did a writeup as well. Things don’t look as rosy there for now because they don’t seem to have patches for everything yet as of the time I’m writing this blog post. I’m sure that will change.
- Apple – If you’re running High Sierra, Sierra, or El Capitan, it looks like Apple took care of this back in December of 2017. See this for more information.
- Chrome – It looks like Google is going to release a patch for Chrome later in January. See this link for more information.
- Firefox – Version 57 or later has the proper fixes. See this blog for more information, so patch away!
- Edge and Internet Explorer – Microsoft has a blog post here. It looks like the January security update (KB4056890) takes care of that. So if you’re using either of these browsers, please update your OSes as soon as possible.
This isn’t an exhaustive list, but will hopefully help some of you. A full list of vendors can be found here.
- Cisco (thanks to the commenter below)
- Dell – Dell’s list of servers and storage is here. Here is a link for Dell’s Data Security product.
- Hewlett Packard Enterprise – HPE is continually updating this post with the various servers and such they sell with compliance and patch links.
By: Allan Hirt on January 3, 2018 in Advice, Mission Critical | No Comments
Happy New Year, everyone! Sorry I’ve been a bit lax on blogging, but it was a crazy busy last half of the year. I will be doing more blogging this year and there will be some other new things which I’ll talk about soon. All in good time …
Anyway, I’m at the car dealer this morning having my car serviced and I overheard an exchange between a tech and a customer that inspired me to write this blog post. The service person who is handling this customer’s case talks to the gentleman to explain what the tech found (or didn’t, in this case). Said customer did not believe him, so he asked for the tech to come out. The tech explains things and how he does his process, to the point of explaining how the customer could possibly be seeing what he is. Now, I’m not a deep car guy, but here’s this tech trying to explain how the systems are working together. The guy was having none of it and pulled the “Well, it’s a brand new car. I don’t see why this is relevant.” He then starts asking the tech if they have a rental car or a loaner, which isn’t the tech’s responsibility. At no time did I hear the tech raise his voice, and it was not a shouting match, but clearly the customer felt like he was being wronged and lied to.
I’ve seen this in our end of the world in different ways. I’ve even experienced it.
I love working with customers. Heck, I’ve built a career on it and wouldn’t have survived this long if I sucked at my job. Ostensibly you’re hiring me or Max (or someone else, if not SQLHA) because you want expertise. I certainly want to provide that, and would turn down an engagement if I felt you knew more than me or I could be of no help (or didn’t have the bandwidth). Why would I take on an engagement that would ultimately be a problem? The money isn’t worth it.
However, there have been those handful of cases over the years where no matter what you say to someone, they’re in denial. Their problem can’t possibly be the problem, right? Sometimes it is what it is, but people don’t like the answer. This devolves – like the situation I witnessed this morning – into a no win situation. Having said that, if you’re going to keep fighting me, why did you hire me? Why would you hire any expert if you’re not going to listen to them? Could we be wrong? Sure. We’re not infallible. I will admit and own my mistakes or if I am wrong. At the same time, I stand by my track record. You’re not hiring me only for my dashing good looks, you know.
Recently I was working with one of our customers who hit a problem. They sent me an e-mail and I knew immediately what their issue was – it was something I had seen a million times. So based on the little info they gave me, I replied, and lo and behold, problem solved. THAT is why you hire folks like me. Would I have dug in more to see what the issue was if it wasn’t what I suggested? You bet. They were happy and they were not blocked.
I would be lying if I said I know and retain every bit of minutiae about Windows Server, SQL Server, Linux, storage, networking, and so on. It’s just not possible since I do not have a photographic memory. I retain a heck of a lot, and over the years, I joke but it’s probably true: I’ve forgotten more about clustering SQL Server and Windows Server than most people know. It’s not an ego thing. I’ve just been doing it for 20 years. I still remember lots of little details – even about NT4 – but not everything. It all comes back to me when I’m hands on with the older stuff.
Some things to leave you with:
- Asking for help is not a sign of weakness, whether you are an expert or not. I’m at the car dealer because I’m not a mechanic. If I was an expert, would I be sitting here? NO! So if their customer this morning knew more than the tech, why didn’t he just fix it himself? Which leads into …
- Being a jerk is not called for in these scenarios whether you are the customer or the person working with him or her. Having been in the tech’s shoes, I felt for him. The service rep’s job is to handle these scenarios. The customer asked to speak to the tech, but the customer got indignant. Sometimes people get their dander up and, no matter how you break things down or how nice you are, they feel you’re attacking them. The right thing to do at that point is disengage.
- When you’re hiring someone, do your due diligence. When we get on a call before we do an engagement with a customer, it’s usually pretty clear we’ve been around the block a few times. It’s up to them at the end of the day whether or not they want to hire us. Some will just consider cost above all. We get that and always work with a customer’s budget whenever possible. But if you want the sun, moon, and stars for the price of a candy bar, chances are we may not be able to help you. The problem with putting budget above all is that often leads to bigger problems. Many times we come in after you’ve hired the wrong person and clean up an even bigger mess. Hiring the right resource up front saves you both time (and often downtime) and money. We’re mission critical guys. We get it. Time really is money – on a whole lot of levels. Work with people who understand the technical and non-technical factors and are invested in working with you.
- Good consultants don’t drain your proverbial blood like a vampire and will say no to work not in their wheelhouse. I’m not working for charity, but SQLHA isn’t going to take your money “just because”. We’ve had companies contact us who we said no to that come back later BECAUSE we said no and they liked that. We were up front and honest with them. No is not a bad word or negative in consulting, contrary to popular belief.
- The job of someone you hire isn’t to insult your employees or be a threat to them. Fun fact: I can tell you with 100% certainty I’m not looking to replace you as a DBA or admin, nor staff your company with my cronies. That’s not what we do at SQLHA.
Bottom line: trust your instincts. They are often right. We all need to ask for help, and we can’t know everything about everything, but be smart about where you get your advice and who you bring in to help. If you need some help, contact us and we’d be happy to see what we can do.