Blog

Business Continuity – We’re Here

By: on March 13, 2020 in Business Continuity, COVID-19, Disaster Recovery | No Comments

If I think about it, my entire career has been based on business continuity (BC) – making sure that things are up and running after something happens. In our technology world, we generally break this down to two categories: high availability, which means you can survive a relatively local event, and disaster recovery, where you can survive something more catastrophic. The reasons we’ve all fretted about – server failure, flooding, tornadoes, tsunamis, ransomware, and things like them – now have a new player at the table, COVID-19. COVID-19 does not respect borders, how much is (or is not) in your bank account, etc.

Quite literally everyone on the planet has been affected by COVID-19 both personally and professionally. BC is in action, but not in a traditional way. Business is, for the most part, still happening. How it is being done is different and will be for the forseeable future.

Travel has morphed into essential only for most of us. Conferences, concerts, and sporting events are postponed or outright cancelled. Nearly every company is asking employees to work remotely/from home – even ones that have shunned it in the past. The ones that are still requiring employees to come into the office will be changing their tune soon and show how prepared – or not – they were if something happened that wasn’t COVID-19. In essence, businesses are exercising, to a degree, their BC plans. Nothing is good until you test it, right? Some are failing. A longtime friend’s company is still saying everyone needs to be in the office. Read the damn tea leaves. This isn’t rocket science or something just for tech companies. Get your head out of your proverbial rear end.

What people forget sometimes in our industry is that BC is largely about people, too. It’s not just about servers (physical or virtual) and its associated tech. This big shift in how we will all be interacting and working raises some interesting things to think about as well as questions.

  • Will broadband providers be able to provide good connectivity and throughput now that more people are home during the day? I know they see it at night, but we’re talking about 24×7 load now.
  • Will broadband providers enforce data caps or lift them? That will certainly hamper some who use data a lot for their daily needs (and I’m not talking streaming Disney +, Netflix, etc.).
  • How will this affect supply chains for everything? We’re in a global economy. What happens, for example, if your SAN has a failure and you can’t get a disk? Can someone even get to your data center to fix it even if you do have a spare?
  • As my friend Joey D’Antoni speculates, what does this mean for cloud providers like Microsoft Azure, Amazon AWS, and Google Cloud? Right now their capacity is fine, but what about six months from now? Would it be cheaper to provision a bit more now? I can’t say yes, but you do not want to run into what I saw at my local Target last night. Hoarding is not good (yes, even for toilet paper or paper towels) but being prepared is. There’s a big difference.
Empty shelves

Empty shelves of toilet paper and other essentials at Target in Watertown, MA, on March 12, 2020.

  • More importantly, after we’re on the other side of this – whatever that looks like – what will things look like both in our daily lives and our professional lives? Will it go back to the way it was or something completely different? Will in-person events such as conferences go back to the normal way of doing things? I will be at the rescheduled SQLBits this fall and hope to see you there.

Business continuity is all about planning and being prepared. SQLHA is poised to help you do that that today and tomorrow. It’s what we do. COVID-19 has not affected us working with our customers new and old. Remote working has always been in our DNA. Don’t get me wrong – I know I love going onsite; my status with American Airlines and Marriott hotels can attest to that. But that doesn’t mean I don’t equally enjoy working with our customers remotely. It’s still satisfying knowing you’ve helped someone out and they’re pleased as punch at the end result. Don’t hesitate to reach out if you are looking for some help.

Now to the human side of things …

I know the next little while will put a strain on everyone in different ways. Be safe, smart, and not a superhero. There’s a reason extreme measures like social distancing are being put in place. Be kind and understanding. Even if people are not affected from an illness standpoint directly, some businesses – especially small, local ones like restaurants, bars, and stores – will be affected. If there’s some place you like, find a way to support them, too. Have patience if you are going out locally to a store, especially to places like supermarkets and Target. The people there are stocking shelves as fast as they can. Please and thank you should be in your vocabulary. If you are still going out to eat or picking up food, leave a generous tip if you can. These workers will be hit hard if they are not already.

When not working, I play music. All of my rehearsals and the one concert I had upcoming have been postponed or cancelled (and for good reason). Same with concerts I have tickets for. Similar to my local business comment, the arts will be greatly affected during all of this. Please support your local arts community; they’ll need it. Not going to get on my high horse, but pay for a download or CD of your favorite artist. Buy or rent a movie; don’t torrent. Now more than ever the arts community will need your support.

EDIT: Also – be polite to reps on the phone if you are calling to cancel/change travel plans (hotel, car, air, whatever). The reps didn’t make the policies; they want to help you. Luckily most companies have altered policies for this extraordinary circumstance. Not only should you say please and thank you, but if you normally skip the end-of-call survey, please take it. The reps are dealing with a very difficult situation and it helps them. Understand they are facing a challenge; have empathy.

I could not have predicted a global event such as this; most are fairly regional in nature. Maybe this will wake up some companies that what I’ve been saying for so long was not hyperbole. We’ll see.

Designing and Implementing Robust Solutions

By: on February 4, 2020 in High Availability, Scalability, Testing | No Comments

No matter where you live, you have most likely heard about the Iowa Caucus fiasco here in the USA this week. This is not going to be a political post, but what happened is something I have seen too many times in all my years working both on the dev and IT sides.

There are numerous topics that could be discussed in greater detail – the failure of redundancy (the availability guy in me is screaming to talk about this), the lack of testing (my QA background is also yelling loudly), but all of this points to a larger culprit: designing the solution properly to have the performance, availability, reliability, scalability, and security required. Do not forget the solution has to be usable by end users and easily maintained not to mentioned tested. The word “robust” is a good single word to sum it all up. According to this 9to5Mac post that has a good, concise summary of what is known to date (and a link to a bigger New York Times article), most of that didn’t happen or just flat out failed. Then there is accounting for data inconsistencies, which according to the New York Times, is a coding problem in the app. ArsTechnica also did a good writeup.

The Iowa Democratic Party issued a statement. It highlights ALL of what I said above, but especially around testing. The fact the data collected was right but the reporting was incorrect is damning. As a data person that makes me angry. Did the reports actually meet requirements and did they test with real (or at least simulated real) data? So many things …

You may ascertain this is not my first rodeo. Whether you buy a packaged application, roll your own, or customize an off-the-shelf package, what worked then may not work now. There’s a difference in scale with few users or a very different workload. How you handle 100 rows is very different than millions or billions. That is not necessarily a failure of design because at the time many applications were designed with what was known. Sometimes things grow organically. Other times some aspects are overlooked with the “we’ll deal with that later” mentality. It is always better to get it right (or as close to it …) as possible the first time around. Retrofitting, or worse, replacing existing solutions is expensive in every aspect of the word and may cause outages to do so. Hire smart people (FTEs or consultants) to help you. It is not a sign of weakness to admit you may not know what you don’t know and you need assistance. What does right look like? It would include, but is not limited to:

  • Proper application design
  • Proper schema design
  • Accounting for RTOs and RPOs for all components
  • Testing using production (or close to production-like) data and masked if necessary – functional, at scale (works on my machine with no load is not testing at scale …), and full business continuity
  • Securing everything but not hampering functionality and scalability while doing so (if possible)
  • A monitoring and overall management strategy that works including backup and restore
  • Take any lessons learned and drive them into improvements

All of this is valid whether you are fully on premises (physical or virtual), in a public cloud, or using a hybrid which bridges both. Choice of technology or platform does not matter. All of the fundamental principles are the same. Crap in, crap out. The cloud will not fix your poorly designed application nor will throwing more resources often resolve all – if any – issues. At some point you need to take a holistic approach; touching one thing has tentacles elsewhere. It is rare that is not the case.

The right design is always an end-to-end approach, no matter if it is an existing implementation or something new. The application matters just as much as the backend. Over the years, we’ve helped customers achieve their performance, availability, and security goals by helping them implement solutions that work. SQLHA has worked with some of the largest systems on the planet. If you are not a large company, bank, or government, that does not mean your solution needs any less robustness or aspects of mission criticality. We do this with companies of ALL sizes.

If you want to avoid scenarios like the Iowa Caucus, contact us today. We’ve been there, done that and will roll up our sleeves to get you where you need to go.

How Do You Use Data?

By: on January 27, 2020 in Data | No Comments

Today is the 75th anniversary of the liberation of Auschwitz. Auschwitz had more than one camp and this post is not about those details, the camp system itself, or things like that. It is also not a commentary on politics, corporations, religion, war, or any topic orbiting them. I’ve visited both Auschwitz and Dachau and blogged about my visit to Dachau in 2012. Auschwitz affected me in different ways and was its own distinct and sad experience for other reasons. It is profound and life changing.

Data drives the modern world – now more than ever. This is true in our personal as well as professional or business lives. Examples include how you invest money or how a company makes decisions. A lot of choices are based on information which is just another word for data. Some degree of luck and intuition can help, too.

Much of today’s information sits in databases (and not just in SQL Server-based ones). Queries and tools extract and present said data. It can be analyzed, filtered, and otherwise processed in numerous ways. Any data professional who has written queries knows that changing, say, a WHERE clause can make a huge difference in the result.

We tend to think of computers and mining data as recent innovations. They existed in more “primitive” ways before what we think of as modern computing . One early tool was the Hollerith Electric Tabulating System, first used for the 1890 US Census. It was later extended to use things like automatic card feeding as well as incorporated the first keypunch. The company which made those machines was combined with others in 1911 and renamed International Business Machines Corporation (IBM) in 1924. You may have heard of them.

Databases, relational or not, did not exist in the same way then as we think of them today. Those punch cards effectively made up what is similar to a database today. The reason I am specifically referencing the Hollerith is that those machines were used as part of the data-driven mechanism behind the tragedy in Europe which I first learned in Edwin Black’s excellent book IBM and the Holocaust. I highly recommend reading it.

I recently learned the US via the War Relocation Authority (WRA) also employed Hollerith machines during World War II. The WRA was responsible for the relocation and detainment of Japanese Americans. There’s no nice way to say that. IBM was subcontracted for that work. Accounting Information published an article about all of this in Volume 33 Number 1. The section “Big Brother Is Watching” discusses how the data was used, analyzed, and ultimately, reported. Here is a snippet:

His grand design for 1943 was a “locator file” in which would appear a Hollerith alphabetic punch card for each evacuee. These cards were to include standard demographic information about age, gender, education, occupation, family size, medical history, criminal record, and RC location. However, additional data categories about links to Japan were also maintained, such as years of residence in Japan and the extent of education received there.

Here is a what a Hollerith card looked like courtesy of Wikipedia.

Hollerith card

Hollerith punch card

Today many of us are annoyed by targeted ads, most of which stem from mined data about where you were looking online or by using data companies already have about you. American Express was recently able to identify fraudulent charges on my account due to data they have based on my history with them. Mining data is not inherently bad. How that data is used is where responsibility comes into play. There is a reason laws like HIPAA exist in the US.

As we commemorate this day which should be both celebrated for the aspect of freedom as well as respected for its remembrance of a terrible chapter in human history, I ask you to reflect on the use of data and both the responsibility and relationship to it. How do you deal with it in your job? In your personal life? Those are heady questions for a Monday, but it is an appropriate day to contemplate such weighty topics.

Security Is an Availability Problem

By: on January 17, 2020 in Security, SQL Server | No Comments

I’ve been hearing more and more from people I know as well as news stories that security is on people’s minds in one way or another. It’s a fundamental IT tenet that security should be a priority. With the rise of ransomware as well as the number of hacks and data leaks going up, things have reached a fever pitch – and I do not think we have hit a peak yet.

The reason I’m writing this blog is because every data breach and ransomware incident becomes an availability problem in one way or another. Mission critical encompasses many things – including both security and availability. Until recently not many saw the two as being in the same boat. Welcome to my world. Come in, have a dip in the pool – the water’s fine!

Let’s take the example of one of the most recent victims of ransomware, Travelex, one of the world’s largest travel-related companies. Travelex was hit with ransomware on December 31, 2019. Happy New Year, right? It hasn’t been for Travelex. They do business with lots of people including direct consumers. One of their businesses is money exchange at airports – I’ve seen their kiosks and storefronts at numerous ones including Heathrow.

They have been silent until today. On their customer information page in the UK about the incident, a video was posted from Tony D’Souza, the CEO of Travelex. First, taking nearly three weeks for a response to a very public incident is not necessarily a good thing in my estimation. I get the need to deal with things. I’m not perfect in all my communications but I’m also not Travelex. A big part of these incidents is incident management – including the public side of things. The customer page is reach, but I’m talking about the overall PR effort.

As of today, January 17, 2020, the screen grab below is their website’s front page in the USA (click to make larger).

Travelex Home Page

The Travelex home page as of 1/17/20

They are partially back up and working. According to a BBC report,

However, while he said the system used by staff is now working, there was no word on when the firm’s main UK website would be returned to service.

That means customers are still unable to order currency online, either from Travelex itself or through the network of banks that use its services, including Barclays, Lloyds, RBS, and the finance websites of Sainsbury’s and Tesco.

Ouch. That is a lot of lost business for their partners AND Travelex themselves with seemingly no end in sight. You get the idea.

Security starts well before someone accesses a databases or a system. FedEx had an issue a few years ago that cost them hundreds of millions of dollars because it got in through a subsidiary in Europe. IT needs to be able to cope with the modern threats. Most do a great job, but sadly, there will always be one attack vector you may not have thought of. Covering as many as you can is important. Not doing anything or saying, “No one will care” should have people polishing their resumes. Just look at the City of Baltimore which got hit with ransomware; it cost them at least $18.2 million.

Security is more than worrying about PII, credit card numbers, HIPAA, but as data professionals, that is one of our primary concerns. For every company that worries about what port to configure for SQL Server and frets that 1433 is insecure, worry about bigger problems like prioritizing the security of data at rest (including backups) as well as on the wire. A port scanner will find that SQL Server instance pretty quickly. Obfuscation may slow a hacker down and annoy them, not stop them. Put some effort into a robust backup strategy that has your backups safe, offsite, and tested so you know you can restore them if you get hit with something like ransomware. Even if you secure SQL Server but you have a ransomware incursion, you still may need those backups if you can’t access the systems. Everything matters here.

We always take security into account when we’re helping our customers design or evaluate solutions. Security is multi-layered. You do not want to let a security problem become an availability issue. Could your business survive being nearly completely down for over two weeks and counting like Travelex? I doubt it. Want to make sure you never find out the answer to that question? Contact us today.

We’re Only Immortal for a Limited Time

By: on January 15, 2020 in Advice, T-SQL Tuesday | 2 Comments

Back in 2015, I wrote the blog post In the End – Life (and IT) Lessons from Rush after I had seen what wound up being their last live show at the Forum in Los Angeles. It’s still one of my more popular blog posts, and in my opinion, one of the better ones I’ve authored.

I’ve referenced Rush before in blogs (such as Fun With Naming Conventions). If you’ve ever seen me present, I always put music references (among other things) in my demos. I often have a three-node AG with node names of Geddy, Alex, and Neil. You get the picture. I was first turned on to Rush via MTV in 1981 or 82 with the videos from Moving Pictures, so they’ve been a part of my life pretty much from the moment I started playing bass. Exit Stage Left was the first Rush album I learned to play back to front. Rush in one way or another has been part of my life now for nearly 40 years. Very few bands have stuck with me like Rush has (there are a few, and most people know Styx is one; that’s a story for another time).

Rush was a phoenix. After Neil’s tragedies post-Test for Echo, there was a high probability Rush would not play again. Between losing his daughter Selina in a car accident and his wife Jackie to cancer just about a year later, would you blame Neil for walking away forever? When Vapor Trails came out, it spawned a triumphant second act for Rush that in some was was probably more satisfying. Rush earned their retirement and then some.

On January 7, Neil Peart passed away after battling Glioblastoma, brain cancer, for about three and a half years. The news was announced last Friday the 10th and needless to say, it hit me like a ton of bricks. The last time I was this affected was from the death of my friend Mike who I’ve blogged about fairly extensively here (two examples Life Is Fragile and Some Anniversaries Are Just Not Happy Days). There are two similarities between the two: their passing were unexpected and sudden and both were WAY too young. Sure, Neil was 67, but these days, that’s really not old. It’s funny how when you are in your teens or twenties 30- or 40-something seems ancient. When I first saw Rush in December of 1987 at the venerable Spectrum (RIP) in Philadelphia, Neil was just barely 35. 35! I’m almost 50 now. 35 is a “baby”.

After Mike’s death and now Neil’s, it really brings the point home you have to make the most of your time on this planet. In the past few years I’ve been doing a bit of a self-reassessment and addressing things if needed. I’ve written before about how self care is important (one example: A Letter to Myself at 20). It shouldn’t take someone passing to put things in focus; sadly it sometimes does. Most of us get caught up and then time just flies by. When we stop to take a breath, sometimes we realize we got off track. If we screw up (and heaven knows I have), all you can do is own it and try to fix it. Getting back on course takes time. The results might even be better.

In a way, it’s crazy to feel so sad and mourn for someone you never met. Yet Neil (along with Geddy and Alex) have touched my life in so many ways for so long. I’m not the only one given all the tributes which have been from all walks of life. Very few knew Neil was sick. Some have admitted they did after the fact (such as his longtime drum tech Lorne Wheaton). You’ve got a good support system when something that major does not leak in three and a half years. I can’t imagine how Geddy or Alex felt every time they were asked about a possible Rush reunion knowing what they knew. Looking back at their statements – especially Geddy – he says it without saying it. Neil stopped publishing on his website about the time it seems he got the diagnosis. The bread crumbs were there; we just didn’t know.

Neil dealt with his cancer and his ultimate passing in the way he lived life. He was an intensely private person but at the same time, a fighter. The prognosis for Glioblastoma is grim and he outlasted the odds. Arguably his main reason for walking away in 2015 was to be there for his daughter Olivia. He got a second chance at life with Carrie which makes this all the more cruel. The last song on Rush’s last album Clockwork Angels, “The Garden”, always reminded me in a way of Genesis’ “Fading Lights” from We Can’t Dance – a goodbye. It’s a very poignant song besides being one of the more beautiful ones they’ve written. You can bingoogle the full lyrics (Neil was the lyricist for Rush besides being the drummer if you didn’t know), but this one always sticks with me:

“The measure of a life is a measure of love and respect”.

I think Neil would downplay it, but he had that love and respect in spades. Just look at the outpouring of tributes that started on Friday. Rest in peace, Professor. I hope Jackie and Selina welcomed you with open arms. My condolences to Carrie and Olivia, Geddy and Alex, and Neil’s friends and family. I hope all the tributes to him and his enduring work will somehow comfort you and know that he will never be forgotten.

Neil’s life is a lesson which can be tied to this month’s T-SQL Tuesday topic, “Imposter Syndrome“. Neil was a lifelong student. Even by the mid-80s, many considered him to be one of the greatest rock drummers of all time. What did he do? He took drum lessons as time went on from Freddie Gruber and Peter Erskine. Your drummer’s favorite drummer took private classes! At the same time, he was a teacher. His instructional videos are some of the best selling ones for drums of all time. The two are not mutually exclusive. Don’t believe me? Read this page of tributes at Hudson Music, the publisher of Neil’s videos, especially from Mr. Erskine.

What I hate is when someone wants to play “stump the chump”. Simply put, it is when someone asks you a question to intentionally show they are “smarter” than you. Look, I remember a lot of things. I even joke I’ve forgotten more about clusters than some have even learned, but I’m also not a walking encyclopedia or Wiki, either. There’s a reason Books Online and documentation exists. If you got me to admit “I don’t know” – words everyone needs in their vocabulary – congrats, jerk. I’m human just like you. I put my pants on like everyone else. I remember sitting at PASS Summit a few years ago and behind me surprised I was right in front of them. Um, what? I’m just a normal person and not sitting on a throne. I turned around, introduced myself and we had a great chat. A bit like Neil, I would rather talk about pretty much anything else than SQL Server when I’m not formally having to talk about it. There’s more to life than WSFCs, Pacemaker, AGs, and FCIs.

Neil always wanted to improve. I think the same way. I never think I know all the answers and the older I get, sometimes I feel the less I know. Believe it or not, I’m even wrong sometimes! Own it. I do. I’m constantly learning new things and am in awe of those who do other things such as esoteric performance tuning just as I’ve had people say they don’t know how they can do what I do when it comes to HA and are amazed at my skills. It’s humbling when people say you’ve influenced them or come up and say one thing you said made a difference. Like Neil, I try to pass what I know on, including from mistakes and failure. Successful people fail all the time. Failure or not knowing something is not a sign of weakness; far from it.

This doesn’t mean successful people will not have a bit of an ego or strong opinions; far from it. They just are open to feedback, criticism, and improvement. Sometimes you need to reinvent yourself even if no one knows it like Neil did with his drumming in the 90s. I’ve done it myself in my career even if no one noticed. If I was the same person I was 20+ years ago, never absorbed feedback – good and bad – I’d be out of a job. Does that mean I’m perfect? No! To quote the title of one of Neil’s videos, I’m a “work in progress”. If that makes me an imposter, I’m proud to be the founding member of the club.

The moral of the story: be like Neil. Be humble. Be a student and learn from those around you. Ask questions. Listen more than you speak. If you’re not doing any of these things, you are the imposter.