By: Allan Hirt on February 8, 2019 in Disaster Recovery, Downtime, Outage | No Comments
Disaster recovery is in the news this week for all the wrong reasons.
Stop me if you’ve heard this story before. A major company – in this case a financial institution – is having a technical outage for not only the first, but the second time in less than a week. Assuming you’re not working for Wells Fargo or one of their customers, chances are you had a much better time these past few days. These are their recent tweets as of sometime in the afternoon Thursday, February 7. This post went live on Friday morning the 8th and Wells Fargo is still down – basically two days of downtime.
Figure 1. Wells Fargo acknowledging the problem
Ouch. The last tweet is not dissimilar to a British Airways outage back in 2017. This tweet from Wells Fargo claims it is not a cybersecurity attack. Right now that’s still not a lot of comfort to anyone affected.
Figure 2. It’s not a hack!
Their customers are not enjoying the outage, either. Three examples from Twitter:
Figure 3. Customer issue due to the outage
Figure 4. Day two …
Figure 5. More impact
Imagine the fallout: direct deposit from companies may not work, which means people do not get paid. People can’t access their money and do things like pay bills. For some, this could have a lifelong impact on things like credit ratings if they miss a loan, mortgage, or credit card payment. There is more than just the business side to this incident.
There were, of course, snarky replies about charging Wells Fargo for fees. I wanted to call out an honest-to-goodness impact to someone along with the possible longer term fallout to Wells Fargo themselves. I feel for the person who tweeted, but the reality is Chase or Citi could have an outage for some reason, too. I don’t think anyone is 100% infallible. I have no knowledge of what those other financial institutions have in place to prevent an outage.
I’m doing what I do now largely because I lived through a series of outages similar to what Wells Fargo is most likely experiencing this week. Those outages happened over the course of about three months. They are painful for all involved. I worked many overnight shifts, a few 24+ hour days … well, you get the picture. I learned a lot during that timeframe, and one of the biggest lessons learned is that not only do you need to test your plans, but you have to be proactive by building disaster recovery into your solutions from day one. All companies, whether massive like Wells Fargo or a small shop, have the same issues. The major difference is economy of scale.
Everyone has to answer this one question: how much does downtime cost your business – per minute, hour, day, week? That will guide your solution. Two days in a row of a bank being down is … costly.
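As a rough way to start answering that question, here is a minimal sketch that scales a per-minute cost up to an hour, a day, and a week. The $1,000 per minute figure and the `downtime_cost` function are hypothetical and only there for illustration. The sketch also assumes cost accrues evenly around the clock, which is rarely true for a real business, so treat the result as a starting point for the conversation and not as a real estimate.

```python
# Hypothetical helper: the name and the sample figure are illustrative only.
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def downtime_cost(cost_per_minute: float) -> dict[str, float]:
    """Scale a per-minute downtime cost up to an hour, a day, and a week.

    Assumes cost accrues evenly 24x7, which is a simplification.
    """
    per_hour = cost_per_minute * MINUTES_PER_HOUR
    per_day = per_hour * HOURS_PER_DAY
    per_week = per_day * DAYS_PER_WEEK
    return {
        "minute": cost_per_minute,
        "hour": per_hour,
        "day": per_day,
        "week": per_week,
    }


if __name__ == "__main__":
    # Assumed sample input: $1,000 lost per minute of downtime.
    for unit, cost in downtime_cost(1_000.0).items():
        print(f"per {unit}: ${cost:,.0f}")
```

Even with a made-up number, seeing the per-day and per-week totals next to the cost of a proper solution tends to make the discussion much easier.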
There may be another impact, too – can Wells Fargo systems handle the load that will happen when the systems are online again? I bet it will be like a massive 9AM test. Stay tuned!
You don’t want to have a week like Wells Fargo. Take your local availability as well as your disaster recovery strategy seriously. I’ve always found the following to be true: most do not want to put in place proper disaster recovery until their first major outage. Unfortunately at that point, it’s too late. Once the dust settles, they’ll suddenly buy into the religion of needing disaster recovery. Let me be clear – I do not know what the situation is over at Wells Fargo; zero inside information here. Did they have redundant data centers and the failover did not work? Were some systems redundant but not others? Were there indications long before the downtime event that they missed? There are more questions than answers, and I’m sure we’ll find out in good time what happened. The truth always comes out.
A Different Kind of Mess – Quadriga
This week also saw a very different problem as it relates to finance: the death of Quadriga CEO Gerald Cotten. Why is his passing away impactful? When he died, apparently he was the only one who knew or had the password for the cryptocurrency vault. There are other alleged issues I won’t get into, but with his laptop encrypted (apparently his wife tried to have it cracked), literally no one whose money was in the vault can get it. The amount stored there has varied across the stories I’ve seen, but it is well north of $100 million. The lesson learned here is that someone else always needs to know how to access systems and where keys are. Stuff happens – including death. There are real world impacts that can happen when systems cannot be accessed.
Watch Your Licenses
Since we are talking about outages, the Register published a story today about how a system would not come up after routine maintenance due to the software license expiring. I have been through this with a customer. We were in the middle of a data center (or centre, for you non-US folks) migration. We had to reboot a system and SQL Server would not come up. I looked in the SQL Server log. Lo and behold, someone had installed Evaluation Edition and never converted it to a real license. It also meant the system was never patched and never rebooted for a few years! Needless to say, there was no joy in Mudville. That was a very different kind of outage.
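If you suspect an instance is running Evaluation Edition (`SELECT SERVERPROPERTY('Edition')` will tell you), a minimal sketch like the one below shows how much time is left. It relies on the 180-day evaluation period; the `evaluation_days_left` function and the sample dates are hypothetical, and finding the actual install date for your instance is left to you.

```python
from datetime import date, timedelta

# SQL Server Evaluation Edition is a 180-day trial.
EVALUATION_PERIOD = timedelta(days=180)


def evaluation_days_left(installed_on: date, today: date) -> int:
    """Return days until the evaluation period ends (negative if expired).

    Hypothetical helper: installed_on is an input you must look up yourself.
    """
    return (installed_on + EVALUATION_PERIOD - today).days


if __name__ == "__main__":
    # Assumed sample dates for illustration only.
    remaining = evaluation_days_left(date(2019, 1, 1), date(2019, 2, 8))
    if remaining < 0:
        print(f"Evaluation expired {-remaining} days ago")
    else:
        print(f"Evaluation expires in {remaining} days")
```

A negative result on a server that is still running is exactly the situation described above: everything looks fine until the next restart.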
The Bottom Line
If you do not want to be another disaster recovery statistic and want to prevent things like the above from happening, contact SQLHA today to figure out where you are and where you need to be.
By: Allan Hirt on February 7, 2019 in Conference, SQLbits, Training | No Comments
Hard to believe that SQLBits 2019 is only a few weeks away. I’m looking forward to speaking there again. It’s always an honor to be selected and Bits is one of my favorite conferences to attend if I can make it. This year, it’s in Manchester which is somewhere in the UK I’ve yet to visit, so I’m excited about that as well.
I’m currently finalizing the content and the lab for the Training Day I will be delivering on Thursday, February 28 – Modern SQL Server Availability Architectures. Hopefully the venue can support the lab, so we’ll see. That aspect is completely beyond my control, but what I have cooked up should be fun if we get to do it. You’ll need to bring your own laptop and make sure you run the test that is linked in the description. Last I checked, seats were filling up quickly, so don’t miss out!
I’ll also be doing a session on Friday, March 1 – Common Troubleshooting Techniques for AGs and FCIs at 14:25 (2:25 PM for those of you on my side of the pond).
If you haven’t registered already, what are you waiting for? If you have, see you there. Come up and say hello!
By: Allan Hirt on February 4, 2019 in SQLHAU, Training | No Comments
Happy post-Super Bowl Monday, everyone. Almost time for pitchers and catchers!
New Online Classes
Back in November, I announced our first set of live online training and classes which covered through June of this year. I’m pleased to announce the dates for the second half of 2019. These classes will be live and instructor-led. Our fully on-demand courses will debut later this year. The two new classes and dates are as follows:
In August, I’ll be heading back to the Microsoft Technology Center in Chicago to deliver two classes:
I’m particularly excited about Modernizing Your SQL Server Infrastructure Boot Camp. It’s something that has been on my mind for a while. It goes without saying that it will have our signature labs. The full course description can be found here. It’s going to be a lot of fun to teach.
The Modernizing Your SQL Server Infrastructure Boot Camp is $995 and the Always On Availability Groups Boot Camp is $1495. You can get a bundle of both for $2195, a savings of $295. That price includes food, and trust me, you will not go hungry. Space is limited – reserve your spot today!
Both classes currently have their biggest discount – 25%. Don’t miss out!
Still Time to Register for the 2019 First Half Classes
Don’t forget about the classes announced in November – three classes with Europe/UK times in June as well as the SQL Server Availability Solutions in A Cloudy, Virtual World in April with US times.
Want To Stay “In the Know” for SQLHAU’s Training?
If you want to find out about training dates, new classes, and more, sign up to get updates from SQLHAU. While you’re at it, you can also choose to see when new Mission Critical Moment videos are published, get information about upcoming SQLHA webinars, and of course, subscribe to the Mission Critical Update – our newsletter. The first issue will go out this week, and for training, there’s an exclusive offer that you will not find anywhere else and that applies to all SQLHAU classes currently scheduled.
By: Allan Hirt on December 13, 2018 in Always On, AlwaysOn, Availability Groups, Read Only Filegroups, SQL Server | 2 Comments
A question came across my inbox this week which I decided to investigate: what happens if you have read only filegroups as part of your database, and you want to use Always On Availability Groups? Let’s find out.
First, I have a database with two filegroups: one read write (PRIMARY) and one read only (ROFG).
Figure 1. Filegroups
I also have a table (ROTable) created on ROFG.
Figure 2. Table on a read only filegroup
I then created an AG named ROFGAG. I selected automatic seeding (more on that in a second) to initialize my secondary replica. According to the Wizard, all is good … or is it?
Figure 3. Successful AG creation
I checked the status of the AG and it wasn’t synchronizing. Seeing as this is not my first rodeo, I started looking and lo and behold, the database was not joined to the AG even though the Wizard said everything was OK at the end of creation.
Figure 4. DB not joined to the AG
So there appears to be a bug in AG creation: if you have a read only filegroup and select automatic seeding, the seeding is not initiated for some reason. Looking in the SQL Server log on ALEX, I found the telltale permissions issue.
Figure 5. Seeding permission not granted
Granted, this is SQL Server 2016 where there’s the 3 minute “issue”, but that generally worked OK through the Wizard. I manually issued the ALTER AVAILABILITY GROUP ROFGAG GRANT CREATE ANY DATABASE and lo and behold …
Figure 6. Seeding initiated
You can also manually back up the database, copy it to the other server, and restore it to the instance that is the secondary replica. Either way, once the database was on the secondary replica, with seeding, it joined to the AG just fine. If you manually restore the database with NORECOVERY, the database has to be joined to the AG.
Figure 7. Join the DB to the AG
I inserted a few rows into a table on the read write filegroup, and everything is synchronized as expected. Keep in mind that on the secondary replica you have the same files as the primary and both filegroups. So having a read only filegroup works just fine with an AG.
Figure 8. AG status
I now deviated from the plan. What if you just want the read write filegroup on the secondary replica, and not everything? Does that work? To do this, I removed ALEX as a secondary replica and then deleted ROFGTestDB from that instance. On ALEX, I then restored ROFGTestDB, but just the read write filegroup. That means I wouldn’t get ROFG or its corresponding .ndf file.
Figure 9. Restoring just the read write filegroup
Since I restored the database, in the Wizard I just selected the Join only option for initial data synchronization. SQL Server thinks all is fine … or is it?
Figure 10. Adding the replica with no read only filegroup
Unfortunately, the AG is not synchronizing even though it supposedly joined with no issues. The AlwaysOn_health extended events trace shows the join, but no other information indicating a problem. The same holds true for the SQL Server log. I was inserting rows into a table, and that DML executed fine, so the issue is apparently not on the primary.
Figure 11. All is not well
At this point, I removed the secondary replica from the AG. That completed successfully, but after doing that, ROFGTestDB on ALEX (where the restore without the read only filegroup happened) is showing as Suspect.
Figure 12. ROFGTestDB is suspect
The only difference is that before when things worked, all of the filegroups were restored. I deleted ROFGTestDB from ALEX, and then restored it from a backup WITH NORECOVERY with both the read write and read only filegroups.
Figure 13. ROFGTestDB restored
I then joined ALEX to ROFGAG, inserted a few rows, and everything works as expected.
Figure 14. AG working
It looks like AGs work with DBs that have read only filegroups, but you have to make sure you have all files and filegroups. If you restore the database with only the read write filegroup(s), it appears the AG will not synchronize. I plan on following up with Microsoft to see if this is expected behavior, but it is definitely interesting because it reports success when it really is failing. That also means that at least the way things work today, you can’t have a smaller footprint on a secondary replica – it’s all or nothing.
Hope this helps some of you!
By: Allan Hirt on November 21, 2018 in Availability Groups, FCI, SQLHAU, Training | No Comments
Where has this year gone? It seems like it’s flown by. I’ve been on the road for the most part it seems since about mid-September with a mix of some personal, client work, and speaking engagements. However, one of the things I’ve been working on for quite some time is finally here – online training! Right now it is going to be live, instructor led but once I find some time in 2019, there will be more options. For more information on our approach to live, instructor led online training, click here.
There are seven class options/dates listed on our Events page, with a mix of US and EU/UK times. The times and all details are in each individual link on our Events page. Don’t worry – I’m eying other parts of the world for future deliveries. The seven offerings span four different courses:
All feature our signature labs or some sort of hands on, which is something I have been doing for years when others thought labs were too much work. Yes, they are a lot of work but I know with complex topics like the availability features, you need that hands on experience in some way, shape, or form.
To celebrate the “grand opening” of online training, we have two awesome sales you don’t want to miss. The first is our Black Friday deal, which is 50% off everything through November 30th. If you somehow miss the Black Friday sale, all of the courses will be 30% off throughout the month of December. Those who have signed up for our newsletter will get the first one in early December with a special deal, so if you haven’t signed up for some awesome content from SQLHA, what are you waiting for? Sign up today.
All sales are coded discounts which can be found in the individual links on the Event page.
If you need further justification for the classes, here’s a PDF to show why we have best-in-class offerings.
You can now stop asking me “When will you have online classes?” Go sign up for one!
If you’re in the US, have a Happy Thanksgiving if you are celebrating.