We Need RDMA for Availability Groups and in All Public Clouds
Hello from Microsoft Ignite on day two.
Yesterday was a big day between all of the Windows Server 2019, Azure, and SQL Server 2019 announcements. Others have covered the new features at a broad glance (here’s the official list from Microsoft). I’ll get into some of those over the next few weeks once I’ve had time to play with the SQL Server 2019 bits, and discuss things like SQL Server Big Data Clusters. However, now that SQL Server 2019 CTP 2.0 has been officially announced, there’s something I want to address: the network transport for Always On Availability Groups (AGs) and how it must be improved.
Let’s Talk Disks
One of the keys to success for any AG configuration is the speed of the disks on all replicas – primary or secondary. If you want synchronous data movement, any secondary replica has to be as fast as or faster than the primary to keep up (network speed matters too, and I’ll address that in a minute). If you add read-only workloads to a secondary replica, that increases disk usage, so again, speed matters.
We are beyond simple SSDs; they are no longer where the real speed is. Most people are looking at NVMe drives these days, which are flash-based drives that are faster than “traditional” SSDs. However, there is a new (yet old) kid in town: persistent memory, aka storage class memory, aka PMEM. “Straight” memory is always going to be faster than going to disk because of the way systems are architected internally. Back in the day we had things like RAM SANs, and the idea of using memory for storage is coming back around again in the form of PMEM. SQL Server 2016 initially supported PMEM (NVDIMM only, and I think just NVDIMM-N) specifically for tail-of-the-log caching (see Bob Dorr’s blog post). Capacity for persistent memory was a bit small, so it made sense. SQL Server 2016 (and later) on Windows Server also supports PMEM formatted as block-based storage (i.e., presented to Windows Server like a normal disk).
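To make the distinction concrete, here’s a minimal sketch of the two modes on Windows Server. It assumes a PMEM device showing up as disk number 5, drive letter P, a database named MyBusyDB, and the SqlServer PowerShell module for Invoke-Sqlcmd; all of those names and numbers are hypothetical.

```powershell
# Format a PMEM volume as DAX (byte-addressable). Disk number, drive letter,
# and paths are hypothetical.
Get-Disk -Number 5 | New-Partition -UseMaximumSize -DriveLetter P
Format-Volume -DriveLetter P -FileSystem NTFS -IsDAX $true

# Dropping -IsDAX (or setting it to $false) gives you plain block-based storage,
# which SQL Server treats like any other disk.

# Per Bob Dorr's post, the tail-of-the-log cache is lit up by adding a small
# second log file on the DAX volume (database and file names are hypothetical):
Invoke-Sqlcmd -ServerInstance "." -Query @"
ALTER DATABASE [MyBusyDB]
ADD LOG FILE (NAME = N'MyBusyDB_pmem_log', FILENAME = N'P:\MyBusyDB_pmem.ldf', SIZE = 20MB);
"@
```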
There are two PMEM enhancements in SQL Server 2019:
- Support for Intel Optane
- For Linux, SQL Server data, transaction log, and In-Memory OLTP checkpoint files can now be placed on PMEM in enlightened mode
Many newer physical servers have slots for persistent memory; it’s not a passing fad.
What’s the point here, Allan? Whether you do it with newer NVMe drives, PMEM, or both, you can get blazing fast IOPS for SQL Server. This is good news for busy systems that want to use AGs. However, there is a looming problem, especially at these speeds: the network.
Why Is Networking Your Next Big Problem?
The stage is set: you’ve got blazing fast storage and a busy database (or databases) in an AG, but your network is as slow as two tin cans connected by a string. It won’t matter if your disks came straight from setting a record at the Nürburgring. A slow network pipe will choke the ability to keep a synchronous AG in a synchronized state, even with compression enabled. It’s that simple. The same could be said of seeding the replicas of a large database.
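To put rough numbers on it (these are illustrative assumptions, not measurements), here’s a quick back-of-the-envelope comparison of a hypothetical log generation rate against the theoretical ceilings of common Ethernet links:

```powershell
# Back-of-the-envelope math only; the log rate is a made-up example.
$logRateMBps = 300                  # hypothetical sustained log generation with NVMe/PMEM storage
$oneGbEMBps  = (1e9  / 8) / 1MB     # ~119 MB/s theoretical maximum for 1 GbE
$tenGbEMBps  = (10e9 / 8) / 1MB     # ~1,192 MB/s theoretical maximum for 10 GbE
"1 GbE tops out around {0:N0} MB/s, 10 GbE around {1:N0} MB/s, and this AG needs ~{2} MB/s of log throughput before protocol overhead." -f $oneGbEMBps, $tenGbEMBps, $logRateMBps
```

With numbers like those, a 1 GbE pipe cannot come close, and even 10 GbE does not leave much headroom once readable secondary traffic, backups, and seeding land on top of it.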
Enter RDMA
I’ve talked about Remote Direct Memory Access (RDMA) in the past in two different blog posts (New SQL Server Benchmark – New Windows Server Feature and Windows Server 2012 R2 for the DBA: 10 Reasons It Matters), so I’m not going to rehash it in any kind of depth. TL;DR it’s really, really fast networking that, at least on Windows Server, is lit up automatically when you have everything in place. However, not everything can use RDMA. Things such as SQL Server’s tabular data stream (TDS) need to be enabled for use over RDMA, just like Live Migration traffic in Hyper-V and SMB 3.0 (SMB Direct) were. SMB Direct can be used with FCIs and has been supported for some time; it’s part of the benchmark linked above.
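If you want to see whether RDMA is present and SMB Direct is actually kicking in today, a few built-in cmdlets will tell you; this assumes Windows Server 2012 or later with RDMA-capable NICs.

```powershell
# NICs that have RDMA enabled at the adapter level
Get-NetAdapterRdma | Where-Object Enabled

# Interfaces that the SMB client sees as RDMA-capable
Get-SmbClientNetworkInterface | Where-Object RdmaCapable

# Active SMB connections - the RDMA-capable columns show whether SMB Direct is in play
Get-SmbMultichannelConnection
```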
Some good news, though:
- Windows Server and Linux both support RDMA (I’m not sure about containers, though … I’m guessing not, but I’d need to dig more)
- Both Hyper-V (Build 1709 or later) and ESXi (6.5 or later) now support RDMA inside guest VMs. The bad news: ESXi only supports it for Linux.
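On the Hyper-V side, here’s a minimal sketch of what lighting up guest RDMA looks like, assuming a version 1709 or later (or Windows Server 2019) host with an RDMA-capable NIC behind the virtual switch; the VM and adapter names are hypothetical.

```powershell
# Host side: expose RDMA to the guest's network adapter (Hyper-V module).
# VM name is hypothetical.
Set-VMNetworkAdapterRdma -VMName "SQLVM1" -RdmaWeight 100

# Inside the guest: enable RDMA on the virtual NIC as well (adapter name is hypothetical).
Enable-NetAdapterRdma -Name "Ethernet"
```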
My Call(s) to Action
1. The Windows Server and SQL Server development teams need to work together to enable RDMA for AG traffic on Windows Server (most likely in Windows Server 2019 via a patch, or later; don’t hold your breath for Windows Server 2016), and SQL Server needs to get RDMA working on Linux as well.
2. VMware needs to support Windows Server workloads with their PVRDMA adapters. VMware really is missing an opportunity here.
3. We need RDMA for IaaS VMs in the public cloud that can be used with SQL Server. This is for two reasons: a) Storage Spaces Direct (S2D) for FCIs, and b) AGs if RDMA traffic is enabled. For Azure, this would be enabled by the Azure compute and/or networking teams. Azure already has some IaaS SKUs with RDMA networking, so it’s possible, but they are HPC SKUs rather than general-purpose ones like the D- and G-series VMs (see the sketch after this list for one way to find them). There’s no RDMA that I can see in EC2 or GCP, so I think those are pipe dreams, but for those who want FCIs, it sure would be great to be able to deploy S2D right now and have it work well, and then also have it work for AGs down the road. Azure is our best hope here.
4. Assuming #1 happens, Azure needs to enable RDMA so that Azure SQL Database and Azure SQL Database Managed Instance can take advantage of it, and make sure it does things like work across Availability Zones in a region.
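For #3, here’s one way to check which VM sizes in a region currently report RDMA capability, using the Az PowerShell module; the region is just an example, and it assumes you’ve signed in with Connect-AzAccount.

```powershell
# List VM sizes in a region that report the RdmaEnabled capability.
Get-AzComputeResourceSku |
    Where-Object { $_.ResourceType -eq "virtualMachines" -and $_.Locations -contains "eastus" } |
    Where-Object { $_.Capabilities | Where-Object { $_.Name -eq "RdmaEnabled" -and $_.Value -eq "True" } } |
    Select-Object -ExpandProperty Name
```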
That’s it. I’m not asking for the sun, moon, and stars. Most, if not all, of this is doable. There’s already precedent for supporting RDMA for SQL Server via FCIs on Windows Server, and that also needs some cloud love if you want to use S2D up there. RDMA needs to be brought over the finish line for all of the SQL Server availability scenarios regardless of platform. In a cloud-first world, we should not be saddled with slow inter-server connectivity.