RDMA Archives

September 25, 2018

We Need RDMA for Availability Groups and in All Public Clouds

Hello from Microsoft Ignite on day two.

Yesterday was a big day between all of the Windows Server 2019, Azure, and SQL Server 2019 announcements. Others have covered things like new features at a broad glance (here’s the official list from Microsoft). I’ll get into some of those things over the next few weeks when I get some time to play with the bits for SQL Server 2019 and discuss things like the SQL Server Big Data Clusters. However, now that SQL Server 2019 CTP 2.0 has been officially announced, there’s something I want to address: the network transport for Always On Availability Groups (AGs) and how it must be improved.

Let’s Talk Disks

One of the keys to success for any AG configuration is the speed of the disks on all replicas – primary or secondary. If you want synchronous data movement, any secondary replicas have to be as fast as or faster than the primary to keep up (network speed matters, and I’ll address that here in a minute). If you add read-only workloads to a secondary replica, that increases disk usage, so again, speed matters.

We are beyond simple SSDs. The real speed is no longer there. Most people are looking at NVMe drives these days, which are faster flash-based drives than “traditional” SSDs. However, there is a new (yet old) kid in town: persistent memory aka storage class memory aka PMEM. “Straight” memory is always going to be faster than going to disk due to the way systems are architected internally. Back in the day we had things like RAM SANs, but the idea of using memory for storage is coming back around again in the form of PMEM. SQL Server 2016 initially supported PMEM (NVDIMM only, and I think just NVDIMM-N) specifically for tail of the log caching (see Bob Dorr’s blog post). Capacity for persistent memory was a bit small, so it made sense. SQL Server 2016 (and later) on Windows Server also now supports PMEM if the PMEM was formatted as block-based storage (i.e. it was configured like a normal disk to Windows Server).

There are two PMEM enhancements in SQL Server 2019:

Support for Intel Optane
For Linux, SQL Server data, transaction log, and In-Memory OLTP checkpoint files can now be placed on PMEM in enlightened mode

Many newer physical servers have slots for persistent memory; it’s not a passing fad.

What’s the point here, Allan? Whether done right with newer NVMe drives, PMEM, or both, you can get blazing fast IOPS for SQL Server. This is good news for busy systems that want to use AGs. However, there is a looming problem especially with these speeds: the network.

Why Is Networking Your Next Big Problem?

The stage is set: you’ve got blazing fast storage and a busy database (or databases) in an AG, but your network is as slow as two tin cans connected by a string. It won’t matter if your disks came straight from setting a record at the Nürburgring. A slow network pipe will choke the ability to keep an AG that is synchronous in a synchronized state even with compression enabled. It’s that simple. The same could be said of a large database using seeding for the replicas.
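To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It is not anything SQL Server exposes; the function names, the 70% usable-bandwidth figure, and the example workload numbers are all assumptions made up for illustration. It only does the arithmetic of comparing how much log a primary generates against what a network link can realistically carry, and estimates how long seeding a database would take over that link.

```python
# Back-of-the-envelope sizing for AG network traffic.
# All names and default values here are illustrative assumptions,
# not SQL Server settings or measured numbers.

def required_gbps(log_mb_per_sec: float, compression_ratio: float = 1.0) -> float:
    """Gb/s needed on the wire to ship the given log rate (MB/s) after compression."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    bytes_per_sec = log_mb_per_sec * 1_000_000 / compression_ratio
    return bytes_per_sec * 8 / 1_000_000_000


def link_can_keep_up(log_mb_per_sec: float, link_gbps: float,
                     compression_ratio: float = 1.0,
                     usable_fraction: float = 0.7) -> bool:
    """True if the link's usable bandwidth covers the log stream.

    usable_fraction is an assumed allowance for protocol overhead and
    other traffic sharing the link.
    """
    return required_gbps(log_mb_per_sec, compression_ratio) <= link_gbps * usable_fraction


def seeding_hours(db_gb: float, link_gbps: float, usable_fraction: float = 0.7) -> float:
    """Hours to push db_gb (decimal GB) across the link, ignoring disk limits."""
    bits = db_gb * 1_000_000_000 * 8
    usable_bits_per_sec = link_gbps * 1_000_000_000 * usable_fraction
    return bits / usable_bits_per_sec / 3600


if __name__ == "__main__":
    # Hypothetical workload: 500 MB/s of log, 1 TB database.
    for gbps in (1, 10, 25):
        ok = link_can_keep_up(500, gbps)
        hours = seeding_hours(1000, gbps)
        print(f"{gbps} Gb link: keeps up with log = {ok}, seed 1 TB in ~{hours:.2f} h")
```

Under these assumed numbers, a 500 MB/s log stream needs about 4 Gb/s uncompressed, so a 1Gb link falls behind no matter how fast the disks are. This only covers bandwidth; synchronous commit is also sensitive to latency, which is where RDMA comes in.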

Enter RDMA

I’ve talked about Remote Direct Memory Access (RDMA) in the past in two different blog posts (New SQL Server Benchmark – New Windows Server Feature and Windows Server 2012 R2 for the DBA: 10 Reasons It Matters), so I’m not going to rehash it in any kind of depth. TL;DR: it’s really, really fast networking that, at least on Windows Server, is lit up automatically when you have everything in place. However, not everything can use RDMA. Things such as SQL Server’s tabular data stream (TDS) need to be enabled for use on RDMA, just like Live Migration traffic in Hyper-V and SMB 3.0 (SMB Direct) were. SMB Direct can be used with FCIs, and has been supported for some time. It’s part of that benchmark linked.

Some good news, though:

Windows Server and Linux both support RDMA (I’m not sure about containers, though … I’m guessing not, but I’d need to dig more)
Both Hyper-V (Build 1709 or later) and ESXi (6.5 or later) now support RDMA inside guest VMs. The bad news: ESXi only supports it for Linux.

My Call(s) to Action

1. The Windows Server and SQL Server development teams need to work together to enable RDMA for AG traffic on Windows Server (which would most likely be Windows Server 2019 in a patch, or later; don’t hold your breath for Windows Server 2016), and SQL Server needs to get RDMA working on Linux.

2. VMware needs to support Windows Server workloads with their PVRDMA adapters. VMware really is missing an opportunity here.

3. We need RDMA for IaaS VMs in the public cloud that can be used with SQL Server. This is for two reasons: a) Storage Spaces Direct for FCIs and b) AGs, if RDMA traffic is enabled. For Azure, this would be enabled by the Azure compute and/or networking teams. Azure has some IaaS SKUs with RDMA networking so it’s possible, but they are for HPC and not general use such as D- and G-class VMs. There’s no RDMA that I can see in EC2 or GCP, so I think those are pipe dreams, but for those who want FCIs, it sure would be great to be able to deploy S2D right now and have it work well, and then also work for AGs down the road. Azure is our best hope here.

4. Assuming #1, Azure needs to enable RDMA so that Azure SQL Database and Azure SQL Database Managed Instance can take advantage of RDMA, and make sure it does things like work across Availability Zones in a region.

That’s it. I’m not asking for the sun, moon, and stars. Most, if not all, of this is doable. There’s already precedent for supporting RDMA for SQL Server via FCIs on Windows Server, and that also needs some cloud love if you want to use S2D up there. RDMA needs to be brought over the finish line for all of the SQL Server availability scenarios regardless of platform. In a cloud-first world, we should not be saddled by slow inter-server connectivity.

September 23, 2016

New SQL Server Benchmark – New Windows Server Feature

It hasn’t been widely publicized yet in SQL Server circles, but Intel just published a brand new benchmark with physical SQL Server 2016 instances and Windows Server 2016. There are a lot of good numbers in there, but the one that should raise an eyebrow (in a good way) is 28,223 transactions per second.

How did they do this? They used a new feature of Windows Server 2016 Datacenter Edition called Storage Spaces Direct (S2D). S2D is a new way to deploy a WSFC using “shared storage”, and it can be used either with Hyper-V VMs or SQL Server FCIs directly running on physical hardware. While in some ways it can be compared to VMware’s VSAN or something like Nutanix, the reality is that S2D is a different beast and can be accessed by more than just virtual machines (hence bare metal SQL Server 2016). I’ve demoed S2D in the past with older builds of the Windows Server 2016 Technical Previews, and I can’t wait to get my hands on the RTM bits soon.

S2D allows you to configure very fast local storage such as NVMe-based flash/SSD in each of the WSFC nodes and have those nodes then utilize it (no really … local storage for things like FCIs, and not just TempDB). Note in the picture underneath the specs the hardware is using RDMA NICs. In the immortal words of Jeffrey Snover, “don’t waste your money buying servers that don’t have RDMA NICs”. This is true in the Windows Server world on physical hardware. VMware does not support RDMA or InfiniBand as of now, but they recently added support for 25 or 50 Gb networks in ESXi 6.0 Update 2. It’d be great if VMware supported RDMA since it would really help with vMotion traffic. Time will tell!

UPDATE: It does look like VMware is edging towards RDMA; see here and here for public evidence.

So what is RDMA? RDMA stands for Remote Direct Memory Access, which is a very (VERY) fast way to do networking. You can bingoogle to find more information, such as the fact that there are different flavors (RoCE and iWARP), and that some say InfiniBand and RDMA are one and the same. RDMA connectivity can revolutionize your storage connectivity and is great for things like Live Migration (and in the future, hopefully vMotion) networks. Its massive bandwidth and speed enable things like converged/hyperconverged solutions. Hyperconverged is the latest marketing buzzword bingo that every company uses a bit differently, so you’ll want to understand how each one is using it. Here’s the bottom line, though: fast networking is going to be the key to most things going forward, including storage access. If you’re still on 1Gb or even just doing 10Gb, you should really consider looking at faster things.

I’ve been talking about RDMA and Scale Out File Server (SOFS) with SQL Server for years. SOFS, when implemented right, uses RDMA. SQL Server natively supports RDMA and SOFS – there’s nothing that needs to be done other than using SMB 3.0 (well, SMB Multichannel and SMB Direct) to store your databases and use something like SOFS to serve it up. In fact, a few years back, I designed and helped to implement a hybrid Hyper-V/physical FCI solution for a customer using RDMA and SOFS. I remember the meeting where I proposed the RDMA aspect of the architecture – people looked at me like I had two heads because it is a left-field concept in the SQL Server world. Six months later when we got into a lab, none of us had seen such speed and most of the concerns and doubts faded away. Having seen and played with S2D for over a year now, I’ve seen the potential for how it can be used with SQL Server, and Intel’s new benchmark confirms it. If you care about pure performance with SQL Server, this is going to be an awesome architecture (SQL Server + S2D).

Ignite is just around the corner with the official Windows Server 2016 launch. S2D is here. If you want to take advantage of the speed and power of Windows Server (including 2016), RDMA, S2D, SOFS, Hyper-V, or vSphere (especially when RDMA is released) for SQL Server, contact us. It’s a brave new world, and SQLHA can guide you through it.
