Recently Sagi Grimberg, co-founder and CTO at Lightbits Labs, engaged in a Q&A discussion on the future of data centers and delivering greater performance, flexibility, and lower TCO for NVMe over Fabrics (NVMe-oF™)-based disaggregated storage.  Below is a summary of the conversation:

How important is NVMe-over-TCP?
NVMe™/TCP is very important in my mind. Folks have wanted an alternative to Fibre Channel Storage Area Network (FC SAN) that is performant, but much more affordable, and that allows consolidation of data center networks and practices. The Internet Small Computer Systems Interface (iSCSI) offers SAN over Ethernet TCP/IP networks but has been disappointing in its overall performance characteristics. NVMe/TCP, on the other hand, offers the speed and low latency of NVMe, but more importantly preserves the ubiquity of Ethernet and capitalizes on the well-established base of TCP/IP networking knowledge and practices. In short, it is an industry standard that is becoming widely supported and ubiquitous, it performs better than legacy iSCSI and Fibre Channel SAN, and it leverages standard Ethernet, which costs less and offers higher bandwidth and flexibility compared to Fibre Channel.

How is this technology used?
NVMe-over-TCP can be used anywhere iSCSI is used but provides greatly improved latency and much higher levels of IOPS on the very same Ethernet/TCP networks. Moreover, it is a good fit for highly transactional workloads (databases, analytics and message streaming) as well as high-bandwidth workloads (real-time analytics, video processing, AI/ML), with emphasis on large-scale deployments and higher network speeds and feeds.

Who currently offers, or is planning to offer, the technology?
Lightbits Labs was the first to offer a production NVMe/TCP-based solution, but other smaller and larger companies also have products available, or have announced products that will support NVMe/TCP. Vendors include Pure Storage (future), Dell/EMC (future), NetApp (which has discussed NVMe/TCP in blogs), Infinidat, Fungible, Pavilion Data and more. In addition, network vendors such as Intel, Mellanox (now NVIDIA), Marvell, SolarFlare (now Xilinx) and Kazan Networks (now WD) have also announced support, offloads and enhancements for NVMe/TCP.

What is the target market?
The target market is exceedingly broad. First of all, modern data centers look more and more like the cloud as we’ve come to know it, with enterprises adopting cloud practices as well as cloud-native deployments in environments such as Kubernetes or OpenStack®. It’s hard to imagine the cloud built on top of dedicated Fibre Channel networks, so NVMe/TCP becomes much more applicable due to its superior performance and latency compared to iSCSI (as well as other TCP-based storage protocols). Secondly, as NVMe-oF, and NVMe/TCP specifically, matures and its ecosystem evolves, traditional bare-metal deployments currently built on iSCSI or Fibre Channel SANs will also benefit from adopting NVMe/TCP.

What are the potential advantages compared to other NVMe-over-Fabrics options like Fibre Channel or InfiniBand?
NVMe/TCP’s biggest advantage over NVMe over Remote Direct Memory Access (NVMe/RDMA) is simplicity. NVMe/TCP runs on every NIC under the sun. It lowers the barrier for users to evaluate different products without requiring non-standard practices or specific hardware. Another advantage is scalability. InfiniBand has traditionally been deployed as a back-end fabric for storage systems, where the scale is more limited and the freedom for specialization is acceptable. Front-end fabrics, however, require strict ubiquity and typically higher scalability. That is why TCP/IP and FC are used much more broadly there. The advantages compared to NVMe over Fibre Channel (FC-NVMe) are mainly cost reduction, consolidation and increased bandwidth.

What are the potential roadblocks to adoption?
Unlike some of the storage vendors, VMware and Microsoft do not yet support NVMe/TCP. Thus, today NVMe/TCP is really limited to Linux environments. Additionally, you need a fairly modern kernel/distribution (such as recent versions of RHEL, SUSE, Oracle Linux, etc.) to have the full NVMe/TCP driver and multipath capabilities. These “roadblocks” should evaporate as organizations upgrade to newer releases and when Microsoft and/or VMware support NVMe/TCP.
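
To make the Linux requirement a little more concrete, here is a minimal Python sketch of how a host might be checked and attached. It rests on several assumptions that are not part of the interview: that the upstream NVMe/TCP host driver first shipped in Linux 5.0 (distribution kernels may backport it, so the version check is only a rough hint), that the nvme-cli tool is installed, and that the target listens on port 4420. The function names, the address and the NQN are hypothetical placeholders. The sketch only builds the command lines and does not run them.

```python
import re
from typing import List

# Assumption: the upstream NVMe/TCP host driver first shipped in Linux 5.0.
# Distribution kernels may backport it, so treat this as a rough hint only.
MIN_UPSTREAM_KERNEL = (5, 0)


def kernel_has_upstream_nvme_tcp(release: str) -> bool:
    """Return True if a release string such as '5.14.0-70.el9' is >= 5.0."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= MIN_UPSTREAM_KERNEL


def nvme_tcp_commands(address: str, nqn: str, port: int = 4420) -> List[List[str]]:
    """Build (but do not run) command lines for attaching an NVMe/TCP target."""
    # -t: transport, -a: target address, -s: transport service ID (TCP port)
    target = ["-t", "tcp", "-a", address, "-s", str(port)]
    return [
        ["modprobe", "nvme-tcp"],             # load the host driver
        ["nvme", "discover", *target],        # list subsystems the target exports
        ["nvme", "connect", *target, "-n", nqn],  # attach one subsystem by NQN
    ]


if __name__ == "__main__":
    # Placeholder address (documentation range) and a made-up NQN.
    for cmd in nvme_tcp_commands("192.0.2.10", "nqn.2014-08.org.example:vol1"):
        print(" ".join(cmd))
```

Building the commands as argument lists rather than shell strings keeps the sketch easy to inspect and to pass to a process launcher later. A real deployment would also need multipath configuration and persistent connection settings, which are left out here.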

At the same time, NVMe-oF still has some gaps compared to the mature and complete SCSI and FC standards, mainly around in-band authentication and automated discovery and enumeration, which matter most in enterprise environments. These areas are actively being worked on as we speak. Having these gaps addressed, in combination with the evolving ecosystem support and capable products becoming predominant in the market, will make NVMe-oF and NVMe/TCP viable options for almost every deployment out there.

Are the major storage vendors planning to adopt this technology?
Yes – see above.

What does the future look like for NVMe-over-TCP?
I believe that the ubiquity of Ethernet & TCP/IP will naturally drive people toward NVMe/TCP. If you could have the simplicity of iSCSI but with substantially more IOPS and much lower latency on the same Ethernet fabric, why wouldn’t you switch?

We’re seeing the entire ecosystem spectrum (hardware, system and platform vendors, as well as customers) making substantial investments in NVMe-oF and specifically in NVMe/TCP. I believe it is no longer a question of “if” but rather a matter of time until it becomes the de facto standard for block storage over the network.

Lightbits Labs is a trailblazer in this field: its NVMe/TCP solution has been successfully tested and deployed at industry-leading cloud data centers. The Lightbits Labs NVMe architecture provides efficient and robust disaggregation with low latency, delivering data faster to applications and unlocking system performance capability.

All product names, trademarks, registered trademarks, and/or servicemarks may be claimed as the property of their respective owners.

NVM Express™ (NVMe™) and NVMe™ over Fabrics (NVMe-oF™) are trademarks of NVM Express, Inc. PCI-SIG® and PCIe® are registered trademarks of PCI-SIG.
