NVMe over TCP details and features you need to know
NVMe/TCP is here. The new specification lays out how to deliver data across an existing TCP network, making implementation simple and cost-efficient for organizations.
The latest revision of the NVMe over fabrics specification, NVMe-oF 1.1, includes support for the TCP transport binding. The addition of NVMe over TCP makes it possible to use NVMe-oF across a standard Ethernet network without having to make configuration changes or implement special equipment.
Although a young technology, NVMe-oF has already been widely incorporated into network architectures, providing a state-of-the-art storage protocol that can take full advantage of today’s SSDs, unlike older protocols such as iSCSI.
NVMe-oF helps bridge the gap between direct-attached storage (DAS) and storage area networks (SANs) and enables organizations to more effectively support workloads that require high throughput and low latency, including AI, machine learning and real-time analytics. The addition of TCP to the standard makes NVMe-oF more valuable than ever.
What is TCP transport binding?
Prior to NVMe-oF 1.1, the NVMe-oF specification was limited to Fibre Channel and remote direct memory access (RDMA) fabrics such as InfiniBand, RDMA over Converged Ethernet (RoCE) and the Internet Wide Area RDMA Protocol (iWARP). Although these are well-known network technologies, they can be complex to implement or require special equipment and configurations. The TCP transport binding can be used over any Ethernet network, as well as the internet, eliminating many of these challenges.
TCP is a widely accepted standard that defines how to establish and maintain network communications when exchanging application data across a network. The protocol determines how messages are broken into smaller packets before they're transmitted and how those packets are reassembled at their destination, ensuring that all data is transmitted and received correctly.
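That reliable, ordered byte stream is what NVMe/TCP builds on. As a quick illustration, the hypothetical Python sketch below (loopback only, nothing NVMe-specific) pushes a payload larger than a typical TCP segment through a local echo server and shows it arrive intact and in order, with TCP handling the packetizing and reassembly transparently:

```python
# Minimal illustration of TCP's reliable, ordered byte stream using
# Python's standard socket library (loopback only; not NVMe-specific).
import socket
import threading

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            conn.sendall(chunk)  # echo every byte back, in order

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
payload = b"x" * 50_000                # typically spans many TCP segments
client.sendall(payload)                # TCP splits this into packets
client.shutdown(socket.SHUT_WR)

received = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    received += chunk
client.close()

assert received == payload  # reassembled intact and in order
```

The application on each side sees only a continuous stream of bytes; segmentation, retransmission and reordering all happen below the socket interface.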
TCP works in conjunction with the Internet Protocol, which determines how to address and route each packet so it reaches the correct destination. Both TCP and IP are part of the TCP/IP suite of communication protocols used to facilitate communications across the internet and private networks.
TCP/IP is commonly divided into four layers: application, transport, network and link. The transport layer includes TCP, the network layer includes IP and the link layer includes Ethernet, which defines a standard for interconnecting TCP/IP nodes in a LAN.
How data is encapsulated and delivered
The TCP transport binding in NVMe-oF defines the methodology used to encapsulate and deliver data between a host and a non-volatile memory subsystem. Although the NVMe/TCP specification focuses primarily on software-based implementations that use TCP application interfaces, there is nothing in the specification to preclude NVMe over TCP from also being used for hardware-based implementations.
The TCP binding defines how queues, capsules and data should be mapped to support TCP-based communications between NVMe-oF hosts and controllers across standard IP networks. The host and controller communicate by exchanging protocol data units (PDUs), which provide a structure for transferring data, capsules, or control and status information. The exact length and configuration of a PDU depends on its specific purpose in the data transfer process.
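To make the PDU structure more concrete, here is a small sketch of the 8-byte common header that begins every NVMe/TCP PDU. The field layout (PDU type, flags, header length, PDU data offset and total PDU length) follows the NVMe/TCP transport binding, but the helper names and values below are illustrative assumptions, not output from a real connection:

```python
# Sketch of packing/unpacking the 8-byte common header that begins every
# NVMe/TCP PDU (field layout per the NVMe/TCP transport binding;
# values here are illustrative, not captured from a live connection).
import struct

# Common header: PDU-Type (1B), FLAGS (1B), HLEN (1B), PDO (1B),
# PLEN (4B, little-endian) -- PLEN is the total PDU length in bytes.
COMMON_HDR = struct.Struct("<BBBBL")

def pack_common_header(pdu_type, flags, hlen, pdo, plen):
    return COMMON_HDR.pack(pdu_type, flags, hlen, pdo, plen)

def unpack_common_header(data):
    pdu_type, flags, hlen, pdo, plen = COMMON_HDR.unpack_from(data)
    return {"type": pdu_type, "flags": flags,
            "hlen": hlen, "pdo": pdo, "plen": plen}

# Example: a header for a hypothetical 128-byte PDU with no data offset.
hdr = pack_common_header(pdu_type=0x00, flags=0, hlen=128, pdo=0, plen=128)
assert len(hdr) == 8
assert unpack_common_header(hdr)["plen"] == 128
```

The remainder of each PDU (the type-specific header fields and any payload) follows this common header, which is why a receiver can parse PLEN first to know how many bytes make up the whole PDU.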
NVMe/TCP communications also incorporate a messaging and queuing model that establishes the communication sequence. The TCP binding uses a transport-specific mechanism and, optionally, an in-capsule data component to support data transfers. Each connection carries a single queue pair made up of an admin or I/O submission queue, along with its completion queue.
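The queue pair model can be sketched as a toy data structure: a submission queue carrying commands from host to controller and a completion queue carrying status back. All names and fields here are invented for illustration and greatly simplify the real protocol:

```python
# Toy model of the single queue pair each NVMe/TCP connection carries.
# Field names ("cid", "opcode", "status") are invented for this sketch.
from collections import deque

class QueuePair:
    def __init__(self):
        self.submission_queue = deque()  # host -> controller commands
        self.completion_queue = deque()  # controller -> host statuses

    def submit(self, command_id, opcode):
        self.submission_queue.append({"cid": command_id, "opcode": opcode})

    def controller_process_one(self):
        # The controller consumes a command and posts a completion entry
        # that the host matches back to the command by its identifier.
        cmd = self.submission_queue.popleft()
        self.completion_queue.append({"cid": cmd["cid"], "status": 0})

qp = QueuePair()
qp.submit(command_id=1, opcode="read")
qp.controller_process_one()
completion = qp.completion_queue.popleft()
assert completion == {"cid": 1, "status": 0}
```

The key idea the sketch captures is that commands and completions travel on separate queues within one connection, so the host can keep multiple commands in flight and match each completion back by identifier.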
The first step in transferring data is to establish a connection between the host and controller. The host initiates the connection by sending a request message to the controller. The controller is a passive component that listens for connection requests. Upon receiving a request, the controller sends a response to the host, acknowledging that initial communications have been established.
The host then sends a PDU requesting the connection be initialized. The controller responds by sending its own PDU, confirming the connection has been initialized. The host and controller also share connection configuration parameters. After the connection has been initialized, they can carry out the data exchange.
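The initialization exchange above (an ICReq PDU from the host, answered by the controller's ICResp) can be sketched as follows. The PDU type codes come from the NVMe/TCP binding, but the dictionaries and functions are simplified stand-ins rather than a working initiator:

```python
# Simplified sketch of NVMe/TCP connection initialization.
# 0x00 and 0x01 are the ICReq/ICResp PDU type codes in the spec;
# everything else in this sketch is an illustrative assumption.
IC_REQ, IC_RESP = 0x00, 0x01

def host_build_icreq():
    # A real ICReq also carries the host's digest settings and
    # data-alignment parameters; only the PDU format version is shown.
    return {"type": IC_REQ, "pfv": 0}

def controller_handle(pdu):
    # The controller confirms initialization with an ICResp PDU.
    if pdu["type"] != IC_REQ:
        raise ValueError("expected an ICReq before any data transfer")
    return {"type": IC_RESP, "pfv": pdu["pfv"]}

# Host initiates; controller listens and responds; then data can flow.
response = controller_handle(host_build_icreq())
assert response["type"] == IC_RESP
```

This is where the two sides exchange their connection configuration parameters; only after the ICResp arrives does the host begin sending command capsules.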
Benefits and drawbacks
The NVMe/TCP specification offers several important benefits, one of which is the ubiquitous nature of TCP. Not only does TCP help drive the internet, it’s implemented extensively across networks around the world, making it one of the most common transports in use. The protocol is well-understood, well-known and actively developed by key players that contribute to maintaining and enhancing its capabilities.
In addition, NVMe over TCP is designed to work with existing TCP-based systems without requiring changes to the network infrastructure or client devices. Organizations can deploy NVMe/TCP on existing Ethernet routers, switches, adapters and other standard equipment, while keeping implementation simple and minimizing downtime and costs.
Because TCP is fully routable, it supports large-scale deployments that can span great distances, while maintaining relatively high performance and low latency. Enterprise data centers can implement NVMe over TCP on their existing network infrastructures and multilayered switch technologies without needing to invest in new or specialized equipment or the resources necessary to implement and maintain that equipment.
Despite these advantages, NVMe over TCP does have its downsides. To begin with, the specification can increase system processor loads, because TCP requires processing power to manage certain operations, such as calculating checksums.
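To see where some of that processing goes, the sketch below implements the 16-bit ones'-complement checksum that TCP computes over every segment. It's simplified: the real computation also covers a pseudo-header, and many NICs offload it in hardware, but it shows the per-byte work a software stack would otherwise do:

```python
# The TCP checksum is a 16-bit ones'-complement sum over the segment
# (plus a pseudo-header, omitted here). Computing this in software for
# every segment is part of the CPU cost noted above; NICs often offload it.
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # sum 16-bit words
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF                        # ones' complement of sum

data = b"example payloads"                        # even length, no padding
checksum = internet_checksum(data)
# A correct checksum verifies to zero when re-summed with the data.
assert internet_checksum(data + checksum.to_bytes(2, "big")) == 0
```

The receiver repeats the same sum over the segment plus its checksum; anything other than zero means corruption in transit, which triggers a retransmission.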
NVMe over TCP also can result in higher latency rates, in part because of the additional copies of data that must be maintained in the TCP stack. The extent of this latency is still being debated and may depend on how the specification is implemented and the type of workloads being supported.
Early performance tests suggest that NVMe over TCP could add anywhere from 10 to 80 microseconds of latency compared with RDMA-based NVMe-oF. Although that overhead might not be acceptable for certain workloads, it might be within a tolerable range for others, especially when taking into account the ease of implementing NVMe/TCP on existing network infrastructures. NVMe over TCP performance is likely to improve as more vendors incorporate the technology into their products.
NVMe/TCP in action
Several vendors have been leading the way in adopting NVMe over TCP into their network offerings, with Lightbits Labs and Solarflare Communications Inc. at the forefront. For example, Lightbits offers a hyperscale storage platform that incorporates NVMe over TCP to provide composable block storage that can be implemented without affecting the network infrastructure or data center clients.
Solarflare also offers composable storage based on NVMe over TCP, enabling data centers to use their existing network infrastructures. The vendor is working with Super Micro Computer Inc. to provide offerings that use Supermicro Ultra SuperServer systems and Solarflare’s TCP-optimized network interface cards.
Given the extent to which TCP is used across the internet and within enterprise networks, the NVMe over TCP spec could pave the way for a new generation of storage offerings. These approaches would use existing TCP-based network infrastructures and state-of-the-art SSDs to help ease the transition from legacy storage systems to ones that meet the demands of today’s dynamic workloads and massive amounts of data.
NVMe/TCP might come with its own latency issues compared with RDMA-based NVMe-oF, but for many organizations, the trade-off could be worth it given how easy NVMe over TCP is to implement and how cost-effective it is.