NVMe/TCP Addressing Challenges of a Stateful Containerized Environment

At Lightbits, we get the appeal of container technology.

We see that data center infrastructure and operations groups are under immense pressure to deliver a scalable, high-performance and fault-tolerant environment with zero downtime. Running containerized applications, orchestrated by a flexible scheduler, addresses this challenge as containers can be quickly spun up anywhere and terminated with minimal overhead. In fact, Gartner predicts that, by 2020, more than 50% of global organizations will be running containerized applications in production, up from less than 20% today.

However, as data keeps growing in the always-available data center, one critical challenge is that applications and services running at scale must keep their state as they migrate around the data center, so that services stay available and efficient in the presence of constant failures. An application's response time can be heavily impacted by the gravity of its data and state.

With a direct-attached storage architecture, when an application or a service is rescheduled to run on a node that is remote from its data, the application must go through a “warm-up” stage in which it restores its state by reconstructing its data set from neighbor nodes hosting replicas or shards. Introducing the concept of persistent volumes in the scheduler addresses this “warm-up” problem. Today, the leading container schedulers (e.g., Kubernetes, Apache Mesos and Pivotal Cloud Foundry) all support the notion of persistent volumes through a standard Container Storage Interface (CSI).
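As a rough illustration of what requesting a persistent volume looks like from the application side, here is a minimal sketch that builds a Kubernetes PersistentVolumeClaim manifest as a plain Python dictionary and prints it as JSON. The claim name `app-data`, the size, and the storage class name `example-nvme-tcp` are hypothetical placeholders, not names from any Lightbits product; a real cluster would need a CSI driver and a storage class defined by its administrator.

```python
import json


def make_pvc(name: str, size_gib: int, storage_class: str) -> dict:
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict.

    The storage class is assumed to be backed by a CSI driver that
    provisions network-attached volumes (hypothetical in this sketch).
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(make_pvc("app-data", 100, "example-nvme-tcp"), indent=2))
```

Because the claim names a storage class rather than a node, the scheduler can place the container on any node that can reach the storage, which is what removes the warm-up step described above.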

This architecture requires a disaggregated storage technology that can deliver the scalability and performance for efficiently handling and processing large data sets. But adoption of new technologies for containerized applications in a running production environment must be done in steps that produce minimal friction and minimal or no new hardware requirements.

At Lightbits, we are working with our customers to understand the mobility challenges of running containerized applications and to identify the best solutions that can be deployed in their current environment with minimal friction.

We understand that our customers want to spin up containers without being limited by data and application affinity. Storage at cloud scale goes well beyond distance limitations or narrowly controlled niche use cases. It is really about stretching data access across the entire data center. Therefore, we built our solution around reliable, widely used network infrastructure and protocols that are proven to scale.

The solution we arrived at is software-defined storage accessed over TCP/IP. It lets lightweight containerized applications access persistent volumes without imposing new hardware requirements or new operational practices on the existing infrastructure.



Scale-out storage solves the data mobility problem by exposing fine-grained access controls to data from different server locations in the data center. Scale matters, and NVMe and NVMe-oF are gaining momentum as key enablers in meeting the demands of the modern data center. With that in mind, Lightbits came up with NVMe/TCP, which recently became an officially ratified NVMe transport binding standard. It is supported in the Linux kernel and will soon be supported in your favorite operating system.
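To make the "no new hardware" point concrete, here is a minimal sketch of how a Linux host might attach a remote NVMe/TCP volume using the standard `nvme-cli` tool over an ordinary TCP/IP network. The helper only builds the command line and does not run it. The target address `192.0.2.10` (a documentation-only IP) and the NQN `nqn.2014-08.org.example:volume1` are hypothetical placeholders, and the helper function itself is an illustration rather than part of any Lightbits tooling.

```python
import shlex

# 4420 is the port commonly used for NVMe over Fabrics I/O connections.
NVME_TCP_DEFAULT_PORT = 4420


def nvme_tcp_connect_argv(traddr: str, nqn: str,
                          trsvcid: int = NVME_TCP_DEFAULT_PORT) -> list:
    """Build the nvme-cli argument list for connecting to an NVMe/TCP target.

    traddr  -- IP address of the storage target
    nqn     -- NVMe Qualified Name of the subsystem to connect to
    trsvcid -- TCP port of the target
    """
    return [
        "nvme", "connect",
        "--transport=tcp",
        f"--traddr={traddr}",
        f"--trsvcid={trsvcid}",
        f"--nqn={nqn}",
    ]


if __name__ == "__main__":
    # Placeholder target; print the command instead of executing it.
    argv = nvme_tcp_connect_argv("192.0.2.10", "nqn.2014-08.org.example:volume1")
    print(shlex.join(argv))
```

Running the resulting command requires root privileges, the `nvme-tcp` kernel module, and a reachable target; once connected, the remote volume appears to the host as a regular NVMe block device.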

Lightbits’ NVMe/TCP-based solution can be deployed immediately in data centers and enables a lightweight containerized service application architecture backed by a scalable, high-performance disaggregated storage system.

About the Writer:

Sagi Grimberg (@sagigrim) is a co-founder and CTO at Lightbits Labs, a storage company developing next-gen hyperscale storage solutions. He has more than 10 years of experience in storage, networking, Remote Direct Memory Access (RDMA) technologies and distributed systems. Sagi is a co-maintainer of the Linux NVMe subsystem and the lead author of the NVMe/TCP standard. Prior to Lightbits Labs, Sagi worked at Mellanox Technologies (now owned by NVIDIA), where he served as the Storage Software manager. Sagi has written various technical papers and has given conference and meetup presentations on the innovative Lightbits NVMe/TCP capabilities.