Persistent Storage For Kubernetes Applications

Going Up? Persistent Storage for Kubernetes Applications is on the Rise

Fun fact: Kubernetes is the Greek word for “helmsman.” The open-source Kubernetes platform acts as a helmsman for automating the deployment, scaling, and management of containerized applications. It is this orchestration capability that lets enterprises group the containers that make up an application into logical units called pods. Those logical units can then be managed across varied deployments, locations, and models, and both the capabilities and the popularity of Kubernetes are growing quickly.

Open Source Containers

A container image, such as a Docker image, is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings. Most container technology is open source and free to use. Unlike virtual machines (VMs), containers do not each require a full operating system; instead, many containers share the kernel of a single host operating system. Splitting an application into smaller, independent pieces is known as “containerizing” that application. For example, a web application that uses a database may have one container for the HTTP server, another for the load balancer, and a third for the back-end database. Once containerized, these “microservices” can be scaled independently, helping enterprises operate more efficiently and promoting scalable, portable, and flexible designs.
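The three-container web application described above can be sketched as a minimal Compose file; the service names and images here are illustrative assumptions, not a prescribed stack:

```yaml
# docker-compose.yml: one container per tier, so each tier
# can be scaled independently of the others.
# (Images and names are illustrative only.)
services:
  lb:                  # load-balancer tier
    image: haproxy:2.9
    ports:
      - "80:80"
  web:                 # HTTP-server tier; scale with `docker compose up --scale web=3`
    image: nginx:1.25
  db:                  # back-end database tier
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
```

Because each tier is its own service, scaling the web tier to three replicas leaves the load balancer and database untouched.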

Containers make application services portable, decoupling them from the physical server so they can move between hosts, data centers, and environments. If a server fails, the container instance can start on an alternate server. Containers are also fully scalable, up or down, in or out. If only part of the application needs to scale (such as the back-end database), then just that portion can scale.

Given their portability, it is small wonder that the popularity of containers is climbing. Gartner research predicts that by 2023, 70% of organizations will be running three or more containerized applications in production.

Bring on Kubernetes

Orchestrating containers efficiently and cost-effectively is a job often left to Kubernetes. As more enterprises become immersed in the benefits Kubernetes provides, many find the allure of the platform goes beyond deploying simple cloud-native applications and microservices. Now, many enterprises are using Kubernetes to launch machine learning (ML) and artificial intelligence (AI) applications, which are data-intensive and require high performance and high availability.

To meet these resource requirements and deliver the best service experience to customers, the best practice for storing data for such demanding applications has been direct-attached storage (DAS), such as Kubernetes Local Persistent Volumes on solid-state drives (SSDs). The problem is that this is like snapping a pair of handcuffs around your application’s wrists: it is no longer free to move. In technical terms, application portability is lost, because the data can no longer be moved off that physical server.
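A Kubernetes Local Persistent Volume makes this pinning explicit: the volume spec carries a required node affinity, so any pod that claims it can only be scheduled on that one server. The names, path, and sizes below are illustrative:

```yaml
# Illustrative Local Persistent Volume. Note the nodeAffinity that
# ties the volume, and any pod that claims it, to a single node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-ssd-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/ssd0            # an SSD mounted on one specific host
  nodeAffinity:                # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-17      # pods using this volume are pinned here
```

If node-17 fails, the pod cannot simply restart elsewhere, which is exactly the loss of portability described above.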

Using direct-attached storage (DAS) for applications in a containerized environment thus breaks a central promise of the Kubernetes model, container and pod portability, leading enterprises to lose the flexibility they signed up for along with the ability to scale their business simply.

Second, deploying local SSDs in every application server results in severe under-utilization: up to 85% of overall capacity can go unused, wasting millions of dollars. The industry is addressing this problem through disaggregation, accessing storage remotely while it appears to be local. Protocols such as NVMe-over-Fabrics (NVMe-oF) now let applications that work best with local NVMe flash achieve the same performance with disaggregated NVMe, without sacrificing the portability associated with containers and pods.

Since late 2018, Kubernetes has supported Container Storage Interface (CSI) plug-ins that allow third parties to meet persistent storage needs. This enables network-attached or remote storage for data-centric, cloud-native applications, though often at a cost in performance or in data services.
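With a CSI driver installed, persistent storage is requested declaratively through a StorageClass and a claim. The provisioner name below is a placeholder for whatever third-party plug-in is actually deployed:

```yaml
# Illustrative StorageClass backed by a third-party CSI driver.
# (The provisioner name is a hypothetical placeholder.)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-remote
provisioner: csi.example.com
---
# A claim against that class: Kubernetes asks the CSI driver to
# provision the volume and attach it wherever the pod lands.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-remote
  resources:
    requests:
      storage: 50Gi
```

Because the volume is provisioned and attached by the driver rather than pinned to a host, the pod keeps its portability.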

Low-latency block storage is most useful for highly transactional workloads such as databases and message-streaming services. Essentially three block storage technologies support the CSI plug-in model: direct-attached storage (DAS), storage area networks (SAN), and NVMe-oF.
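For these latency-sensitive transactional workloads, a claim can also request raw block mode, handing the application an unformatted device instead of a mounted filesystem. This is a sketch using standard Kubernetes fields; the claim name and size are illustrative:

```yaml
# Illustrative raw-block claim: volumeMode: Block gives the pod an
# unformatted block device rather than a mounted filesystem, letting
# a database manage the device directly.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-block
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block          # raw device, no filesystem layer
  resources:
    requests:
      storage: 200Gi
```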

NVMe-oF

Combining the ease of operation of a SAN with the high performance of DAS, NVMe-oF solutions are making inroads at enterprises that value flexible, highly available persistent storage for Kubernetes. A key benefit of NVMe-oF for cloud-native applications is disaggregation that preserves local performance. This maximizes Kubernetes functionality and eases migration without compromising performance or availability, helping enterprises lower their total cost of ownership through greater operational efficiency.

The disadvantages of NVMe-oF are complexity and reliability. Systems that use NVMe/RoCE (RDMA over Converged Ethernet) require network infrastructure changes (special NICs) and Ethernet switch settings that are foreign to the ubiquitous TCP/IP configurations found in most Kubernetes environments. NVMe/TCP requires no special settings or NICs and is therefore positioned for simpler adoption, but it is only beginning to see wide deployment. As for reliability, some proprietary arrays support NVMe-oF with failover but do not follow the philosophy of standard servers and software-defined “everything,” while open-source NVMe-oF targets tend not to offer high availability or data services.

Moving Forward

The move to containerize “everything” is driving the need for persistent container storage. The applications that require persistent volumes (such as databases, analytics, message-streaming services, and log processing) are often the same ones that require low-latency, high-performance storage. The industry is looking for solutions that deliver the performance of local storage while preserving application portability at an affordable cost. Software-defined storage solutions built on NVMe/TCP are positioned to fill this market gap, enabling storage and compute to scale independently and maximizing the value of containerized applications in Kubernetes environments.

About the Writer: