Removing Storage Bottlenecks for CPUs

Here are some realities for businesses in 2020: Bandwidth is less costly now than ever before; flash storage prices continue to tumble; and latency keeps dropping. With that in mind, we are all closely watching how the largest hyperscalers, such as Google and Amazon, use these trends to achieve operational efficiency, because hyperscale innovation today becomes tomorrow's enterprise practice.

The Importance of CPU Utilization

Most hyperscalers maintain their own private clouds to control costs, and because they have the resources to build them. In doing so, they look to lower expense in every area, the most important of which is CPU utilization.

CPUs are the reason a computer exists; everything else plays a supporting role. Hyperscalers use flash storage so their CPUs can run as quickly and efficiently as possible. But perhaps one of the biggest lessons we can learn from hyperscalers is that decoupling the CPU from flash lets each grow and scale independently.

The Impact from the Rise of Cloud Native Applications

Today, there is a clear trend in application architecture toward cloud-native applications, regardless of whether they are deployed in a private or public cloud. Most of these applications are databases of some kind, whether classic SQL or NoSQL; some reside in memory, and many are distributed. These applications share a need for low latency and, at times, high bandwidth. They also need consistent response times, because whether the enterprise is building a web service or another modern application, these services are generally stacked on top of other applications. Inconsistency or latency in the lower levels affects everything above them, leading to an inconsistent end-user experience.
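To make that stacking effect concrete, here is a small illustrative calculation (not from the article): if each backend call in a request path has even a small chance of a slow response, a request that fans out across many services hits a slow call far more often than any single service does. The 1% slow-call probability and the call counts below are assumptions chosen purely for illustration.

```python
# Illustration: why tail latency compounds when applications are stacked.
# If each backend call has a small chance of a slow (e.g. above-p99) response,
# a request that touches many backends is slow far more often than any one backend.

def chance_of_slow_request(p_slow_per_call: float, num_calls: int) -> float:
    """Probability that at least one of `num_calls` independent calls is slow."""
    return 1.0 - (1.0 - p_slow_per_call) ** num_calls

if __name__ == "__main__":
    for calls in (1, 10, 50, 100):
        pct = 100 * chance_of_slow_request(0.01, calls)
        print(f"{calls:3d} backend calls -> {pct:5.1f}% of requests see a slow call")
    # 1 call -> 1.0%, 10 calls -> ~9.6%, 100 calls -> ~63.4%
```

This is why consistent latency at the bottom of the stack matters so much for the user-visible experience at the top.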

Poor Flash Utilization

To achieve low latency and consistency, cloud-native applications have turned to dedicated resources per server: local flash. These environments may be bare metal, virtualized, or containerized, depending on preference, performance, and operational needs. Regardless of environment, when flash sits inside every application server, the result is poor flash utilization:

– A human must decide how large that flash solid-state drive (SSD) should be (a rough sizing sketch follows this list)
– Growth of the data over the life of the server must be estimated
– Most of us select a larger drive than we really need, because no one wants to be the one who chose a drive that's too small
– In dynamic application environments, not every application needs local flash, and not all applications will need the entire drive
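As a rough illustration of how this over-provisioning creeps in, the sketch below works through a hypothetical sizing decision. Every number in it (current data, growth rate, safety margin, drive size purchased) is an assumption made up for the example, not a figure from the article.

```python
# Hypothetical per-server SSD sizing exercise (illustrative numbers only).
# Capacity must be chosen up front for the life of the server, so projected
# growth and a safety margin get baked in whether or not they are ever used.

def required_ssd_tb(current_tb: float, annual_growth: float,
                    years: int, safety_margin: float) -> float:
    """Capacity needed today to cover projected growth plus a safety margin."""
    projected = current_tb * (1 + annual_growth) ** years
    return projected * (1 + safety_margin)

if __name__ == "__main__":
    needed = required_ssd_tb(current_tb=2.0, annual_growth=0.30,
                             years=4, safety_margin=0.25)
    bought = 8.0  # rounded up to the next standard drive size, to be safe
    print(f"Projected need: {needed:.1f} TB, drive purchased: {bought:.1f} TB")
    print(f"Day-one utilization: {2.0 / bought:.0%}")  # ~25% of the flash in use
```

Even a defensible forecast leaves most of the drive idle on day one, and that gap is repeated in every application server in the fleet.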

The result is chronic underutilization of the flash resource. If (and when) the drive or host fails, an oversized drive takes longer to recover. For example, recovering an 8-terabyte drive running Cassandra might take 2 to 6 hours, depending on network speed, utilization, and database load, all while taxing the network as it recovers.
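For context on that range, a back-of-envelope calculation shows how much of the window is simply moving the data over the wire. The link speeds and the 50% efficiency factor below are assumptions for illustration; real Cassandra recovery also depends on streaming throughput, compaction, and cluster load.

```python
# Back-of-envelope rebuild time for an 8 TB drive at various effective network
# rates. This only accounts for moving the bytes; database-level recovery work
# is extra, which is how the window stretches toward the higher end.

TB = 1e12  # bytes

def transfer_hours(capacity_bytes: float, link_gbps: float, efficiency: float) -> float:
    """Hours to move `capacity_bytes` over a link of `link_gbps` at a given efficiency."""
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return capacity_bytes / bytes_per_sec / 3600

if __name__ == "__main__":
    for gbps in (10, 25, 100):
        hours = transfer_hours(8 * TB, gbps, efficiency=0.5)
        print(f"{gbps:3d} GbE at 50% efficiency: ~{hours:.1f} hours just to move the data")
```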

Removing the Bottlenecks

LightOS addresses these issues. It is disaggregated, virtualized NVMe running over TCP that performs like local flash. Users gain high-performance, software-defined storage from commodity servers. Storage and compute are disaggregated, so each scales independently, alleviating the problem of having either too much or too little compute power, or too much or too little storage. It allows users to achieve the same efficiencies as hyperscalers in removing storage bottlenecks for CPUs, while avoiding the pitfalls and wasted expense of underutilized local flash.
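For a sense of what the client side of NVMe/TCP looks like, here is a minimal sketch, assuming a standard Linux host with the stock nvme-cli tool installed and root privileges. The address, port, and subsystem NQN are placeholders, and any LightOS-specific provisioning (done through its own management interface) is not shown.

```python
# A minimal sketch (not LightOS-specific) of attaching a remote NVMe/TCP volume
# from a standard Linux client using the stock nvme-cli tool. Requires root.
import subprocess

TARGET_ADDR = "192.0.2.10"                    # placeholder storage-server IP
TARGET_PORT = "4420"                          # common NVMe/TCP service port
SUBSYS_NQN = "nqn.2014-08.org.example:vol1"   # placeholder subsystem NQN

def run(*args: str) -> None:
    print("+", " ".join(args))
    subprocess.run(args, check=True)

if __name__ == "__main__":
    run("modprobe", "nvme-tcp")   # load the in-kernel NVMe/TCP initiator
    run("nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT)
    run("nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT,
        "-n", SUBSYS_NQN)
    run("nvme", "list")  # the remote volume now appears as a local /dev/nvmeXnY device
```

Once connected, the remote volume appears as an ordinary local NVMe block device, which is what allows it to behave like local flash from the application's point of view.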

It’s Simple

Because LightOS works with standard clients and servers over a ubiquitous networking protocol, there is not much for users already familiar with TCP/IP to learn. If they already know how to do things like port bonding, routing, and MLAG, that knowledge transfers directly to storage. LightOS also helps reduce costs by maximizing flash utilization and improving flash endurance. It has a complete flash manager built in: when quad-level cell (QLC) drives are used, LightOS writes linearly across multiple drives and uses large blocks, ensuring both high performance and optimal drive endurance. Together, these innovations improve operational efficiency and lower cost.

LightOS also does all of this while providing rich data services. NVMe over fabrics is known for extremely high speeds, but data services typically suffer. LightOS offers thin provisioning, compression, dynamic volumes at scale, data protection, quality of service, and more. Users gain the data services they are accustomed to from all-flash arrays, with the performance of local NVMe.

Also worth mentioning: Lightbits requires no proprietary software on clients/initiators. Distributions such as Red Hat, CentOS, Ubuntu, and SUSE all include the NVMe/TCP initiator. If you build a custom kernel, the drivers are included in the upstream kernel and can be easily back-ported, or we can do this for you.
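As a quick sanity check, a few lines like the following (a sketch, not a Lightbits-provided tool) can confirm that the running kernel ships the stock nvme-tcp initiator module before you attempt a connection.

```python
# Check whether the running kernel ships the standard NVMe/TCP initiator module.
import subprocess

def has_nvme_tcp_initiator() -> bool:
    """True if modinfo can locate the nvme-tcp module for the running kernel."""
    result = subprocess.run(["modinfo", "nvme-tcp"], capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    if has_nvme_tcp_initiator():
        print("nvme-tcp initiator is available in this kernel")
    else:
        print("nvme-tcp initiator not found; a backport or newer kernel is needed")
```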

If you’re interested in learning more, check out this webinar – Scale-out Flash Storage: Breaking Old School Storage Rules
