Is it time to retire Ceph for flash?
The scalable storage solution was created in 2005 by Sage Weil, and it has gone through many iterations since. Its biggest challenge is that in today’s hardware environments, despite many improvements, Ceph lags behind modern solutions in both tail latency and raw speed.
Sure, Ceph is highly scalable. It’s truly a fantastic, one-size-fits-all solution (it provides block, object, and file storage, for example). The problem is that it was created when hard drives ruled the day. Today’s fast NVMe flash drives expose Ceph’s inherent architectural limitations.
BlueStore, a back-end object store for Ceph OSDs, offers speed improvements, especially for random writes. While BlueStore makes Ceph better, it cannot overcome the inherent latency multipliers in the Ceph architecture, making it ill-suited to NVMe media. For NVMe users, BlueStore’s promise of reduced overall latency (especially tail latency) and higher performance is simply not realized. That’s because NVMe isn’t the bottleneck – Ceph itself is.
A Red Hat project last year that configured Ceph for high-performance storage showed great promise for extending its life. It should be noted that Red Hat used among the very best (and most expensive) CPUs in the object storage device (OSD) servers, coupled with NVMe for the data pools and Optane NVMe devices for the BlueStore OSDs. In the real world, that’s an expensive way to get the most value from a Ceph installation. It’s noteworthy that even with Ceph tuned by experts on very high-performing systems, the testing showed about 3 ms of average latency during writes.
That’s where our LightOS comes in.
When Ceph Isn’t Enough
Enterprises working with the public cloud, running their own private cloud, or simply moving internal IT to new styles of applications (scale-out databases, especially) want low latency and consistent response times. BlueStore was supposed to improve Ceph’s average and tail latency, and in some respects it does, but it cannot take full advantage of NVMe. Modern architectures typically deploy local flash, usually NVMe, on bare metal to gain the best possible performance, and Ceph is a bottleneck – it simply cannot realize the performance of this new media.
Enterprises also want shared storage, and Ceph is often used for this purpose. The drawback, however, is that Ceph has relatively poor flash utilization – just 15 to 25 percent. And if a drive or host fails, the rebuild can be painfully slow, because a great deal of rebuild traffic must cross the network for a long period of time.
There’s a New Kid on the Block
In contrast, our software-defined storage solution for NVMe over TCP, LightOS, gives local NVMe performance while also acting as a shared resource. It is resilient and provides features not typically associated with NVMe, like thin provisioning and optional compression. We also can provide our optional hardware accelerator card, LightField, for those users who have a compressible load.
LightOS works with any commodity hardware. Users can add as many SSDs as will fit. They can use standard application servers, and we offer plugins for OpenStack, Kubernetes, and more, so it can be used in those environments or on bare metal. With LightOS, the block driver is standard in the upstream kernel, and we use NVMe/TCP, which delivers NVMe performance without the need for remote direct memory access (RDMA). This means you get great performance without having to learn a new network protocol or maintain distinct NIC and switch settings.
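To illustrate how standard that data path is: on a recent Linux host, the in-kernel NVMe/TCP initiator plus the stock nvme-cli tool are all that’s needed to attach a remote volume. This is a generic NVMe/TCP sketch, not LightOS-specific documentation – the IP address, port, and NQN below are placeholders:

```shell
# Load the in-kernel NVMe/TCP initiator (upstream in Linux since 5.0)
modprobe nvme-tcp

# Discover NVMe subsystems exported by the target
# (10.0.0.5 and port 4420 are placeholder values)
nvme discover -t tcp -a 10.0.0.5 -s 4420

# Connect to a subsystem; the volume then shows up as an ordinary
# local block device (e.g. /dev/nvme1n1) usable by any application
nvme connect -t tcp -a 10.0.0.5 -s 4420 -n nqn.2016-01.com.example:target1

# Verify the new namespace is visible
nvme list
```

Because all of this runs over plain TCP/IP, it works on existing NICs and switches with no RDMA-capable fabric or lossless-Ethernet tuning.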
With LightOS, when a drive fails the rebuild occurs inside the chassis rather than across the network, speeding the rebuild so there is virtually no disruption. Anyone who already runs a TCP network can simply start using LightOS and gain incredible performance – particularly for applications that require very low latency and high IOPS. LightOS works great with cloud-native application environments because we have plugins for OpenStack (Cinder) and Kubernetes (CSI), and it can be used on bare metal – all while offering incredible scalability.
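To sketch what the Kubernetes side of this looks like in practice, a cluster administrator would define a StorageClass pointing at the vendor’s CSI driver, and applications then request volumes through ordinary PersistentVolumeClaims. The fragment below is illustrative only – the provisioner name and parameter are placeholders, not documented LightOS values:

```yaml
# Illustrative StorageClass for CSI-provisioned NVMe/TCP volumes.
# The provisioner name and "compression" parameter are placeholders;
# consult the vendor's CSI driver documentation for real values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-tcp-fast
provisioner: csi.example.com     # replace with the actual CSI driver name
parameters:
  compression: "enabled"         # hypothetical driver parameter
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Once the StorageClass exists, pods consume the storage exactly as they would any other Kubernetes volume, which is what makes the integration transparent to application teams.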
While Ceph is a great choice for applications that are OK with spinning-drive performance, its architectural shortcomings make it sub-optimal for high-performance, scale-out databases and other key web-scale software infrastructure. Some applications are simply better served by LightOS, which can sustain double the network traffic while also boosting read performance. Simply put, nothing is faster than NVMe. In fact, in head-to-head comparisons against Ceph, LightOS showed 3x more IOPS on reads, 6x more IOPS on mixed workloads, 17x lower latency for reads, and 22x lower latency for mixed workloads – all on commodity hardware and at a much lower cost.
My advice? Use Ceph where it shines. It is cheap and deep – so use it for spinning, object and file. Use LightOS when low latency and consistent performance are the priorities. These two solutions can coexist to service OpenStack, Kubernetes and bare metal. We’re happy to discuss with you how to make it happen. To learn more, watch my webinar When Ceph Isn’t Enough, There’s a New Kid on the Block.