Ceph Storage and the NVMe Era

Ceph storage, a scalable solution created by Sage Weil in 2005, has undergone numerous iterations to become a common choice for many organizations. It’s known for its unified storage capabilities, handling block, object, and file storage within a single system. This “one-size-fits-all” versatility has made it popular for many use cases.

However, the evolution of hardware, particularly the rise of fast NVMe flash drives, presents challenges for Ceph. Ceph was designed in an era dominated by hard drives, and on modern hardware its architectural bottlenecks become visible, most notably in tail latency and throughput when compared with storage systems built for flash.
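To make "tail latency" concrete, here is a minimal sketch with synthetic numbers (not a Ceph measurement): a small fraction of slow I/Os barely moves the average latency, yet it dominates the 99th percentile, which is what applications actually feel.

```python
# Illustrative only: synthetic latencies, not benchmark data.
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1]

# 1,000 synthetic I/O latencies in microseconds: mostly fast, small slow tail.
latencies = [100] * 985 + [5000] * 15

avg = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

# The 1.5% of slow I/Os pull the average up only modestly,
# but the p99 lands squarely on the slow tail.
print(f"avg={avg:.0f}us p50={p50}us p99={p99}us")
```

A median of 100 µs looks healthy, but any request fanning out to many storage operations will routinely hit the multi-millisecond tail, which is why percentile latency, not the mean, is the figure of merit for flash-era storage.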

One of the biggest challenges facing Ceph storage is keeping pace with NVMe flash drives. BlueStore, the back-end object store for Ceph OSDs, was introduced to improve speed, especially for random writes. While BlueStore does improve performance, it does not resolve the latency bottlenecks inherent in Ceph's architecture, so Ceph cannot fully exploit NVMe media. In essence, for users seeking to maximize NVMe performance, Ceph storage itself becomes the bottleneck.


NVMe/TCP Support for Ceph

While Ceph strives to support modern high-speed protocols such as NVMe over TCP, its current approach layers protocol gateways and translation on top of the existing Ceph architecture. This model improves interoperability, but it departs from the design intent of NVMe/TCP fabric architectures, in which hosts connect directly to the storage target. Lightbits, by contrast, implements NVMe/TCP natively and is engineered for direct, high-performance host connectivity.

Figure: Ceph NVMe-oF gateway, from the IBM Storage Ceph product documentation, "Ceph NVMe-oF gateway (Technology Preview)."

When to Consider Alternatives to Ceph Storage

Organizations modernizing their infrastructure for Kubernetes-based applications require low latency and consistent response times. While Ceph storage with BlueStore aimed to address latency concerns, it still falls short of fully leveraging NVMe's capabilities. Modern architectures often favor deploying local flash, typically NVMe, on bare metal servers to achieve optimal performance, an area where Ceph storage can become a bottleneck.

Moreover, while Ceph storage is often used for shared storage, it exhibits relatively poor flash utilization, ranging from 15% to 25%. In the event of a failure, rebuild times in Ceph storage can be slow due to the extensive network traffic required for data recovery.
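The utilization figure is easy to translate into capacity terms. The back-of-the-envelope sketch below uses the 15% to 25% range cited above for Ceph; the 80% figure for an NVMe-native system is an assumption for comparison, not a measurement.

```python
# Back-of-the-envelope: usable capacity at a given flash utilization.
# The 15-25% range echoes the text; the 80% figure is an illustrative
# assumption for an NVMe-native system, not vendor data.
def usable_tb(raw_tb, utilization):
    """Terabytes of flash effectively delivered to applications."""
    return raw_tb * utilization

raw = 100  # TB of raw NVMe flash in the cluster
for name, util in [("Ceph (low end)", 0.15),
                   ("Ceph (high end)", 0.25),
                   ("NVMe-native (assumed)", 0.80)]:
    print(f"{name}: {usable_tb(raw, util):.0f} TB usable of {raw} TB raw")
```

At 15% to 25% utilization, 100 TB of raw flash yields only 15 to 25 TB of effective capacity, which is why utilization matters as much as raw drive pricing when comparing cost per usable terabyte.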

In our webinar, "Lightbits: The Next Evolution Beyond Ceph," it was clear that IT practitioners across a broad range of industries are evaluating Lightbits as a Ceph alternative for block storage.

The New Kid on the Block

Lightbits, a software-defined storage solution designed natively around NVMe over TCP, offers an alternative: local-NVMe performance with shared, disaggregated resources. It provides features such as thin provisioning and optional compression, which basic local-NVMe deployments often lack. It runs on commodity hardware, scales by adding SSDs, and integrates with platforms like OpenStack and Kubernetes, or deploys on bare metal.

NVMe/TCP solutions deliver NVMe performance without the complexity of RDMA, using standard TCP/IP networks. This simplifies deployment and management, eliminating the need for specialized network protocols and configurations. Furthermore, drive rebuilds in these systems occur within the chassis, minimizing disruption and accelerating recovery.
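A rough model shows why in-chassis rebuilds finish faster than recovery that crosses the cluster network. All figures below (drive size, link speed, effective bandwidth) are assumptions chosen for illustration, not measurements of Ceph or Lightbits.

```python
# Rough model: hours to rebuild a failed drive. Assumed figures only.
def rebuild_hours(data_tb, throughput_gbps, efficiency):
    """Hours to move data_tb terabytes at throughput_gbps (gigabits/s),
    derated by an efficiency factor for protocol and recovery overhead."""
    bytes_total = data_tb * 1e12
    bytes_per_sec = throughput_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 3600

# Assumed scenario: a 15 TB drive fails. Cross-node recovery shares a
# 25 GbE network and gets ~30% of it; an in-chassis rebuild streams
# from local NVMe at roughly 2 GB/s (16 Gbit/s) with no network hop.
over_network = rebuild_hours(15, 25, 0.30)
in_chassis = rebuild_hours(15, 16, 1.0)

print(f"network rebuild: ~{over_network:.1f} h, in-chassis: ~{in_chassis:.1f} h")
```

Under these assumptions the cross-network rebuild takes more than twice as long, and it also consumes cluster bandwidth that foreground I/O would otherwise use, which is the disruption the in-chassis approach avoids.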

Benchmarks have demonstrated that solutions like Lightbits software can significantly outperform Ceph storage, offering substantially higher IOPS and lower latency at a potentially lower total cost of ownership.

Figure: Ceph vs. Lightbits I/O rate.

Optimizing Your Block Storage Strategy

The key is to deploy storage solutions strategically based on workload requirements. Ceph storage remains a cost-effective option for capacity-oriented workloads that can tolerate higher latency, such as object and file storage on spinning disk. Solutions optimized for NVMe are better suited to applications demanding low latency and consistently high performance.

In conclusion, while Ceph storage has been a valuable solution for many years, its architectural limitations can hinder performance in modern, flash-centric environments. By understanding the strengths and weaknesses of Ceph storage and exploring alternative solutions, organizations can optimize their storage infrastructure to meet the evolving demands of their applications and workloads.

If you want to learn more about the performance benchmarks of Lightbits versus Ceph storage, download the whitepaper.

Additional Resources

Ceph Storage [A Complete Explanation]
Disaggregated Storage
Kubernetes Persistent Storage
Edge Cloud Storage
NVMe over TCP
SCSI vs iSCSI
Persistent Storage