We are pleased to announce the availability of Lightbits v3.18.1. This version delivers key advancements that expand deployment flexibility by enabling a controlled transition from DCPMM-based journaling to SSD journaling, while also strengthening observability, operational safety, and overall cluster robustness across production environments. As a leader in high-performance Software-Defined Storage that leverages NVMe over TCP block storage, Lightbits continues to provide a modern, efficient, and scalable data platform designed for demanding AI Training and Inference, cloud service providers, e-commerce, financial, and enterprise workloads.
Building on a technical preview in v3.16.1 and a successful GA rollout in v3.17.1, Lightbits continues to advance its SSD Journaling architecture. This proven capability improves cluster resilience and operational efficiency, allowing customers to adopt a standardized hardware profile while maintaining the industry-leading reliability and durability guarantees expected from Lightbits.
v3.18 introduces a transitional deployment option that simplifies the migration of existing DCPMM-based clusters to SSD journaling through a gradual, controlled path, while further hardening cluster reliability and improving operational visibility.
Enable a Gradual Transition from DCPMM Journaling to SSD Journaling
Lightbits v3.18 adds support for an intermediate heterogeneous cluster state, where a cluster can temporarily include a mix of:
- nodes using DCPMM-based journaling, and
- newly added nodes using SSD journaling.
This mixed-mode configuration is designed specifically to support step-by-step hardware transitions: storage administrators can add new nodes with SSD journaling enabled, evolve the cluster composition over time, and then remove older DCPMM-based nodes, ultimately converging on a fully SSD-journaling cluster.
Important note on operational guidance: This heterogeneous configuration is supported only as a temporary transition state. It is not intended to be maintained as a long-term steady-state deployment. We recommend using it for limited periods as part of a controlled migration plan, and avoiding prolonged mixed-mode operation beyond the time required to complete the transition.
Resiliency Hardening Safeguards Operations and Simplifies Monitoring
Lightbits v3.18 makes clusters simpler to monitor, safer to operate, and easier to manage – especially for ongoing operations and maintenance, where administrators need clear visibility and predictable behavior.
Observability has been improved with new Grafana panels on both the node and cluster dashboards, including a provisioning ratio gauge that makes it easier to spot capacity pressure and overcommitment trends at a glance. In addition, new Prometheus alert rules have been added for key over-provisioning thresholds (200% and 400%), enabling teams to proactively detect risky growth patterns before they become operational incidents. We also enhanced the server performance dashboards with new plots for journal SSD latency and bandwidth, improving visibility into journaling behavior and helping teams quickly correlate performance shifts with underlying device behavior.
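As an illustration, threshold alerts of this kind follow the standard Prometheus alerting-rule format. The sketch below shows what 200% and 400% over-provisioning rules could look like; the metric name `lightbits_cluster_provisioning_ratio` is hypothetical, and the actual metric and rule names are defined in the monitoring bundle shipped with v3.18.

```yaml
# Sketch of Prometheus alerting rules for over-provisioning thresholds.
# NOTE: the metric name below is hypothetical; consult the rules shipped
# with the release for the actual metric exposed by the cluster exporter.
groups:
  - name: lightbits-provisioning
    rules:
      - alert: ClusterOverProvisioned200
        # Ratio of provisioned logical capacity to physical capacity.
        expr: lightbits_cluster_provisioning_ratio > 2.0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Cluster provisioning ratio above 200%"
      - alert: ClusterOverProvisioned400
        expr: lightbits_cluster_provisioning_ratio > 4.0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Cluster provisioning ratio above 400%"
```

Staggering the two thresholds this way gives teams an early warning at 200% and an escalation signal at 400%, matching the thresholds called out above.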
Overall system reliability and resilience were strengthened through targeted improvements across the stack. For example, overhead after delete-volume operations was reduced by limiting AEN (Asynchronous Event Notification) messages for deleted volumes to only the clients that need them – lowering load on both cluster nodes and client machines and helping clusters stay responsive during administrative activity.
For evaluation purposes only, v3.18 introduces new maintenance APIs that enable administrators to stop, start, and restart a specific logical node on a Lightbits server. The motivation behind this capability is to support more controlled, repeatable operational procedures, allowing storage teams to coordinate service restarts in a predictable way (for example, during troubleshooting, maintenance, or validation workflows) rather than relying on manual, ad hoc sequences. This feature is released as a technical preview and is not recommended for production environments. We encourage customers interested in more granular operational tooling to test these APIs in a lab or staging cluster and share feedback with us on the Lightbits User Community Hub, including suggested workflows, expected API behavior, and any gaps that would make this capability more useful for real-world troubleshooting and maintenance scenarios.
DMS 1.4: Workflow Scale, Stronger Security, and Improved Operability
Alongside v3.18, we are also releasing DMS 1.4, which further improves operability and reliability. Key capabilities of DMS 1.4 include: improved workflow queuing (semaphore-based) to better handle scale; dmscli enhancements, such as a configurable command timeout and a new analyze command for workflow troubleshooting; and stronger security with field-level encryption for ClusterAccess data within workflows. DMS 1.4 also simplifies thick clone execution by exposing thick snapshot and thick clone operations as single workflows, and introduces UUIDv7 workflow IDs to make debugging and timeline analysis easier.
This release also includes operational hardening: the cleanup playbook and Ansible role behavior were improved (clearer separation of worker vs. service responsibilities and safer defaults), and the thick clone workflow was enhanced with more accurate progress reporting and more conservative CPU utilization defaults.
Looking ahead, the product roadmap includes releases with capabilities that enhance cluster manageability, strengthen resiliency, and provide clear migration paths as customers modernize their infrastructure.
For more details on these releases, please refer to the full release notes in the product documentation and book time to talk with an expert if you would like guidance on adopting SSD journaling or planning a controlled migration from DCPMM-based deployments.
Join us on the Lightbits User Community Hub to share your experiences and connect with our team.