We are pleased to announce the availability of Lightbits v3.18.1. This version delivers key advancements that expand deployment flexibility by enabling a controlled transition from DCPMM-based journaling to SSD journaling, while also strengthening observability, operational safety, and overall cluster robustness across production environments. As a leader in high-performance Software-Defined Storage that leverages NVMe over TCP block storage, Lightbits continues to provide a modern, efficient, and scalable data platform designed for demanding AI Training and Inference, cloud service providers, e-commerce, financial, and enterprise workloads.
Building on a technical preview in v3.16.1 and a successful GA rollout in v3.17.1, Lightbits continues to advance its SSD Journaling architecture. This proven capability improves cluster resilience and operational efficiency, allowing customers to adopt a standardized hardware profile while maintaining the industry-leading reliability and durability guarantees expected from Lightbits.
v3.18 introduces a transitional deployment option that simplifies the migration of existing DCPMM-based clusters to SSD journaling through a gradual, controlled path, while further hardening cluster reliability and improving operational visibility.
Enable a Gradual Transition from DCPMM Journaling to SSD Journaling
Lightbits v3.18 adds support for an intermediate heterogeneous cluster state, where a cluster can temporarily include a mix of:
- nodes using DCPMM-based journaling, and
- newly added nodes using SSD journaling.
This mixed-mode configuration is designed specifically to support step-by-step hardware transitions: storage administrators can add new nodes with SSD journaling enabled, evolve the cluster composition over time, and then remove older DCPMM-based nodes, ultimately converging on a fully SSD-journaling cluster.
Important note on operational guidance: This heterogeneous configuration is supported only as a temporary transition state. It is not intended to be maintained as a long-term steady-state deployment. We recommend using it for limited periods as part of a controlled migration plan, and avoiding prolonged mixed-mode operation beyond the time required to complete the transition.
Resiliency Hardening Safeguards Operations and Simplifies Monitoring
Lightbits v3.18 makes clusters simpler to monitor, safer to operate, and easier to manage – especially for ongoing operations and maintenance, where administrators need clear visibility and predictable behavior.
Observability has been improved with new Grafana panels on both the node and cluster dashboards, including a provisioning ratio gauge that makes it easier to spot capacity pressure and overcommitment trends at a glance. In addition, new Prometheus alert rules have been added for key over-provisioning thresholds (200% and 400%), enabling teams to proactively detect risky growth patterns before they become operational incidents. We also enhanced the server performance dashboards with new plots for journal SSD latency and bandwidth, improving visibility into journaling behavior and helping teams quickly correlate performance shifts with underlying device behavior.
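As an illustration, threshold alerts of this kind follow the standard Prometheus alerting-rule format. The sketch below shows what 200% and 400% over-provisioning rules could look like; the metric name `lightbits_cluster_provisioning_ratio` is hypothetical, and the actual metric and rule names are defined in the monitoring bundle shipped with v3.18.

```yaml
# Sketch of Prometheus alerting rules for over-provisioning thresholds.
# NOTE: the metric name below is hypothetical; consult the rules shipped
# with the release for the actual metric exposed by the cluster exporter.
groups:
  - name: lightbits-provisioning
    rules:
      - alert: ClusterOverProvisioned200
        # Ratio of provisioned logical capacity to physical capacity.
        expr: lightbits_cluster_provisioning_ratio > 2.0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Cluster provisioning ratio above 200%"
      - alert: ClusterOverProvisioned400
        expr: lightbits_cluster_provisioning_ratio > 4.0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Cluster provisioning ratio above 400%"
```

Staggering the two thresholds this way gives teams an early warning at 200% and an escalation signal at 400%, matching the thresholds called out above.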
Overall system reliability and resilience were strengthened through targeted improvements across the stack. For example, overhead after delete-volume operations was reduced by limiting AEN (Asynchronous Event Notification) messages for deleted volumes to only the clients that need them – lowering load on both cluster nodes and client machines and helping clusters stay responsive during administrative activity.
For evaluation purposes only, v3.18 introduces new maintenance APIs that enable administrators to stop, start, and restart a specific logical node on a Lightbits server. The motivation behind this capability is to support more controlled, repeatable operational procedures, allowing storage teams to coordinate service restarts in a predictable way (for example, during troubleshooting, maintenance, or validation workflows) rather than relying on manual, ad hoc sequences. This feature is released as a technical preview and is not recommended for production environments. We encourage customers interested in more granular operational tooling to test these APIs in a lab or staging cluster and share feedback with us on the Lightbits User Community Hub, including suggested workflows, expected API behavior, and any gaps that would make this capability more useful for real-world troubleshooting and maintenance scenarios.
DMS 1.4: Workflow Scale, Stronger Security, and Improved Operability
Alongside v3.18, we are also releasing DMS 1.4, which further improves operability and reliability. Key capabilities of DMS 1.4 include: improved workflow queuing (semaphore-based) to better handle scale; dmscli enhancements, such as a configurable command timeout and a new analyze command for workflow troubleshooting; and stronger security with field-level encryption for ClusterAccess data within workflows. DMS 1.4 also simplifies thick clone execution by exposing thick snapshot and thick clone operations as single workflows, and introduces UUIDv7 workflow IDs to make debugging and timeline analysis easier.
This release also includes operational hardening: the cleanup playbook and Ansible role behavior were improved (clearer separation of worker vs. service responsibilities and safer defaults), and the thick clone workflow was enhanced with more accurate progress reporting and more conservative CPU utilization defaults.
Looking ahead, the product roadmap includes releases with capabilities that enhance cluster manageability, strengthen resiliency, and provide clear migration paths as customers modernize their infrastructure.
For more details on these releases, please refer to the full release notes in the product documentation and book time to talk with an expert if you would like guidance on adopting SSD journaling or planning a controlled migration from DCPMM-based deployments.
Join us on the Lightbits User Community Hub to share your experiences and connect with our team.