Zero-Loss PostgreSQL Failover with NVMe over TCP

If you’ve been running stateful workloads on Kubernetes, you know the “Storage Detach” nightmare. Traditionally, moving a block-backed volume between nodes is a game of patience – waiting for CSI timeouts and forced detachments while your database sits in Pending.

But what if the system was actually designed to wait, flush, and hand over?

We recently stress-tested a PostgreSQL architecture on Lightbits NVMe over TCP, and it proved resilience isn’t about being “instant” – it’s about being atomic. Even with a 25-second handover window, we didn’t miss a single write transaction.

The Anatomy of a Graceful Failover

We tested a scenario where an external pod hammered the PostgreSQL database with write queries while we deleted the leading PostgreSQL pod. Unlike a “crash” scenario, this was a coordinated relay race between two worker nodes sharing a Lightbits RWX (ReadWriteMany) PVC.

Here is exactly what happened during those 25 seconds:

1. The Flush and Freeze

When the leading pod receives the deletion signal, it doesn’t just vanish. It performs a graceful shutdown, flushing all memory buffers to the Lightbits volume. This ensures the ext4 filesystem is in a “clean” state before the pod exits.
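The principle behind this step can be shown with a minimal Python sketch. It is not PostgreSQL's or Lightbits' shutdown code: the `durable_write` and `install_graceful_shutdown` helpers and their arguments are hypothetical names used only for illustration. The sketch shows the general pattern of forcing buffered data down to the block device before a process exits on SIGTERM, the signal Kubernetes sends to a container when its pod is deleted.

```python
import os
import signal
import sys


def durable_write(path: str, payload: bytes) -> None:
    """Write payload and force it to stable storage before returning."""
    with open(path, "wb") as fh:
        fh.write(payload)
        fh.flush()              # drain Python's userspace buffer
        os.fsync(fh.fileno())   # ask the kernel to flush its cached pages to the device


def install_graceful_shutdown(path: str, pending: list) -> None:
    """On SIGTERM, persist whatever is still buffered, then exit cleanly.

    `pending` is a hypothetical in-memory list of byte chunks not yet written.
    """
    def _on_sigterm(signum, frame):
        durable_write(path, b"".join(pending))
        sys.exit(0)

    signal.signal(signal.SIGTERM, _on_sigterm)
```

A real database does considerably more work at shutdown than this, but the ordering is the same: write, flush, sync, and only then exit.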

2. The Coordinated Handover

The secondary pod – residing on a different worker node – is already connected to the storage at the block level. However, it intelligently waits. It stays “suspended” until the first pod has fully cleared its locks and exited.
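The article does not describe how the secondary pod detects that the first pod has released the volume, so the following is only a sketch of the waiting pattern under stated assumptions. The `is_held` callable is a hypothetical stand-in for whatever check the real mount-management logic performs; the timeout and polling interval are illustrative defaults, not measured values.

```python
import time
from typing import Callable


def wait_until_released(is_held: Callable[[], bool],
                        timeout_s: float = 60.0,
                        poll_s: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until is_held() returns False.

    Returns True once the resource is free, or False if timeout_s
    elapses first. `sleep` is injectable so the loop can be tested
    without real delays.
    """
    deadline = time.monotonic() + timeout_s
    while is_held():
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
    return True
```

Bounding the wait with a deadline keeps the standby from hanging forever if the first pod never exits cleanly.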

3. The Instant Mount

[Image: ACL mode for Lightbits NVMe over TCP storage]

Once the first pod is gone, the secondary pod takes the reins. Because the volume is already mapped via NVMe over TCP, there is no “attaching” delay. It simply mounts the ext4 filesystem. Since the buffers were flushed, the filesystem is ready immediately; no lengthy recovery or journal replays are required.

4. The Database Resume

PostgreSQL starts up on the secondary pod, sees the intact data, and begins accepting connections.

The Result: 25 Seconds of Silence, 100% Data Integrity

During this cycle, the database is briefly unavailable for receiving or storing transactions. However, the Linux stack and the application layer are fully prepared for this:

The Timeout Cycle: The Linux system and the networking stack remain aware of the transition, maintaining the connection state.
The Application Retry: The external “writer” pod doesn’t crash or throw a 500 error. It simply enters a retry loop.
Zero Data Loss: After the 25-second handover, the external pod successfully connects to the new leader and commits its queued transactions.
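The writer-side behavior described above can be sketched as a queue that is drained with retries. This is an assumption-laden illustration, not the test harness used in the article: `drain_with_retry`, the `commit` callable, and the one-second delay and 60-second ceiling are all hypothetical. A real client would pass a function that opens a connection and commits through its PostgreSQL driver.

```python
import time
from collections import deque
from typing import Callable, Deque


def drain_with_retry(queue: Deque[str],
                     commit: Callable[[str], None],
                     max_wait_s: float = 60.0,
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Commit every queued item in order, retrying while the database is unreachable.

    Returns the number of items committed. Raises TimeoutError if one item
    cannot be committed within max_wait_s of accumulated waiting.
    """
    committed = 0
    while queue:
        waited = 0.0
        while True:
            try:
                commit(queue[0])   # the item stays queued until the commit succeeds
                break
            except ConnectionError:
                if waited >= max_wait_s:
                    raise TimeoutError("database did not return in time")
                sleep(delay_s)
                waited += delay_s
        queue.popleft()
        committed += 1
    return committed
```

The retry ceiling needs to be comfortably longer than the handover window, here 25 seconds. Note that a retry after an ambiguous failure can replay a transaction that had in fact committed, so writes in such a design should be idempotent or deduplicated on the server side.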

Bottom line: we experienced a 25-second delay, but no transactions were lost.

Why This is a Game Changer

Predictable Recovery: By mounting an ext4 filesystem on top of a Lightbits volume, we replaced the “randomness” of CSI timeouts with a predictable, coordinated 25-second window.
No Storage Re-Claims: The block device is already there on the secondary node. We cut out the “Detach/Attach” dance that usually takes minutes.
NVMe Speed: Lightbits delivers the low-latency performance needed to flush that buffer and restart the database as fast as the hardware allows.

Stop Waiting for Timeouts, Start Designing for Resilience

This architecture is the “cheat code” for anyone tired of the slow-motion failovers inherent in legacy storage. By combining the raw throughput of Lightbits with the intelligent coordination of ext4 and PostgreSQL, you aren’t just building a cluster; you’re building a high-performance engine that knows how to switch gears without stalling.

Ready to see the blueprint? From the mount-management logic to the retry configurations of the external pods, we have the data to help you build this.

Download the white paper now, and start building the resilient, RWX-powered future your PostgreSQL and Kubernetes clusters deserve!
