Lightbits Raising the Bar in MLPerf Storage v2.0 Benchmark with AI Storage

MLPerf Storage is a benchmark designed to evaluate how storage systems perform under the I/O demands of machine learning and AI training. It uses synthetic data to emulate workloads based on models like ResNet-50, 3D U-Net, and CosmoFlow, reproducing realistic patterns such as random reads and high-throughput parallel data loading. By mimicking the behavior of large-scale training jobs, MLPerf Storage helps infrastructure teams make informed decisions about storage solutions and system design for their AI requirements.

Our v2.0 Performance Improvement

The official results for the MLPerf Storage v2.0 benchmark are in, and we are excited to finally share the details of our submission. As we hinted in our pre-results blog, Raising the Bar, Again, we not only met but exceeded our own v1.0 performance. Using the exact same hardware, a minimal deployment of just three commodity storage servers with Micron 7500 PRO NVMe drives, we saw remarkable improvements across all three models:
  • ResNet-50: We made substantial gains, with a 51% improvement for the A100 accelerators and a 16% improvement for the H100.
  • 3D U-Net: We delivered a 34% increase in performance with the A100 accelerators and a 25% increase with the H100.
  • CosmoFlow: Our performance with the A100 accelerators saw a 7% increase, and with the H100, we achieved a 24% boost.

Lightbits MLPerf v2.0 submission results:

| Model     | Accelerator Type | # of Accelerators | Throughput (GiB/s) |
|-----------|------------------|-------------------|--------------------|
| CosmoFlow | A100 | 45  | 16.64 |
| CosmoFlow | H100 | 39  | 21.84 |
| ResNet-50 | A100 | 432 | 39.32 |
| ResNet-50 | H100 | 240 | 41.79 |
| 3D U-Net  | A100 | 30  | 41.24 |
| 3D U-Net  | H100 | 15  | 41.26 |

Normalizing Performance Data

Reviewing the submission results, it is clear that performance can be viewed from multiple perspectives. To normalize the results data, you must first understand that different storage solutions rely on different storage hardware: legacy, specialized storage servers often must be deployed in large numbers to keep pace with the demanding I/O of AI workloads, which quickly becomes costly. The Lightbits approach is fundamentally different; we prove that a minimal configuration of software running on standard commodity servers can not only compete but excel. This illustrates a key principle of efficiency: optimizing the software, rather than scaling out hardware, offers a far more cost-effective path to high-performance AI storage.

For example, Lightbits achieved the best CosmoFlow throughput of any vendor in its Fabric-Attached Block (remote block storage) category, along with results that compare favorably to solutions in other categories.

CosmoFlow results comparison:

| Vendor | Storage System Type | Accelerator Type | Throughput per Accelerator (MiB/s) | Throughput per Accelerator per Storage Node (MiB/s) |
|---|---|---|---|---|
| Lightbits | Fabric-Attached Block | A100 | 379 | 126 |
| Lightbits | Fabric-Attached Block | H100 | 573 | 191 |
| Vendor 1 | Fabric-Attached Block | H100 | 532 | 177 |
| Vendor 2 | Shared-File (dedicated appliance) | H100 | 537 | 34 |
| Vendor 3 | Shared-File (dedicated appliance) | H100 | 532 | N/A |
| Vendor 4 | Cloud | A100 | 353 | N/A |
| Vendor 4 | Cloud | H100 | 533 | N/A |

** N/A means the storage-node data is not applicable or unknown.

ResNet-50 results comparison:

| Vendor | Storage System Type | Accelerator Type | Throughput per Accelerator (MiB/s) | Throughput per Accelerator per Storage Node (MiB/s) |
|---|---|---|---|---|
| Lightbits | Fabric-Attached Block | A100 | 93 | 31 |
| Lightbits | Fabric-Attached Block | H100 | 178 | 59 |
| Vendor 1 | Shared-File (dedicated appliance) | A100 | 91 | 1.33 |
| Vendor 1 | Shared-File (dedicated appliance) | H100 | 182 | 11.36 |
| Vendor 2 | Shared-File (dedicated appliances) | H100 | 186 | 1.45 |
| Vendor 3 | Shared-File (dedicated appliances) | H100 | 176 | 58 |

3D U-Net results comparison:

| Vendor | Storage System Type | Accelerator Type | Total Throughput (GiB/s) | Throughput per Accelerator per Storage Node (MiB/s) |
|---|---|---|---|---|
| Lightbits | Fabric-Attached Block | A100 | 41.23 | 467 |
| Lightbits | Fabric-Attached Block | H100 | 41.26 | 939 |
| Vendor 1 | Shared-File (dedicated appliance) | A100 | 455 | 462 |
| Vendor 1 | Shared-File (dedicated appliance) | H100 | 470 | 910 |
| Vendor 2 | Shared-File (dedicated appliance) | H100 | 999 | 41 |

The results suggest that Lightbits achieves highly competitive per-accelerator throughput per storage node, often matching or outperforming other Fabric-Attached Block solutions as well as other storage system types. This efficiency, combined with the use of commodity servers instead of specialized appliances, demonstrates that Lightbits can deliver top-tier performance in a more cost-effective and scalable manner.
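To make the normalization concrete, here is a minimal sketch in Python (the helper function and its name are illustrative, not part of the MLPerf tooling) that derives the per-accelerator and per-storage-node figures from the raw submission numbers reported above:

```python
# Normalize raw MLPerf Storage throughput into per-accelerator and
# per-storage-node figures (values taken from the Lightbits v2.0
# submission table in this post).
GIB_TO_MIB = 1024

def normalize(total_gib_s: float, accelerators: int, storage_nodes: int):
    """Return (MiB/s per accelerator, MiB/s per accelerator per storage node)."""
    per_accel_mib = total_gib_s * GIB_TO_MIB / accelerators
    per_accel_per_node_mib = per_accel_mib / storage_nodes
    return per_accel_mib, per_accel_per_node_mib

# CosmoFlow on H100: 21.84 GiB/s across 39 simulated accelerators,
# served by the 3-node Lightbits cluster.
per_accel, per_node = normalize(21.84, 39, 3)
print(f"{per_accel:.0f} MiB/s per accelerator")          # ~573 MiB/s
print(f"{per_node:.0f} MiB/s per accelerator per node")  # ~191 MiB/s
```

Applying the same arithmetic to every submission is what allows solutions with very different storage node counts to be compared on an equal footing.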

Lightbits with Micron Advanced Storage and Memory Powers AI Clouds

With deep engineering collaboration, Micron and Lightbits are working together to enable integrated, optimized NVMe storage solutions. AI demands the high-performance and consistent low latency that NVMe is known for. Lightbits creates a single pool of storage built upon industry-standard servers hosting advanced Micron NVMe SSDs and memory to accelerate your AI workloads while lowering infrastructure costs and improving efficiency.

“Lightbits on Micron SSDs blew everything else out of the water, with consistently higher throughput, IOPS, and lower latency across the board. We were impressed with the performance and consistency of Lightbits.”
Mike McDonald, Director of Product Management, Crusoe

One powerful AI cloud that rises above the hyperscalers is Crusoe. Crusoe implemented Lightbits software-defined storage with Micron 7000 series SSDs for their reliability, low latency, and high density, qualities well suited to the demanding AI/ML applications that run on Crusoe Cloud. The Micron 7000 series has a low average active power consumption of only 17 watts for more than 15TB of storage, helping Crusoe Cloud save energy and reduce power costs. Before choosing Lightbits, Crusoe ran extensive performance tests across several block storage options, including Ceph. Lightbits demonstrated up to a 4x bandwidth advantage and scaled IOPS with increasing load while maintaining low latencies, consistently staying under 0.5 milliseconds (ms) while the competition exceeded 2.5ms under random access. Crusoe was also confident that Lightbits would meet its ambitious high availability and data protection goals through fast, efficient snapshotting across multiple availability zones.

To learn more about how Crusoe built their AI Cloud with Lightbits and Micron technologies, read the case study.

About Lightbits

Lightbits’ software-defined storage solution is built on a foundation of disaggregation and efficiency, providing a flexible architecture that supports all kinds of AI workloads. By separating storage from the compute layer, we eliminate bottlenecks and unleash the full potential of NVMe drives, delivering the low latency and high throughput you need for demanding AI training. This architecture also ensures robust redundancy and availability, giving you a level of trust and control that is often hard to find in alternatives such as direct-attached storage or public cloud platforms, which come with their own access and availability challenges. You can start with a cost-effective, minimal setup and seamlessly scale performance by adding commodity hardware when you need it, without vendor lock-in or the high costs associated with proprietary appliances.

The full MLPerf Storage benchmark v2.0 results are available here: https://mlcommons.org/benchmarks/storage

About the Writer: