Our v2.0 Performance Improvements
- ResNet-50: We made substantial gains, with a 51% improvement for the A100 accelerators and a 16% improvement for the H100.
- 3D U-Net: We delivered a 34% increase in performance with the A100 accelerators and a 25% increase with the H100.
- CosmoFlow: Our performance with the A100 accelerators saw a 7% increase, and with the H100, we achieved a 24% boost.
Lightbits MLPerf Storage v2.0 submission results:
Model | Accelerator Type | # of Accelerators | Throughput (GiB/s)
---|---|---|---
CosmoFlow | A100 | 45 | 16.64
CosmoFlow | H100 | 39 | 21.84
ResNet-50 | A100 | 432 | 39.32
ResNet-50 | H100 | 240 | 41.79
3D U-Net | A100 | 30 | 41.24
3D U-Net | H100 | 15 | 41.26
Normalizing Performance Data
Reviewing the submission results, it’s clear that performance can be viewed from multiple perspectives. To normalize the data, you first need to understand that different storage solutions rely on different hardware: keeping pace with the demanding I/O of AI workloads can require large fleets of legacy, specialized storage servers, which is costly. Our approach at Lightbits is fundamentally different; we prove that a minimal configuration of software running on standard commodity servers can not only compete but excel. This illustrates a key principle of efficiency: optimizing the software, rather than acquiring and scaling a lot of hardware, offers a much more cost-effective path to high-performance AI storage.
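To make the normalization concrete, here is a minimal Python sketch (illustrative only, not part of the MLPerf tooling) that converts the Lightbits submission numbers above into per-accelerator throughput. A per-storage-node view would divide once more by the node count, which the submission table does not list.

```python
# Minimal sketch: normalize MLPerf Storage results to per-accelerator
# throughput. The tuples come from the Lightbits submission table above;
# 1 GiB/s = 1024 MiB/s.

RESULTS = [
    # (model, accelerator, num_accelerators, total_throughput_gib_s)
    ("CosmoFlow", "A100", 45, 16.64),
    ("CosmoFlow", "H100", 39, 21.84),
    ("ResNet-50", "A100", 432, 39.32),
    ("ResNet-50", "H100", 240, 41.79),
    ("3D U-Net", "A100", 30, 41.24),
    ("3D U-Net", "H100", 15, 41.26),
]

for model, accel, n_accel, total_gib_s in RESULTS:
    per_accel_mib_s = total_gib_s * 1024 / n_accel
    print(f"{model:9s} {accel}: {per_accel_mib_s:7.1f} MiB/s per accelerator")
```

For example, CosmoFlow on 45 A100s works out to 16.64 × 1024 / 45 ≈ 379 MiB/s per accelerator, the figure that appears in the comparison table below.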
For example, Lightbits achieved the highest CosmoFlow throughput of any vendor in the Fabric-Attached Block (remote block storage) category, and stronger per-accelerator results than the shared-file and cloud solutions as well.
CosmoFlow results comparison:
Vendor | Storage System Type | Accelerator Type | Throughput per Accelerator (MiB/s) | Throughput per Accelerator per Storage Node (MiB/s)
---|---|---|---|---
Lightbits | Fabric-Attached Block | A100 | 379 | 191
Lightbits | Fabric-Attached Block | H100 | 573 | 126
Vendor 1 | Fabric-Attached Block | H100 | 532 | 177
Vendor 2 | Shared-File (dedicated appliance) | H100 | 537 | 34
Vendor 3 | Shared-File (dedicated appliance) | H100 | 532 | N/A
Vendor 4 | Cloud | A100 | 353 | N/A
Vendor 4 | Cloud | H100 | 533 | N/A
ResNet-50 results comparison:
Vendor | Storage System Type | Accelerator Type | Throughput per Accelerator (MiB/s) | Throughput per Accelerator per Storage Node (MiB/s)
---|---|---|---|---
Lightbits | Fabric-Attached Block | A100 | 93 | 31
Lightbits | Fabric-Attached Block | H100 | 178 | 59
Vendor 1 | Shared-File (dedicated appliance) | A100 | 91 | 1.33
Vendor 1 | Shared-File (dedicated appliance) | H100 | 182 | 11.36
Vendor 2 | Shared-File (dedicated appliance) | H100 | 186 | 1.45
Vendor 3 | Shared-File (dedicated appliance) | H100 | 176 | 58
3D U-Net results comparison:
Vendor | Storage System Type | Accelerator Type | Throughput per Accelerator (GiB/s) | Throughput per Accelerator per Storage Node (MiB/s)
---|---|---|---|---
Lightbits | Fabric-Attached Block | A100 | 41.23 | 467
Lightbits | Fabric-Attached Block | H100 | 41.26 | 939
Vendor 1 | Shared-File (dedicated appliance) | A100 | 455 | 462
Vendor 1 | Shared-File (dedicated appliance) | H100 | 470 | 910
Vendor 2 | Shared-File (dedicated appliance) | H100 | 99 | 941
Lightbits with Micron Advanced Storage and Memory Powers AI Clouds
With deep engineering collaboration, Micron and Lightbits are working together to deliver integrated, optimized NVMe storage solutions. AI demands the high performance and consistently low latency that NVMe is known for. Lightbits creates a single pool of storage built on industry-standard servers hosting advanced Micron NVMe SSDs and memory, accelerating your AI workloads while lowering infrastructure costs and improving efficiency.
One powerful AI Cloud that rises above the hyperscalers is Crusoe. Crusoe implemented Lightbits software-defined storage with Micron 7000 series SSDs for their reliability, low latency, and high density, which suit the demanding AI/ML applications that run on Crusoe Cloud. The Micron 7000 series also has a low average active power consumption of only 17 watts for more than 15TB of storage, helping Crusoe Cloud save energy and reduce power costs.
Crusoe ran extensive performance tests across several block storage options, including Ceph. “Lightbits on Micron SSDs blew everything else out of the water, with consistently higher throughput, IOPS, and lower latency across the board. We were impressed with the performance and consistency of Lightbits,” said Mike McDonald, director of product management at Crusoe. Lightbits demonstrated up to a 4x bandwidth advantage and scaled IOPS with increasing load while maintaining low latencies, consistently staying under 0.5 milliseconds (ms) while the competition exceeded 2.5ms under random access. Crusoe was also confident that Lightbits would meet its ambitious high-availability and data-protection goals through fast, efficient snapshotting across multiple availability zones.
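For context, here is a rough sketch of the shape such a block storage latency test can take; it is not Crusoe’s actual methodology. It drives a 4KiB random-read workload with fio and reports IOPS and mean completion latency. The device path and job parameters are illustrative assumptions.

```python
# Rough sketch of a 4KiB random-read latency test (NOT Crusoe's actual
# methodology). Assumes fio is installed and /dev/lightbits0 is a
# hypothetical NVMe/TCP-attached block device.
import json
import subprocess

DEVICE = "/dev/lightbits0"  # hypothetical device path

cmd = [
    "fio",
    "--name=randread-latency",
    f"--filename={DEVICE}",
    "--ioengine=libaio",
    "--direct=1",              # bypass the page cache
    "--rw=randread",
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--time_based",
    "--runtime=60",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
read_stats = json.loads(result.stdout)["jobs"][0]["read"]

iops = read_stats["iops"]
mean_lat_ms = read_stats["clat_ns"]["mean"] / 1e6  # ns -> ms
print(f"IOPS: {iops:,.0f}, mean completion latency: {mean_lat_ms:.3f} ms")
```

Repeating such a run against each candidate backend (for example, a Ceph RBD device) under increasing queue depth is one straightforward way to compare IOPS scaling and tail latency across block storage options.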
To learn more about how Crusoe built their AI Cloud with Lightbits and Micron technologies, read the case study.
About Lightbits
Lightbits’ software-defined storage solution is built on a foundation of disaggregation and efficiency, giving you a flexible architecture that supports all kinds of AI workloads. By separating storage from the compute layer, we eliminate bottlenecks and unleash the full potential of NVMe drives, delivering the low latency and high throughput you need for demanding AI training. The architecture also ensures robust redundancy and availability, giving you a level of trust and control that’s often hard to find in alternatives such as direct-attached storage or public cloud platforms, which come with their own access and availability challenges. You can start with a cost-effective, minimal setup and seamlessly scale performance by adding commodity hardware when you need it, with no vendor lock-in and none of the high costs associated with proprietary appliances.
The full MLPerf Storage benchmark v2.0 results are available here: https://mlcommons.org/benchmarks/storage