Would you like some CPU to go along with your SSDs?
Ordering a combo meal from your favorite burger joint isn’t all that different from deploying a server with SSDs in the data center. Each server comes with CPUs, DRAM, and SSDs.
However, with servers, your applications may not have an appetite for all of these components. A more likely scenario is that at least one of these resources mostly sits idle. In deployments with multiple SSD-based applications, you are leaving money on the table (or food on the plate, to extend the combo meal metaphor) in the form of unused CPU, DRAM, or SSDs.
These underutilized CPU, DRAM, and SSD resources are difficult to repurpose and become stranded resources, wasting both capital and operating expenditures. These purchased (or financed) resources consume power and real estate and require cooling, yet they provide no useful benefit to the application. Reducing or eliminating stranded resources represents significant cost savings for your enterprise.
The major challenge has always been how to deploy CPU, DRAM, and flash resources in just the right quantities. Infrastructure architects employ different techniques to minimize the waste of these underutilized resources. For instance, some architects use a large number of system configurations, each matched to a specific application. Deploying a large number of system configurations, however, comes with significant management and operational overhead, which doesn’t align with the operational efficiency of hyperscale data centers.
Another technique is to share storage using distributed storage software. But this method can result in a performance penalty when compared with direct-attached storage. Many enterprise storage solutions offer high performance, but their cost is prohibitive for scale-out infrastructure.
This is why architects go to great lengths to share resources with neighboring servers, a technique known as disaggregation. Let’s focus specifically on SSD disaggregation, since technologies to disaggregate SSDs effectively exist today.
For example, let’s use a hypothetical 24-SSD server. The following table shows the bill of materials (BOM) and costs for this server using rough October 2018 market pricing. For simplicity, this analysis excludes the overhead of the data center, power, rack, network, and capital costs.
Hypothetical 24-SSD server:

| Component | Qty | Unit cost | Total cost |
| --- | --- | --- | --- |
| CPU | 2 | $1,500 | $3,000 |
| DRAM (GB) | 256 | $8 | $2,048 |
| SSD (960 GB) | 24 | $330 | $7,920 |
| Balance of server BOM | 1 | $1,500 | $1,500 |
| Total | | | $14,468 |
This example assumes that each new cluster is built to support five different SSD-based applications. Each application has different CPU, DRAM, and SSD requirements, and this hypothetical server meets all of them. In other words, the configuration is the lowest common denominator that satisfies every application’s minimum requirements.
In my experience, large server deployments always run at low utilization rates. Storage utilization is typically between 30% and 40%, and I have measured CPU, DRAM, and SSD utilization in the low single digits. In more efficient implementations, I have measured CPU utilization close to 30% and storage utilization around 50%.
Building a cluster with 20 PB of usable SSD space using direct-attached SSDs, and assuming a generous 50% SSD utilization rate, would require 40 PB of raw SSD capacity deployed across 1,738 servers costing $25,145,384.
By using a disaggregated SSD design, building the same cluster with 20 PB of usable SSD space to serve five different applications at 100% SSD utilization requires only 869 servers costing $12,572,692. Half the cost! If the direct-attached SSD utilization is even lower than 50%, the cost savings could be even higher.
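To make the comparison concrete, here is a minimal sketch of the arithmetic behind these numbers. The constant names and the `cluster_cost` helper are my own illustrative choices; the sketch simply reproduces the per-server BOM from the table above and sizes a cluster for 20 PB of usable capacity at a given SSD utilization rate.

```python
import math

# Per-server BOM from the table above (rough October 2018 pricing).
CPU_COST     = 2 * 1_500            # 2 CPUs at $1,500 each
DRAM_COST    = 256 * 8              # 256 GB at $8/GB
SSD_COST     = 24 * 330             # 24 x 960 GB SSDs at $330 each
CHASSIS_COST = 1_500                # balance of server BOM
SERVER_COST  = CPU_COST + DRAM_COST + SSD_COST + CHASSIS_COST   # $14,468

SSD_TB_PER_SERVER = 24 * 0.96       # 23.04 TB of raw flash per server
TARGET_USABLE_TB  = 20_000          # 20 PB of usable SSD space

def cluster_cost(ssd_utilization: float) -> tuple[int, int]:
    """Servers and dollars needed to deliver TARGET_USABLE_TB of usable
    SSD space at the given SSD utilization rate (1.0 = fully utilized)."""
    servers_fully_used = math.ceil(TARGET_USABLE_TB / SSD_TB_PER_SERVER)
    servers = math.ceil(servers_fully_used / ssd_utilization)
    return servers, servers * SERVER_COST

print(cluster_cost(0.5))   # direct-attached at 50% utilization: (1738, 25145384)
print(cluster_cost(1.0))   # disaggregated at 100% utilization:  (869, 12572692)
```

The only lever that changes between the two scenarios is the utilization rate; the per-server cost stays the same, which is why halving the wasted flash halves the cluster cost.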
This simple cost analysis illustrates why minimizing stranded resources is a big lever on infrastructure costs. Reclaiming stranded resources through disaggregation lets architects size infrastructure strictly to actual application demand and lower costs accordingly.
Disaggregating resources by letting applications “share their combo meals” results in less application starvation, less underutilized infrastructure, a lower total cost of ownership, and happier end users.