Supercharge MongoDB on AWS with Lightbits and OpenShift!

Introduction:

In the age of AI, when almost any data point on the web can be found quickly and easily, it is sometimes good to go back to basics and run a good old-fashioned performance test.

I’ve been curious to see how well a NoSQL database can perform, measured from the storage layer up to transactions per second, and I figured there is no better database for the task than MongoDB. I paired my MongoDB databases with two storage options: the Lightbits SAN-in-the-cloud offering and EBS io2 Block Express.

A few words on MongoDB. It is one of the most prominent NoSQL databases, a document database that stores JSON-like documents in a binary format called BSON. It is used globally across all industry verticals. MongoDB runs on a wide variety of platforms, both on and off premises, and can use several storage options.

One of the most widely used platforms to run MongoDB is Kubernetes, mainly because Kubernetes enables flexible scalability, high availability, resource optimization and simplified deployment of MongoDB databases. To use a Lightbits volume in Kubernetes, all you need to do is install the Lightbits CSI driver. The driver is available in static format, as a Helm chart and via a Kubernetes operator.

In this blog, I will present the results of my testing and explain why choosing the right storage for your MongoDB databases is critical both for transaction performance (Lightbits outperforms EBS by an average of 30% across various workloads) and for significantly reducing the cost of running databases in the cloud (EBS is about 2.5x more expensive than Lightbits).

Test Setup:

Kubernetes: Since Kubernetes is one of the most common platforms to deploy, manage and run MongoDB, we chose the recently released Red Hat OpenShift Container Platform (OCP) version 4.13. At the time of testing, it was the only OCP version with an EBS CSI driver that supports EBS io2 Block Express.

MongoDB: The MongoDB containers were based on the “latest” tag (which at the time corresponded to 6.0.4). Each container had a request/limit of 12 “cpu” and 2Gi memory (more on the container resource settings later).
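To make the resource settings concrete, here is a minimal sketch of the container section of a pod spec with identical requests and limits. The `resources.requests` and `resources.limits` fields are standard Kubernetes; the container name and the `mongo:latest` image reference are my assumptions for illustration, not the exact manifest used in the test.

```python
import json


def mongodb_container(cpu="12", memory="2Gi", image="mongo:latest"):
    """Build a container spec whose requests equal its limits.

    The image name and container name are illustrative assumptions.
    """
    resources = {"cpu": cpu, "memory": memory}
    return {
        "name": "mongodb",
        "image": image,
        "resources": {
            # Separate copies so editing one side does not change the other.
            "requests": dict(resources),
            "limits": dict(resources),
        },
    }


container = mongodb_container()
print(json.dumps(container, indent=2))
```

When every container in a pod has requests equal to limits for both CPU and memory, Kubernetes places the pod in the Guaranteed QoS class, which helps keep benchmark runs comparable.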

Benchmark: To stress all the MongoDB instances and the storage, I used YCSB (the Yahoo! Cloud Serving Benchmark). Although it was written a few years back, it is still one of the best benchmarking tools for creating collections in MongoDB and then measuring performance while controlling how much data is read versus how much is updated.

Performance information gathering: To run, monitor and collect all the required information, I used the Sherlock database performance tool.

Storage: This blog compares and evaluates two storage options in AWS: EBS io2 Block Express and the Lightbits Cloud Block Storage offering via the AWS Marketplace.

Let’s delve into some important aspects of storage used for MongoDB:

  1. EBS io2 Block Express (io2.bx) is the fastest cloud-native block storage solution available in AWS. With this type of storage you pay not only for capacity; you also have to decide upfront on the provisioned IOPS for each volume and pay a monthly fee per provisioned IOPS. While io2.bx has a limit of 1000 IOPS per GiB when creating a volume from the AWS console or via the AWS CLI/API, the EBS CSI implementation has a limit of 500 IOPS per GiB. This means, for example, that to get to 125,000 IOPS using the EBS CSI you need a volume of at least 250GiB. I used 300GiB volumes for all testing.
  2. Lightbits imposes no IOPS limit per volume, although you can set limits via QoS if you want them. This means that if your application needs more IOPS over time, there is no need to make any adjustments on the storage side. Moreover, you won’t incur any additional costs for higher IOPS, unlike with EBS.
  3. For the testing, I used Lightbits version 3.1.1 from the AWS Marketplace. I deployed the minimal configuration of 3 x i4i.16xlarge instances as the building blocks for the Lightbits cluster. You can also create a Lightbits cluster from various i3en instances and different i4i instances, ranging from a single NVMe device per instance to 8 per instance.
  4. For the Kubernetes worker (compute) nodes I used the r6in.8xlarge instance type, which supports io2.bx. EC2 instance types without io2.bx support are restricted to the slower io2 EBS volume type. Lightbits imposes no such constraint: any EC2 instance can use Lightbits NVMe/TCP volumes, with only the network bandwidth as the throughput limit.
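The IOPS-per-GiB ratios from item 1 translate directly into a minimum volume size for a given IOPS target. The sketch below works through that arithmetic using only the figures quoted above; the function names are mine.

```python
import math


def min_volume_gib(target_iops, iops_per_gib):
    """Smallest volume (in GiB) that may be provisioned with target_iops."""
    return math.ceil(target_iops / iops_per_gib)


def max_provisionable_iops(volume_gib, iops_per_gib):
    """Highest IOPS value a volume of this size may be provisioned with."""
    return volume_gib * iops_per_gib


TARGET_IOPS = 125_000

csi_min = min_volume_gib(TARGET_IOPS, 500)        # EBS CSI ratio
console_min = min_volume_gib(TARGET_IOPS, 1000)   # console / CLI / API ratio
csi_ceiling = max_provisionable_iops(300, 500)    # the 300GiB test volume

print(f"EBS CSI: at least {csi_min} GiB for {TARGET_IOPS} IOPS")
print(f"Console/API: at least {console_min} GiB for {TARGET_IOPS} IOPS")
print(f"A 300 GiB volume via the CSI can be provisioned up to {csi_ceiling} IOPS")
```

So the 300GiB volumes sit above the CSI minimum for 125,000 IOPS, with room left for the data set itself.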

 

Test Setup Diagrams:

Lightbits storage:

EBS io2 Block Express:

(Note: The diagram excludes the OpenShift master nodes as they are not directly involved in storage access operations.)

Running the Benchmarks:

As the diagram above shows, I ran a total of 6 MongoDB pods in parallel, 2 pods per worker node. For every MongoDB pod, a corresponding YCSB pod was deployed on the same node as the MongoDB pod, eliminating any out-of-node network traffic between the “client” (YCSB) and the database. The Sherlock framework also deploys a tiny pod on each worker to collect statistics.

Each MongoDB database was populated using the “load” function in YCSB with a record count of 105 million documents, which resulted in a database size of approximately 250GB (on the file system). This size leaves sufficient space on the persistent volume claim (PVC) to accommodate data updates during the testing phase. With a limit of 2Gi memory on each MongoDB pod, most of the IO operations bypass the cache layers and hit the storage. The distribution used was uniform, so that every document is as equally likely to be read as possible.
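A quick back-of-the-envelope check shows why so little of this data set can be served from memory. The sketch assumes that “250GB” means decimal gigabytes and that the whole 2Gi limit could be used as cache, which overstates the real cache size; it also ignores on-disk compression, so treat the result as a rough upper bound rather than a measurement.

```python
RECORD_COUNT = 105_000_000
DATASET_BYTES = 250 * 10**9        # "approximately 250GB", assumed decimal
POD_MEMORY_BYTES = 2 * 2**30       # the 2Gi container limit

# Average on-disk footprint of one document.
bytes_per_document = DATASET_BYTES / RECORD_COUNT

# With a uniform request distribution every document is equally likely,
# so the cacheable fraction roughly bounds the cache hit rate.
cacheable_fraction = POD_MEMORY_BYTES / DATASET_BYTES

print(f"~{bytes_per_document:.0f} bytes per document on disk")
print(f"at most ~{cacheable_fraction:.2%} of the data fits in pod memory")
```

With under 1% of the data cacheable, almost every read and update has to go to the volume, which is the point of the test.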

We used 90 threads per YCSB run when connecting to each MongoDB pod. Each run lasted 30 minutes.

We used 4 types of YCSB runs, alternating the values of the “readproportion” and “updateproportion” variables to achieve a read-only run, an update-only run, a 70% read / 30% update run and a 50% read / 50% update run.

Each run type was executed 4 times.
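The four run types can be sketched as generated YCSB property sets. The property names (`recordcount`, `readproportion`, `updateproportion`, `requestdistribution`, `threadcount`, `maxexecutiontime`) are standard YCSB core properties, but whether the runs were time-bound via `maxexecutiontime` is my assumption, and `operationcount` is left out because its value is not stated here.

```python
RECORD_COUNT = 105_000_000
THREADS = 90
RUN_SECONDS = 30 * 60
REPETITIONS = 4

# (readproportion, updateproportion) for each run type
MIXES = {
    "read_only": (1.0, 0.0),
    "update_only": (0.0, 1.0),
    "read70_update30": (0.7, 0.3),
    "read50_update50": (0.5, 0.5),
}


def workload_properties(read, update):
    """Return YCSB core properties for one read/update mix."""
    return {
        "recordcount": RECORD_COUNT,
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": 0,
        "insertproportion": 0,
        "requestdistribution": "uniform",
        "threadcount": THREADS,
        "maxexecutiontime": RUN_SECONDS,  # assumption: time-bound runs
    }


def render(props):
    """Render a property dict in key=value form, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


for name, (read, update) in MIXES.items():
    print(f"# {name}")
    print(render(workload_properties(read, update)))
    print()

total_runs = len(MIXES) * REPETITIONS
measured_hours = total_runs * RUN_SECONDS / 3600
print(f"{total_runs} runs, {measured_hours:.0f} hours of measured time per storage type")
```

Four mixes repeated four times gives 16 runs per storage type, not counting the load phase.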

Performance Test Results:

YCSB Throughput (Ops/Sec):

The results shown in this graph are averages from all the pods running in parallel.

As you can see, the pods using the Lightbits storage were approximately 30% faster per run, demonstrating that Lightbits storage is more efficient than io2.bx.

YCSB Number of Operations:

The table above shows the average of each run for each type of storage. Since the throughput results in the previous section are derived from these operation counts, we see the same behavior: Lightbits storage provides 30% more performance, on average, than EBS io2 Block Express volumes.
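The relationship between operation counts and throughput is simple arithmetic: operations divided by run time, averaged across the pods that ran in parallel. The sketch below shows the calculation; the per-pod operation counts are placeholder values chosen for illustration and are not the measured results.

```python
from statistics import mean

RUN_SECONDS = 30 * 60  # each run lasted 30 minutes


def throughput(operations, seconds=RUN_SECONDS):
    """Operations per second for one pod in one run."""
    return operations / seconds


def average_throughput(operations_per_pod):
    """Average ops/sec across all pods running in parallel."""
    return mean(throughput(ops) for ops in operations_per_pod)


def relative_gain_pct(candidate, baseline):
    """How much faster the candidate is than the baseline, in percent."""
    return (candidate - baseline) / baseline * 100


# Placeholder operation counts for 6 pods each (NOT measured data).
placeholder_candidate = [13_000_000] * 6
placeholder_baseline = [10_000_000] * 6

gain = relative_gain_pct(
    average_throughput(placeholder_candidate),
    average_throughput(placeholder_baseline),
)
print(f"relative gain: {gain:.1f}%")
```

Because every run has the same duration, the percentage gain is the same whether you compare operation counts or ops/sec.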

A couple of notes about the tests:

  1. MongoDB itself was the limiting factor for performance: increasing the CPU per pod (cores per database) or the number of threads did not provide more performance.
  2. We used 125,000 provisioned IOPS for the io2 Block Express volumes since increasing the provisioned IOPS further did not result in a performance gain.
  3. During the load phase of YCSB, when the YCSB pod generates random data and inserts it into the MongoDB collection, we noticed inconsistent behavior when io2 Block Express volumes were used. The average load time (the load is also done in parallel on all 6 pods/databases) ranged from 45 to 120 minutes. When loading the data using Lightbits volumes, the load time was consistent across all pods and all runs at around 30 minutes. This behavior is most likely due to the EBS io2 Block Express design, where reaching a volume’s maximum performance can sometimes take up to 48 hours. No such behavior exists in Lightbits.

Storage Cost Comparison:

TCO is a major factor when making a decision on what storage to use for your MongoDB databases, so let’s go over some calculations of cost.

The calculations are based on using 6 databases, each with a 300GiB volume, running 24 hours a day for 30 days (basically a monthly cost). I have not included the pricing for OpenShift support from Red Hat, since it will be the same price whether you use Lightbits or EBS.

Using us-east-2 as a sample region, the list price for running a Lightbits cluster that consists of 3 x i4i.16xlarge instances for one hour is roughly $18, or $13,160 for one month.

Using EBS io2 Block Express with 125,000 provisioned IOPS per volume will cost you $33,250 per month, which is roughly 2.5x the cost of Lightbits.
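The 2.5x figure follows directly from the two monthly totals. The short sketch below uses only the numbers quoted above; the variable names are mine.

```python
HOURS_PER_MONTH = 24 * 30            # 24 hours for 30 days

LIGHTBITS_MONTHLY_USD = 13_160       # 3 x i4i.16xlarge cluster
EBS_IO2_BX_MONTHLY_USD = 33_250      # io2 Block Express at 125,000 IOPS

# Hourly rate implied by the monthly Lightbits figure ("roughly $18").
implied_hourly_usd = LIGHTBITS_MONTHLY_USD / HOURS_PER_MONTH

# How many times more the EBS option costs per month.
cost_ratio = EBS_IO2_BX_MONTHLY_USD / LIGHTBITS_MONTHLY_USD

print(f"implied Lightbits hourly rate: ${implied_hourly_usd:.2f}")
print(f"EBS io2 Block Express / Lightbits: {cost_ratio:.2f}x")
```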

The compute pricing is the same for both options: same number of worker nodes, same number of master nodes, same instance types. If we add the compute price and look at the cost per database per month, one MongoDB database will cost around $3,200 using Lightbits storage vs $5,550 using EBS io2 Block Express. That’s a savings of $2,350 per month or $28,200 annually per MongoDB database.

It is important to note that a Lightbits cluster of the size referenced in this blog would still have plenty of capacity to grow and can also provide storage to other applications running on your OpenShift cluster or outside it.

Another factor to consider is that EBS snapshots cost money. You pay for the action of taking a snapshot (copying the data to S3) and for the capacity used. In Lightbits, snapshots and clones are part of the license and are also instant (as in, no data copy is required). Lightbits allows you to create as many snapshots as you need without any extra fee.

In EBS, restoring from a snapshot can take a long time because EBS snapshots are stored in S3. In Lightbits, a volume restored or cloned from a snapshot performs at the same rate as a normal volume.

Conclusion:

In the realm of storage performance for MongoDB (and other NoSQL databases) on AWS, Lightbits emerges as the clear winner in this testing, surpassing io2 Block Express in multiple aspects. With roughly 30% better performance, substantial cost savings, and faster restore times, Lightbits proves to be the optimal choice for MongoDB workloads.

By leveraging Lightbits’ superior IOPS performance, cost savings, and faster restore times, businesses can enhance application responsiveness, optimize their budgets, and achieve efficient data management. Evaluate your specific workload requirements, considering factors such as cost, restore times, throughput, IOPS, and scalability, to make an informed decision.

Lightbits’ exceptional performance and cost-effectiveness make it the ideal solution for high-performance and budget-conscious MongoDB deployments on AWS. Ensure optimal MongoDB storage performance and data management with Lightbits, and elevate the success of your applications. The Lightbits solution is also available on other clouds such as Azure and, of course, in private on-premises clouds.

 
