Maximize Your Oracle Performance on AWS While Keeping Your Costs Low − the Lightbits Way

In the last few years, more and more enterprise companies have moved their applications from local data centers (on-premises) to the cloud (off-premises). Most of these companies have also committed to building new projects and applications as cloud-only (or at least cloud-native).

We’re now in the phase where banks, healthcare providers, insurance companies, and federal agencies are also moving their Tier 1 applications into the cloud − and one of the main Tier 1 storage consumers is the Oracle database.

The storage domain in the cloud is unique to each provider, and it can be difficult to understand all of the differences and nuances of each option. Additionally, each cloud provider has its own way of calculating storage usage via “GBs used” and “IOs used”, making things even more complex.

While cloud infrastructure provides almost everything needed to run or migrate your on-premises data center to the cloud (including automation, security, instance management, do-it-yourself provisioning, and so on), the storage area lacks basic SAN capabilities such as ease of use, centralized storage management, cost efficiency, and performance. This is where Lightbits comes into play.

This is the first blog in a series that will focus on running Oracle on AWS using the Lightbits marketplace offering. It introduces the setup and details the test environment as well as the initial performance results.

In subsequent blogs, I will take a more in-depth technical look at specific EC2 clients and workloads, performance tuning, cost, resiliency, and usage by other applications.


Lightbits – Today’s Fastest (Disaggregated) Software-Defined Storage

Let’s start by explaining Lightbits and our AWS Marketplace offering.

Lightbits is a software-defined storage (SDS) solution that uses NVMe® over TCP (NVMe/TCP). In fact, we authored the NVMe/TCP specification, contributed it to the Linux community, and wrote the first NVMe/TCP kernel module. This means that you can run Lightbits on all sorts of hardware platforms (Intel, AMD, and ARM); all you need is a supported Linux OS, a few NVMe devices on each server (instance), and a TCP/IP network between the servers (storage targets) and the consumers (clients or initiators). It also means that you can potentially run Lightbits on any cloud provider. Naturally, as with all other SDS solutions, performance will depend on the quality of the platform, the number of NVMe devices, the network bandwidth, and other environmental factors.
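If you’re curious what the client side looks like, here is a minimal sketch of attaching a volume over NVMe/TCP with the standard nvme-cli tool. The target address, port, and NQN below are placeholders; a real Lightbits cluster provides its own connection details, and the exact connection workflow on AWS may differ.

```python
import subprocess

# Placeholder connection details -- substitute the data IP, port, and
# namespace NQN reported by your Lightbits cluster.
TARGET_IP = "10.0.0.11"
TARGET_PORT = "4420"  # default NVMe/TCP port
VOLUME_NQN = "nqn.2016-01.com.example:oracle-vol01"

# Load the NVMe/TCP initiator module (included in modern Linux kernels).
subprocess.run(["modprobe", "nvme-tcp"], check=True)

# Attach the client (initiator) to the storage target over plain TCP/IP.
subprocess.run(
    ["nvme", "connect", "-t", "tcp",
     "-a", TARGET_IP, "-s", TARGET_PORT, "-n", VOLUME_NQN],
    check=True,
)

# The volume now shows up as a local block device (e.g. /dev/nvme1n1)
# that Oracle can use like any other disk.
```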


Lightbits on AWS Marketplace

Since Lightbits is a software-defined storage solution, it was straightforward to port it to run on AWS. We use CloudFormation to deploy our own AMIs onto a set of instances − the i3en or i4i family of EC2 storage-optimized instances − which have NVMe devices directly attached to them. Users choose the instance type (note that the instance type determines how many NVMe devices each instance holds, as well as its network bandwidth and core count) and the number of instances in the cluster (3 is the minimum, and 32 is the current maximum).

Together, these options determine the size of the cluster (the raw and usable storage capacity) and its performance capabilities (more NVMe devices and more network bandwidth mean more IOPS). We also provide an option to deploy the Lightbits cluster into a new VPC (creating subnets, security groups, and so on) − or into an existing VPC (using your own subnets and security groups). We use Auto Scaling Groups (ASGs) to protect clusters in the case of an instance failure: when an instance fails, a new one is automatically created. The whole deployment process is just a couple of web forms, and roughly 10 minutes later you will have a Lightbits cluster up and running in AWS.
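As a side note, if you want to compare the candidate storage instance families before deciding on a cluster size, the EC2 API exposes the relevant specs. Here is a quick sketch using boto3; this is just a convenience, not part of the Lightbits deployment, and the region and instance type names are only examples.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # pick your region

# Compare candidate storage-optimized instance types before sizing a cluster.
resp = ec2.describe_instance_types(InstanceTypes=["i3en.6xlarge", "i4i.metal"])

for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    nvme_gb = it.get("InstanceStorageInfo", {}).get("TotalSizeInGB", 0)
    net = it["NetworkInfo"]["NetworkPerformance"]
    print(f'{it["InstanceType"]}: {vcpus} vCPUs, {mem_gib:.0f} GiB RAM, '
          f"{nvme_gb} GB local NVMe, network: {net}")
```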

For this series of blog posts, I’ve chosen the i4i.metal instance type, so all of the performance numbers mentioned in this blog are based on a set of 3 x i4i.metal instances. As I wrote previously, you can choose smaller instances with less capacity (and lower network bandwidth). Because Lightbits is an SDS, you can match the Lightbits cluster to your application’s demands.


Client and Oracle Setup

For the clients that will run Oracle, I’ve used three types of instances. On the lower-to-middle end of the spectrum, I’ve chosen the r5n.16xlarge instance. This instance provides 64 cores and 512GB of memory, which will probably be the low-to-mid point for running Oracle anywhere − including in AWS. The benefit of the r5n instances is that they provide massive network bandwidth, going all the way up to 100Gbps, while keeping the cost to a minimum. And since Lightbits uses NVMe/TCP as the data transport, high network bandwidth directly affects the performance that a client (or a set of clients) can achieve.

On the upper end of the spectrum, I’ve used the x2idn.32xlarge. The x2idn family contains very large memory instances and is one of the preferred AWS options for large Oracle databases.

Lastly, I’ve briefly tested the R6in.24xlarge from the just-released R6in EC2 family of instances. This instance provides massive network bandwidth, with less memory and CPU.

For the Oracle layout and setup, I’ve chosen 19c as the version, and each EC2 instance that runs Oracle hosts a single Oracle database. Since we have three servers/targets in our Lightbits cluster (the minimum for a Lightbits cluster), a copy (replica) of each volume is automatically kept on each server. The Lightbits cluster automatically spreads the copies to balance the IOPS and capacity load across the cluster.

To maximize the read utilization of the Lightbits cluster, I’ve decided to create 16 datafiles (meaning 16 Lightbits volumes), so that when the database drives IOPS, all three Lightbits instances are involved. This method is sometimes called “lacing” in the Oracle world (having a tablespace or tablespaces spread across many datafiles). There is also one more volume that holds the database SYSTEM, SYSAUX, and UNDO tablespaces, the control files, and the redo logs.

The database, of course, uses large memory pages (hugepages) and has a very small buffer cache (4GB) to deliberately force IOs to the storage for both reads and writes. The goal here is to test the storage rather than how well the CPU is working.
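To make the layout concrete, below is a rough sketch of how the laced datafile layout and the deliberately small cache could be scripted with python-oracledb. The connection string, tablespace names, file paths, and exact parameter scope are illustrative placeholders, not the precise commands used in these tests.

```python
import oracledb

# Placeholder DSN and credentials; connect as a DBA user.
conn = oracledb.connect(user="sys", password="***",
                        dsn="dbhost/ORCLPDB1",
                        mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

# One tablespace per Lightbits volume: 16 datafiles of 75GB each, so that
# database IOs are "laced" across all three storage targets.
for i in range(1, 17):
    cur.execute(
        f"CREATE TABLESPACE slob_ts{i:02d} "
        f"DATAFILE '/oradata/lbvol{i:02d}/slob{i:02d}.dbf' SIZE 75G"
    )

# Force IOs to storage: a tiny buffer cache, and hugepages for the SGA.
cur.execute("ALTER SYSTEM SET db_cache_size = 4G SCOPE=SPFILE")
cur.execute("ALTER SYSTEM SET use_large_pages = 'ONLY' SCOPE=SPFILE")
```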


SLOB Setup

I’ve chosen SLOB because, for more than 10 years, it has been the de facto workload generator for testing storage under Oracle. What makes SLOB unique is that it is written to exercise the storage that Oracle uses, with little emphasis on the platform Oracle runs on (the CPU/RAM hardware). It is the best way to test how your database will run and behave in real life when your data is larger than the available cache, which is what happens in most cases for large, long-running Oracle databases.

For the SLOB layout, as I mentioned earlier, I’ve created 16 datafiles, each 75GB in size, each assigned to a different tablespace and used by a separate SLOB user schema − forcing each tablespace/datafile to be filled almost completely with SLOB data (about 98%). Keep in mind that larger volumes/datafiles would not affect performance on Lightbits; however, it takes a while for SLOB to generate random data, so I preferred to use smaller datasets.

I have deliberately kept the SLOB layout and configuration simple and consistent across all the test variations (read/write percentages) and for the comparison between Lightbits volumes and io2 volumes. The database cache size (db_cache_size) is set to 4G because, again, we’re not trying to test CPU or memory.
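The test matrix itself is easy to automate. The sketch below assumes a standard SLOB 2.x installation, where slob.conf carries the UPDATE_PCT setting and runit.sh drives the run against the loaded schemas; treat the path and schema count as placeholders rather than the exact harness used here.

```python
import re
import subprocess
from pathlib import Path

SLOB_HOME = Path("/home/oracle/SLOB")  # placeholder path to the SLOB install
SCHEMAS = 16                           # one SLOB schema per laced tablespace

for update_pct in (0, 25, 100):        # read-only, 75/25, and write-only runs
    conf = SLOB_HOME / "slob.conf"
    text = conf.read_text()
    # Rewrite only the UPDATE_PCT line; everything else in slob.conf stays put.
    text = re.sub(r"^UPDATE_PCT=.*$", f"UPDATE_PCT={update_pct}",
                  text, flags=re.MULTILINE)
    conf.write_text(text)

    # Drive the workload; SLOB leaves an AWR report behind for each run.
    subprocess.run(["./runit.sh", str(SCHEMAS)], cwd=SLOB_HOME, check=True)
```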


Performance

In general, the high-end EBS volume types − io1, io2, and io2 Block Express − have very good throughput and latency, as well as good enough per-volume IOPS limits. However, the instances these volumes are attached to have limits on the bandwidth they can provide and, more importantly, an IOPS limit that maxes out at 350,000 IOPS per instance (regardless of how many of these volumes you attach to it).

From Oracle’s perspective, this means that once your queries have to go to disk, EBS is limited in what it can provide. In my tests, EBS io2 Block Express maxes out at roughly 250k 8k block reads (or read/write mix), compared to Lightbits, which happily sustained 1.4M 8k block reads (or read/write mix).

In my tests I concentrated on the most common scenarios typically used to test Oracle storage: read-only (queries that only select), 75/25 (75% of the queries only select and 25% update data), and finally write-only (100% of the queries update data). The focus here is on the number of IOPS the database can achieve from each storage subsystem.

Below is a table comparing the performance achieved by the three instance types:

| Instance Type | Update % | Physical Reads (IOPS) | Read Bandwidth (MB/s) | Physical Writes (IOPS) | Write Bandwidth (MB/s) | Total Physical IOPS | Total Bandwidth (MB/s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R5n.16x | 0 | 919,040 | 7,188 | 7 | 0 | 919,047 | 7,188 |
| X2idn.32x | 0 | 1,422,618 | 11,041 | 6 | 0 | 1,422,624 | 11,041 |
| R6in.24x | 0 | 1,124,470 | 9,101 | 15 | 0 | 1,124,485 | 9,101 |
| R5n.16x | 25 | 571,242 | 4,463 | 141,828 | 1,386 | 713,070 | 5,849 |
| X2idn.32x | 25 | 712,344 | 5,565 | 182,600 | 1,762 | 894,944 | 7,328 |
| R6in.24x | 25 | 648,806 | 5,069 | 166,313 | 1,602 | 815,119 | 6,671 |
| R5n.16x | 100 | 268,492 | 2,120 | 260,633 | 2,010 | 529,125 | 4,129 |
| X2idn.32x | 100 | 415,041 | 3,245 | 416,856 | 3,245 | 831,897 | 6,490 |
| R6in.24x | 100 | 204,239 | 1,606 | 200,005 | 1,601 | 404,244 | 3,207 |

Note: The “Update %” column refers to SLOB’s UPDATE_PCT configuration variable. It controls how many of the transactions are update queries vs. select queries, which in turn determines how many IOs are reads vs. writes going directly to the storage (if you recall, I’ve used a 4GB cache for a 1.2TB data set, so hardly anything is cached). It is also important to note that, because of the small cache size, for every update Oracle first reads (selects) the data and then updates it.

There are some interesting findings here. First of all, with 16 Lightbits volumes, both the X2idn.32x and the R6in.24x top out at roughly 1.4M physical reads: that is, 1.4 million 8k block reads from storage running on AWS (and, as you may have noticed, on the read-only run I’ve saturated the 100 Gbps network that this instance uses).
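A quick back-of-the-envelope check on that saturation point (a rough calculation; Oracle’s AWR bandwidth accounting differs slightly from this raw multiplication):

```python
# ~1.42M physical reads/s of 8KB Oracle blocks (read-only row for X2idn.32x):
iops = 1_422_624
block_bytes = 8 * 1024

bytes_per_sec = iops * block_bytes
print(f"{bytes_per_sec / 1e6:,.0f} MB/s")      # ~11,654 MB/s
print(f"{bytes_per_sec * 8 / 1e9:,.1f} Gbps")  # ~93 Gbps of data payload

# Add NVMe/TCP and TCP/IP framing overhead, and the instance's 100 Gbps
# network link is effectively saturated.
```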

What is also interesting is that the R6in.24x gets to this performance level with 25% fewer cores and at about 40% lower cost (of course, in real-life scenarios for very large databases you might want the large memory, and even the local NVMe devices for caching, that the X2idn.32x provides).

Another finding is that the older-generation (Cascade Lake Intel CPU) R5n.16x, with fewer CPUs (64 cores), is not far behind the more modern Ice Lake Intel platform of the X2idn and R6in instances, and at about ¼ and ⅓ of their prices, respectively.

I’ve run another short test to find the maximum IOPS we can get from the smallest Lightbits cluster. With four databases (four R5n.16x EC2 instances), I was able to reach 1.85M 8k block reads (read-only) using SLOB. With UPDATE_PCT set to 25, I reached 1M physical read IOPS and about 280k physical write IOPS. To add more IOPS capability to the Lightbits cluster, all you need to do is add more Lightbits servers to the cluster. It’s as simple as that.

This also has implications for your Oracle licensing, which is usually tied to the number of cores. It will be cheaper to use the newer EC2 instances and get more bang for your buck (not to mention that the alternatives of splitting the data between several instances or moving to Oracle RAC are both viable options, but more expensive from an Oracle licensing perspective).


Summary

In this first blog, I’ve concentrated on the performance differences between several EC2 instance types when running Oracle on AWS with Lightbits.

What I’ve shown is the performance capability of a minimal (three-node) Lightbits cluster built from i4i.metal EC2 instances, and the IOPS differences across the different SLOB workloads.

Just to compare, the same setup using a single X2idn.32x and io2 Block Express maxes out at roughly 250k 8k block reads (or read/write mix), compared to 1.4M 8k block reads with Lightbits.

I’ve only provided a brief summary of the test findings in this blog. Watch for a technical paper coming soon with a much more detailed explanation of the setup, scripts, and comparison graphs.

There are of course plenty of permutations for how to run Oracle on AWS and which EC2 instances to use. Keep an eye out for future blogs that will perform similar comparisons against io2 Block Express, but with very large EC2 instances like the x2idn or the new R6in instances. You will also find comparisons of other performance measurements, such as latency.

The following are some additional resources that you might find helpful:


About the Writer: