NVMe/TCP Storage PoC that Resulted in Great Things

Authors:
Mathijs Dubbe, Consulting Engineer, Lightbits
Falk Rösing, Independent Technology Consultant and Engineer

At Lightbits, we love working with our customers, trying out new things, and pushing the boundaries of what hardware and software can do for them. Sometimes we take on the challenges our customers face in their day-to-day work, and sometimes our customers teach us how to get the most out of a certain environment. When that happens, there are experts in their respective fields on both sides of the table, and great things become possible.

Falk Rösing, an independent technology consultant and engineer from the Osnabrück area in Germany, was looking for a high-performance storage environment for several customers. Investments in new IT equipment had been put off because of the Covid pandemic and the resulting shortages and high prices on the market, and some of these customers were reaching the limits of what their aging hardware could provide. Falk needed to come up with something new that was simple to implement and administer, something that wouldn’t break the bank and that would integrate seamlessly into their existing TCP/IP network. His research led him to NVMe-over-TCP (NVMe/TCP) technology and the people who invented it, Lightbits.

“My first impression of Falk was that he was very well informed about our technology. He surprised me by talking at a level of detail that we normally expect from one of our own engineers.” – Mathijs Dubbe

“I am always looking for new and better solutions for my customers. I became aware of Lightbits Labs through VMware HCL and gathered all the available information on the Internet.

Since I want to understand a solution in detail in order to evaluate it correctly, I met with Mathijs, who was able to answer my open questions. I was already convinced by the design of the solution, so it was time to go into further detail with a real test.” – Falk Rösing

One thing led to another, and after only a few meetings discussing our technology, we agreed that there should be a Proof of Concept (PoC). With the help of PrimeLine Solutions, Falk’s preferred hardware supplier, some standard components were acquired and set up.

Most of Falk’s customers use VMware, so any testing that needed to be done would revolve around that. “Day to day management of the storage system should be easy, and it has to be reliable, and able to fit customers that have at least two server rooms,” he said.

At Lightbits, proofs of concept generally contain the following elements: performance, resiliency, and application integration. Because of the VMware requirement, all tests were oriented toward that virtualized workload, so even performance testing took place from inside both Linux and Windows VMs.

“Falk’s VMware setup, his settings, and the way he ran the PoC made for one of the quickest, most efficient, and best-configured installations I’ve seen. He even managed to outperform similar tests previously run on virtual Windows environments.” – Asaf Matan, Solution Engineering Manager

During the PoC we found some bottlenecks in the hypervisor, but we also found ways to work around them: spreading multiple virtual disks across multiple paravirtual SCSI controllers, making sure each VM’s cores came from the same physical CPU, and provisioning just the right number of vCPUs to reach optimal performance levels. Some other tweaks were made to VMware, and we tested both by connecting Lightbits volumes directly to the VMs and by running from a datastore.
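To illustrate the idea of spreading virtual disks over several paravirtual controllers, here is a minimal sketch in Python. It is not the procedure used in the PoC: the disk names, the disk count, and the `spread_disks` helper are hypothetical, and the only hard fact it relies on is that vSphere allows up to four virtual SCSI controllers per VM. The actual layout used in the PoC is not documented in this post.

```python
# Hypothetical planning helper: distribute virtual disks round-robin
# across the paravirtual SCSI controllers of a single VM.

MAX_SCSI_CONTROLLERS = 4  # vSphere limit on virtual SCSI controllers per VM


def spread_disks(disk_names, controller_count=MAX_SCSI_CONTROLLERS):
    """Return a mapping {controller index: [disk names]} using round-robin."""
    if not 1 <= controller_count <= MAX_SCSI_CONTROLLERS:
        raise ValueError("controller_count must be between 1 and 4")
    layout = {index: [] for index in range(controller_count)}
    for position, disk in enumerate(disk_names):
        # Consecutive disks land on different controllers, so each
        # controller's queue carries a similar share of the I/O.
        layout[position % controller_count].append(disk)
    return layout


if __name__ == "__main__":
    # Eight illustrative disks (names are made up for this example).
    disks = [f"data-disk-{n}" for n in range(8)]
    for controller, assigned in spread_disks(disks).items():
        print(f"SCSI controller {controller}: {', '.join(assigned)}")
```

The point of the round-robin layout is simply that no single controller queue becomes the choke point; how many disks and controllers are worthwhile depends on the workload and has to be measured.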

Our tests concluded with almost 1 million IOPS in a single Linux VM (4K reads), with an average latency of 199 µs at 40% CPU usage. Windows could almost keep up when testing the larger 16K read block size, at 700K IOPS and an average latency of 273 µs.
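To put these figures in perspective, the following sketch converts them into bandwidth and into the average number of I/Os in flight, using Little's law (concurrency = throughput × latency). It rests on assumptions not stated in the results: "almost 1 million" is rounded up to exactly 1,000,000 IOPS, and 4K and 16K are taken as 4,096 and 16,384 bytes. The function names are ours, chosen for illustration.

```python
# Back-of-the-envelope conversions for the reported PoC figures.

GIB = 2 ** 30  # bytes per GiB


def throughput_gib_per_s(iops, block_bytes):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / GIB


def outstanding_ios(iops, latency_us):
    """Average I/Os in flight according to Little's law."""
    return iops * latency_us / 1_000_000


if __name__ == "__main__":
    # (label, IOPS, block size in bytes, average latency in microseconds)
    results = [
        ("Linux VM, 4K read", 1_000_000, 4096, 199),    # IOPS rounded up
        ("Windows VM, 16K read", 700_000, 16384, 273),
    ]
    for label, iops, block_bytes, latency_us in results:
        bandwidth = throughput_gib_per_s(iops, block_bytes)
        in_flight = outstanding_ios(iops, latency_us)
        print(f"{label}: {bandwidth:.2f} GiB/s, ~{in_flight:.0f} I/Os in flight")
```

Under these assumptions the Linux result corresponds to roughly 3.8 GiB/s and the Windows result to roughly 10.7 GiB/s, each with around 190 to 200 I/Os in flight on average. In other words, the Windows VM delivered fewer operations per second but moved considerably more data.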

“The installation was quite easy. In particular, the very good Ansible playbook used to roll out the cluster showed me that people here work with a lot of attention to detail. It quickly became clear that the solution delivers very good performance, and the effect of each optimization made to the vSphere VMs was immediately visible.

The solution is also robust, of course, and the use of Optane memory as a cache turned out to be a very good fundamental decision. In some failure scenarios, a persistent cache makes recovery much easier with guaranteed data integrity.

My conclusion is that this solution is very easy to implement, especially because it uses NVMe over TCP, and that it can deliver very good performance and robustness for many vSphere environments.” – Falk Rösing

Looking back at the PoC, both parties learned from the experience and took the results back for further study. We achieved a lot in a short period of time, and we crushed previously reached results in virtual Windows environments along the way.

For more information on Falk Rösing’s consulting services, go to: https://www.roesing-consulting.de/

For more information on PrimeLine Solutions, go to: https://www.primeline-solutions.com/
