OCP! GTC! NVME/TCP! OH MY!
What a crazy few weeks! First, Lightbits™ announced the First Production NVMe/TCP Solution (March 12) while demonstrating at the Open Compute Global Summit (March 14 & 15) how LightOS® beautifully integrates with containerized Cassandra—orchestrated by Kubernetes (say that three times fast). The next day we headed to the NVIDIA GPU Technology Conference (March 18 to 21) to showcase LightOS® with the DGX-1 Deep Learning server. At least both events were at the San Jose Convention Center, so I just slept at a coffee shop in the nearby hotel.
Open Compute Project (OCP) Global Summit 2019
The Open Compute Project (OCP) originally started as an internal Facebook project. It opened Facebook’s data center hardware designs to the public to drive down data center costs, improve efficiency, and encourage broad, open industry cooperation and co-development, similar to open-source software.
Attending this summit connects Lightbits with key partners leading the cloud revolution: over 3,600 engineers, architects, and some suits sprinkled in. I highly recommend watching this great session video from Facebook showing how the NVMe/TCP technology that Lightbits pioneered and brought to market blows iSCSI out of the water and even performs better than local NVMe DAS!
At the Micron booth, we showcased the LightOS Software Defined Storage solution managing a pool of NVMe SSDs and serving volumes over NVMe/TCP to containerized Cassandra pods in a Kubernetes cluster. In this video of the demonstration, you can see how LightOS enables Kubernetes to quickly migrate a Cassandra container from one node to another using persistent volumes presented to Kubernetes through the Lightbits Container Storage Interface (CSI) plugin.
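For readers who have not used persistent volumes before, here is a minimal sketch of what a volume request for a Cassandra pod might look like. It builds a standard Kubernetes PersistentVolumeClaim manifest in Python and prints it as JSON. The claim name, the storage class name `example-nvme-tcp`, and the 100Gi size are all made up for illustration; they are not the names or settings used in the demo or by the Lightbits CSI plugin.

```python
import json


def build_pvc(name, storage_class, size):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node mounts the volume read-write at a time.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical storage class; a CSI driver would back it.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# All three values below are illustrative assumptions.
pvc = build_pvc("cassandra-data-0", "example-nvme-tcp", "100Gi")
print(json.dumps(pvc, indent=2))
```

Because the claim refers to networked storage rather than a disk inside one server, a pod rescheduled onto another node can attach to the same volume again instead of copying its data.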
Then we took a deep breath and set up our Lightbits booth for GTC!
NVIDIA GPU Technology Conference 2019
I think my fellow Solutions Engineer Asaf Matan said it best, “It’s great to get paid to be part of the latest technologies.”
This conference was mind-blowing. The areas of interest on display all around us and discussed in over 600 sessions included AI/deep learning, self-driving cars, intelligent machines (IoT), high-performance computing, and data center cloud.
Here you can see Lightbits busily answering questions about our powerful but simple approach to helping deep learning scale. With our solution, you can easily scale up the DGX cluster and let Lightbits take care of the high-performance, low-latency storage.
Our demo showed a DGX-1 using NVMe/TCP-connected storage to keep all eight GPUs in the DGX fully utilized. The DGX ran a TensorFlow machine learning workload that trained a ResNet-50 neural network.
It was an eye-opening experience for our visitors to see a solution to the problems they encounter when managing the ever-growing data sets needed for complex machine learning tasks. As the workload crunched through the data, our demo showed real-time statistics matching the performance achievable with storage installed directly in the deep learning server. Same performance as DAS, but now you can scale out to more and more DGX servers connected to a single LightOS NVMe/TCP target. No changes required to the DGX or the underlying network.
Very near our booth was this “Agro Bot,” a full-size, AI-enabled tractor. It had little tiny precision pincers used to pick strawberries. To make sure they pick only a strawberry and not a clump of dirt, robots like this can use high-performance, low-latency NVMe/TCP-based storage as they stream millions of images of strawberries into their machine learning and training algorithms.