Data scientists are in high demand as more and more companies use machine learning to support their businesses. Most of them are scientists or mathematicians rather than engineers, and handling infrastructure usually falls outside their comfort zone. The business challenge here was to make their work as painless as possible.
This goal had to be addressed alongside another challenge: the need for computing power varies greatly between machine learning projects, and even within a single project the required number of GPUs can change rapidly.
We decided to use Kubernetes as a layer of abstraction that shields data scientists from low-level infrastructure problems. Moreover, Neptune delivers a tailor-made autoscaling solution that is faster than off-the-shelf alternatives. It uses Kubernetes to handle the fluctuating demand for resources smoothly. As a result, users get a cost-effective platform that fits their needs.
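To illustrate the core idea behind such an autoscaler: it compares the pending GPU demand against cluster capacity and resizes the pool of GPU machines to match. The sketch below is a minimal Python illustration of that decision logic; the function name, parameters, and limits are hypothetical assumptions, not Neptune's actual implementation.

```python
# Illustrative sketch of a GPU autoscaling decision.
# All names (desired_node_count, pending_gpus, etc.) are hypothetical;
# Neptune's real autoscaler is not described in detail here.

def desired_node_count(pending_gpus: int, gpus_per_node: int,
                       min_nodes: int = 0, max_nodes: int = 20) -> int:
    """Return how many GPU nodes the cluster should run."""
    # Ceiling division: enough nodes to cover all pending GPU requests.
    needed = -(-pending_gpus // gpus_per_node)
    # Clamp between the configured floor and ceiling.
    return max(min_nodes, min(max_nodes, needed))

# Example: 9 pending GPUs on 4-GPU nodes -> 3 nodes.
print(desired_node_count(pending_gpus=9, gpus_per_node=4))  # prints 3
```

A custom loop like this can react faster than generic autoscalers because it looks directly at the experiment queue instead of waiting for pod-scheduling signals.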
Neptune runs on Kubernetes and uses Helm templates to reduce the time needed to spin up new machines and start an experiment. The underlying Kubernetes cluster smooths the process of creating and tearing down experiment containers. At the same time, by leveraging MooseFS (a distributed filesystem), Neptune ensures that all containers share access to the training dataset, making additional storage for every machine unnecessary. Finally, Kubernetes makes Neptune infrastructure-agnostic, so it can be deployed in either a private or public cloud. Neptune can run on a laptop, on cloud resources, or on bare-metal infrastructure.
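To make the experiment-container setup concrete, the sketch below builds the kind of pod specification such a platform might submit to Kubernetes: a container requesting one GPU via the standard `nvidia.com/gpu` device-plugin resource, with the shared dataset mounted from the distributed filesystem rather than copied to each machine. The image, paths, and names are illustrative assumptions, not Neptune's actual Helm templates.

```python
# Illustrative Kubernetes pod manifest for a single-GPU experiment.
# Image names and mount paths are hypothetical examples; in practice
# these values would be rendered from Helm templates.

def experiment_pod_spec(name: str, image: str, dataset_path: str) -> dict:
    """Build a pod manifest requesting one GPU and the shared dataset."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "experiment",
                "image": image,
                # One GPU via the standard device-plugin resource name.
                "resources": {"limits": {"nvidia.com/gpu": 1}},
                # The training data is mounted, not duplicated per machine.
                "volumeMounts": [{"name": "dataset",
                                  "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "dataset",
                # Assumes the distributed filesystem is mounted on each node.
                "hostPath": {"path": dataset_path},
            }],
        },
    }

spec = experiment_pod_spec("experiment-1",
                           "example.com/trainer:latest",
                           "/mnt/moosefs/datasets")
```

Because every container sees the same mounted dataset, starting one more experiment is just one more pod spec, which is what makes scaling up and down cheap.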