
Naveen Chakrapani

Spinning up cost-effective clusters for training sessions

Over the last few weeks, we have been running a number of internal and external (with partners and customers) enablement sessions that provide "hands-on, lab-based training" on recently introduced capabilities in the Rafay Kubernetes Operations Platform.

Here's what we set up for those enablement sessions:

  • Each attendee was provided with their own Kubernetes cluster
  • We spun up ~25 "ephemeral" Kubernetes clusters on Digital Ocean (for the life of the session)
  • We needed the clusters to be provisioned in just a few minutes for the training exercise
  • Each attendee had their own dedicated "Project" in the Rafay Org
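The setup above boils down to a per-attendee plan: one cluster and one project each. A minimal sketch in Python, assuming the clusters are created through the Digital Ocean API's `POST /v2/kubernetes/clusters` endpoint, builds one request body per attendee without sending anything. The region, node size, node count, version slug, and naming scheme (`plan_session`, `AttendeePlan`, the `-proj-` suffix) are hypothetical choices, and importing the clusters into Rafay is not shown.

```python
import json
from dataclasses import dataclass

# Hypothetical settings; adjust for your account. List valid version slugs
# with `doctl kubernetes options versions` before using K8S_VERSION.
REGION = "nyc1"
NODE_SIZE = "s-2vcpu-4gb"
NODE_COUNT = 2
K8S_VERSION = "1.29.1-do.0"  # hypothetical slug; replace with a current one


@dataclass
class AttendeePlan:
    attendee: str
    cluster_name: str
    project_name: str        # hypothetical name for the attendee's Rafay Project
    do_create_request: dict  # body for POST /v2/kubernetes/clusters (not sent here)


def plan_session(session: str, attendees: list[str]) -> list[AttendeePlan]:
    """Give every attendee a dedicated cluster and a matching project name."""
    plans = []
    for i, attendee in enumerate(attendees, start=1):
        cluster_name = f"{session}-{i:02d}"  # e.g. "enablement-01"
        plans.append(
            AttendeePlan(
                attendee=attendee,
                cluster_name=cluster_name,
                project_name=f"{session}-proj-{i:02d}",
                do_create_request={
                    "name": cluster_name,
                    "region": REGION,
                    "version": K8S_VERSION,
                    "node_pools": [
                        {"name": f"{cluster_name}-pool", "size": NODE_SIZE, "count": NODE_COUNT}
                    ],
                },
            )
        )
    return plans


if __name__ == "__main__":
    plans = plan_session("enablement", [f"user{n}" for n in range(1, 26)])
    print(len(plans), "clusters planned")
    print(json.dumps(plans[0].do_create_request, indent=2))
```

Sending each body (with an API token) or running the equivalent `doctl kubernetes cluster create` command is deliberately left out. That way the plan can be reviewed before ~25 billable clusters are created.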

A question we were frequently asked after those sessions was: "I would love to run similar sessions with my extended team. How much did it cost to run those clusters?"

Our total spend for the ~25 ephemeral clusters on Digital Ocean for these enablement sessions was less than $15, so it is no wonder there has been so much interest.
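The arithmetic behind that number is simple: with ~25 clusters and a total under $15, each cluster cost less than $15 / 25 = $0.60. A rough sketch, assuming every worker node is billed hourly only while its ephemeral cluster exists and ignoring any other charges, turns that budget into a per-node price ceiling. The node count, session length, and prices below are hypothetical, not figures from our sessions.

```python
def session_cost(clusters: int, nodes_per_cluster: int, hourly_node_price: float, hours: float) -> float:
    """Rough compute cost: every node is billed hourly for the session's duration."""
    return clusters * nodes_per_cluster * hourly_node_price * hours


def max_hourly_node_price(budget: float, clusters: int, nodes_per_cluster: int, hours: float) -> float:
    """Highest per-node hourly price that keeps the whole session within budget."""
    return budget / (clusters * nodes_per_cluster * hours)


if __name__ == "__main__":
    # Hypothetical session: 25 clusters, 2 nodes each, torn down after 3 hours.
    print(15.0 / 25)                                  # per-cluster budget → 0.6
    print(max_hourly_node_price(15.0, 25, 2, 3.0))    # → 0.1
    print(session_cost(25, 2, 0.1, 3.0))              # → 15.0
```

The key lever is the "ephemeral" part: because cost scales linearly with `hours`, deleting the clusters as soon as the session ends matters as much as choosing small nodes.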

We decided it would help everyone if we shared the automation scripts and the methodology we have been using to provision Digital Ocean clusters and import them into Rafay's platform.


Multi-tenancy: Best practices for shared Kubernetes clusters

Some of the key questions that platform teams have to think about very early on in their K8s journey are:

  • How many clusters should I have? What is the right number for my organization?
  • Should I set up dedicated or shared clusters for my application teams?
  • What are the governance controls that need to be in place?

The model that customers are increasingly adopting is to standardize on shared clusters as the default and create a dedicated cluster only when certain considerations are met.

Request for compute from application teams → Evaluate against list of considerations → Dedicated or shared clusters

A few example scenarios in which platform teams often end up setting up dedicated clusters are:

  • Application has low-latency requirements (target SLA/SLO is significantly different from others)
  • Application has specific requirements that are unique to it (e.g. GPU worker nodes, a specific CNI plugin)
  • Environment type: 'Prod' gets dedicated clusters, while 'Dev' and 'Test' environments share clusters
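The flow and the example scenarios above amount to a "shared unless a consideration applies" rule. A minimal Python sketch of that rule follows; `ComputeRequest`, `choose_cluster_model`, and the reason strings are hypothetical names, and the checks cover only the three example scenarios listed, so a real evaluation would plug in your organization's own list of considerations.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeRequest:
    app: str
    environment: str                 # e.g. "prod", "dev", "test"
    low_latency: bool = False        # SLA/SLO significantly stricter than other tenants
    special_requirements: set[str] = field(default_factory=set)  # e.g. {"gpu", "custom-cni"}


def choose_cluster_model(req: ComputeRequest) -> tuple[str, list[str]]:
    """Default to a shared cluster; choose dedicated only if a consideration applies."""
    reasons = []
    if req.low_latency:
        reasons.append("low-latency SLA/SLO")
    if req.special_requirements:
        reasons.append("unique requirements: " + ", ".join(sorted(req.special_requirements)))
    if req.environment.lower() == "prod":
        reasons.append("production environment")
    return ("dedicated" if reasons else "shared", reasons)


if __name__ == "__main__":
    print(choose_cluster_model(ComputeRequest("checkout", "dev")))
    # → ('shared', [])
    print(choose_cluster_model(ComputeRequest("ml-train", "dev", special_requirements={"gpu"})))
    # → ('dedicated', ['unique requirements: gpu'])
```

Returning the reasons alongside the decision makes it easy to record why a team received a dedicated cluster, which helps when the platform team later revisits those exceptions.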

With shared clusters (the most cost-efficient option and therefore the default model in most customer environments), platform teams have to solve certain challenges around security and operational efficiency.