Skip to content

MKS on AWS

For users that have a need for declarative cluster specs, Rafay also provides the means to do this via turnkey scripts.

You can create and manage "cluster specs" to provision Rafay MKS Clusters on demand using the Rafay Controller. An illustrative workflow is shown below.

Illustrative Workflow


Cluster Specification

Use the cluster specs below to auto provision Rafay MKS clusters on AWS EC2 in minutes.

The Rafay Controller will automatically provision the required infrastructure on AWS EC2 for the Kubernetes cluster. Once the cluster is provisioned, you can operate and manage the cluster using the Rafay Console.

# Unique name of cluster in Rafay Controller 
cluster_name: test-dev

# Rafay MKS Cluster in AWS EC2 environments 
cluster_type: aws-ec2

# AWS Region 
cluster_provider_region: us-west-1

# Name of existing Cloud Credentials in Rafay Controller. 
cluster_provider_credentials: aws-dev

# ec2 Instance type for master and worker nodes 
cluster_instance_type: t3.2xlarge

# Multi Master/HA Cluster 
cluster_multi_master: false

# GPU support 
cluster_gpu_provision: false

# Name of cluster bluprint configured in Rafay Controller 
cluster_blueprint: default

# Name of project where the cluster will be managed in the Rafay Controller
project: defaultproject

Automation Scripts

Rafay has developed high level, easy to use "bash scripts" that you can use for both provisioning and deprovisioning. You can embed these scripts "as is" into an existing CI system such as Jenkins for build out automation. Alternatively, users can modify the provided code for the scripts to suit their requirements.


Requirements

  • Linux or macOS
  • For macOS, ensure that you have JQ (CLI based JSON Processor) and cURLinstalled.
  • On Linux systems, the script will automatically detect, download and install these components.
  • You have configured a cloud credential in the Rafay Controller so that it can securely provision required infrastructure.

Example Jenkins Pipeline

Have a look at an illustrative example Jenkins pipeline showcasing how you can automate provisioning of a Rafay MKS cluster on Amazon AWS.


Provision Cluster

Use this script to provision a Rafay MKS cluster on AWS. The script expects you to provide a declarative "cluster spec" as input along with credentials to access the Rafay Controller.

The script expects the following items as input

  • Rafay Controller Access Credentials
  • Cluster Specification file

Provision Rafay MKS Cluster

The illustrative screenshot below displays what you should see when you provision a "HA" Rafay MKS Cluster using this approach.

Successful Provisioning

Source code for the script shown below.

#!/bin/bash

SCRIPT='rafay_mks_cluster.sh'

function HELP {
  echo -e "\nUsage: $SCRIPT [-u <Username>] [-p <Password>] [-f <cluster meta file>] "
  echo -e "\t-u  Username to login to the Rafay Console"
  echo -e "\t-p  Password to login to the Rafay Console"
  echo -e "\t-f  cluster meta file"
  echo
  echo -e "\nExample:  $SCRIPT -u test@example.com -p test123 -f cluster_meta.yaml"
  echo
  exit 1
}

NUMARGS=$#
if [ $NUMARGS -eq 0 ]; then
  HELP
fi

while getopts :u:p:f:h FLAG; do
  case $FLAG in
    u)
      USERNAME=$OPTARG
      ;;
    p)
      PASSWORD=$OPTARG
      ;;
    f)
      CLUSTER_META_FILE=$OPTARG
      ;;
    h)  #show help
      HELP
      ;;
    \?) #unrecognized option - show help
      echo -e \\n"Option -$OPTARG not recognized."
      HELP
      ;;
  esac
done


if [[ "$OSTYPE" == "linux-gnu"* ]];
then
    #Install jq if not installed
    JQ_PKG_OK=$(dpkg-query -W --showformat='${Status}\n' jq|grep "install ok installed")
    echo Checking for jq package: $JQ_PKG_OK
    if [ "" == "$JQ_PKG_OK" ]; then
        echo "jq package not installed. Installing jq..."
        sudo apt-get -y install jq
    fi

    #Install curl if not installed
    CURL_PKG_OK=$(dpkg-query -W --showformat='${Status}\n' curl|grep "install ok installed")
    echo Checking for curl package: $CURL_PKG_OK
    if [ "" == "$CURL_PKG_OK" ]; then
        echo "curl package not installed. Installing curl..."
        sudo apt-get -y install curl
    fi
elif [[ "$OSTYPE" == "darwin"* ]];
then
    if !(which jq > /dev/null 2>&1);
    then
        echo "jq installation not found, please install jq !! Exiting" && exit -1
    fi
    if !(which curl > /dev/null 2>&1);
    then
        echo "curl installation not found, please install curl !! Exiting" && exit -1
    fi
fi


if which python > /dev/null 2>&1;
then
   python=python
elif which python3 > /dev/null 2>&1;
then
   python=python3
else
   echo "python installation not found, please install python or python3 !! Exiting" && exit -1
fi

USERDATA='{"username":"'"${USERNAME}"'","password":"'"${PASSWORD}"'"}'
OPS_HOST="console.rafay.dev"

CLUSTER_NAME=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_name' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_TYPE=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_type' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_PROVIDER_REGION=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_provider_region' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_PROVIDER_CREDENTIALS=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_provider_credentials' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_INSTANCE_TYPE=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_instance_type' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_MULTI_MASTER=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_multi_master' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_GPU_PROVISION=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_gpu_provision' | tr \" " " | awk '{print $1}' | tr -d "\n"`
CLUSTER_BLUEPRINT=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.cluster_blueprint' | tr \" " " | awk '{print $1}' | tr -d "\n"`
PROJECT=`cat ${CLUSTER_META_FILE} | $python -c 'import sys, yaml, json; y=yaml.safe_load(sys.stdin.read()); print(json.dumps(y))' | jq '.project' | tr \" " " | awk '{print $1}' | tr -d "\n"`


if [ $CLUSTER_TYPE != "aws-ec2" ];
then
    echo "Valid input for cluster_type is "aws-ec2" !! Exiting" && exit -1
fi

case $CLUSTER_MULTI_MASTER in
  (true|false) ;;
  (*) echo "Valid input for cluster_multi_master is "true" or "false" !! Exiting" && exit -1;;
esac

case $CLUSTER_GPU_PROVISION in
  (true|false) ;;
  (*) echo "Valid input for cluster_gpu_provision is "true" or "false" !! Exiting" && exit -1;;
esac


curl -k -vvvv -d ${USERDATA} -H "x-rafay-partner: rx28oml" -H "content-type: application/json;charset=UTF-8"  -X POST  https://${OPS_HOST}/auth/v1/login/ > /tmp/$$_curl 2>&1
csrf_token=`grep -inr "set-cookie: csrftoken" /tmp/$$_curl |  cut -d'=' -f2 | cut -d';' -f1`
rsid=`grep -inr "set-cookie: rsid" /tmp/$$_curl |  cut -d'=' -f2 | cut -d';' -f1`
rm /tmp/$$_curl

LookupProvider () {
  local cluster match="$1"
  shift
  for cluster; do [[ "$cluster" == "$match" ]] && return 0; done
  return 1
}

projects=`curl -k -s -H "x-rafay-partner: rx28oml" -H "content-type: application/json;charset=UTF-8" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/auth/v1/projects/ | jq '.results[]|.name,.id' |cut -d'"' -f2`


PROJECTS_ARRAY=( $projects )
LookupProvider $PROJECT "${PROJECTS_ARRAY[@]}"
[ $? -ne 0 ] && echo -e " !! Could not find the project with the name $PROJECT !! Exiting  " && exit -1

for i in "${!PROJECTS_ARRAY[@]}";
do
    if [ ${PROJECTS_ARRAY[$i]} == "null" ];
    then
        echo "No Projects found !! Exiting" && exit -1
    elif [ ${PROJECTS_ARRAY[$i]} == $PROJECT ];
    then
        PROJECT_ID=${PROJECTS_ARRAY[$(( $i + 1))]}
        break
    fi
done

providers=`curl -k -s -H "x-rafay-partner: rx28oml" -H "content-type: application/json;charset=UTF-8" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/providers/ | jq '.results[]|.name,.ID' |cut -d'"' -f2`


PROVIDERS_ARRAY=( $providers )
LookupProvider $CLUSTER_PROVIDER_CREDENTIALS "${PROVIDERS_ARRAY[@]}"
[ $? -ne 0 ] && echo -e " !! Could not find the provider with the name $CLUSTER_PROVIDER_CREDENTIALS !! Exiting  " && exit -1

for i in "${!PROVIDERS_ARRAY[@]}";
do
    if [ ${PROVIDERS_ARRAY[$i]} == "null" ];
    then
        echo "No Cloud Providers found !! Exiting" && exit -1
    elif [ ${PROVIDERS_ARRAY[$i]} == $CLUSTER_PROVIDER_CREDENTIALS ];
    then
        PROVIDER_ID=${PROVIDERS_ARRAY[$(( $i + 1))]}
        break
    fi
done

if [ ${CLUSTER_MULTI_MASTER} == "true" ] && [ $CLUSTER_GPU_PROVISION == "false" ];
then
    cluster_data='{"ha_enabled":true,"gpu_enabled":false,"gpu_vendor":"","cluster_blueprint":"'"${CLUSTER_BLUEPRINT}"'","capacity":[],"edge_provider_params":{"params":"{\"region\":\"aws/'"${CLUSTER_PROVIDER_REGION}"'\",\"instance_type\":\"'"${CLUSTER_INSTANCE_TYPE}"'\"}"},"name":"'"${CLUSTER_NAME}"'","auto_create":true,"cluster_type":"aws-ec2","cloud_provider":"aws","provider_id":"'"${PROVIDER_ID}"'","provider_type":1,"labels":{},"auto_approve_nodes":true,"metro":{"name":"aws/'"${CLUSTER_PROVIDER_REGION}"'"}}'
elif [ ${CLUSTER_MULTI_MASTER} == "false" ] && [ $CLUSTER_GPU_PROVISION == "false" ];
then
    cluster_data='{"ha_enabled":false,"gpu_enabled":false,"gpu_vendor":"","cluster_blueprint":"'"${CLUSTER_BLUEPRINT}"'","capacity":[],"edge_provider_params":{"params":"{\"region\":\"aws/'"${CLUSTER_PROVIDER_REGION}"'\",\"instance_type\":\"'"${CLUSTER_INSTANCE_TYPE}"'\"}"},"name":"'"${CLUSTER_NAME}"'","auto_create":true,"cluster_type":"aws-ec2","cloud_provider":"aws","provider_id":"'"${PROVIDER_ID}"'","provider_type":1,"labels":{},"auto_approve_nodes":true,"metro":{"name":"aws/'"${CLUSTER_PROVIDER_REGION}"'"}}'
elif [ ${CLUSTER_MULTI_MASTER} == "false" ] && [ $CLUSTER_GPU_PROVISION == "true" ];
then
    cluster_data='{"ha_enabled":false,"gpu_enabled":true,"gpu_vendor":"","cluster_blueprint":"'"${CLUSTER_BLUEPRINT}"'","capacity":[],"edge_provider_params":{"params":"{\"region\":\"aws/'"${CLUSTER_PROVIDER_REGION}"'\",\"instance_type\":\"'"${CLUSTER_INSTANCE_TYPE}"'\"}"},"name":"'"${CLUSTER_NAME}"'","auto_create":true,"cluster_type":"aws-ec2","cloud_provider":"aws","provider_id":"'"${PROVIDER_ID}"'","provider_type":1,"labels":{},"auto_approve_nodes":true,"metro":{"name":"aws/'"${CLUSTER_PROVIDER_REGION}"'"}}'
elif [ ${CLUSTER_MULTI_MASTER} == "true" ] && [ $CLUSTER_GPU_PROVISION == "true" ];
then
    cluster_data='{"ha_enabled":true,"gpu_enabled":true,"gpu_vendor":"","cluster_blueprint":"'"${CLUSTER_BLUEPRINT}"'","capacity":[],"edge_provider_params":{"params":"{\"region\":\"aws/'"${CLUSTER_PROVIDER_REGION}"'\",\"instance_type\":\"'"${CLUSTER_INSTANCE_TYPE}"'\"}"},"name":"'"${CLUSTER_NAME}"'","auto_create":true,"cluster_type":"aws-ec2","cloud_provider":"aws","provider_id":"'"${PROVIDER_ID}"'","provider_type":1,"labels":{},"auto_approve_nodes":true,"metro":{"name":"aws/'"${CLUSTER_PROVIDER_REGION}"'"}}'
fi

curl -k -vvvvv -d ${cluster_data} -H "content-type: application/json;charset=UTF-8" -H "referer: https://${OPS_HOST}/" -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/ -o /tmp/rafay_edge > /tmp/$$_curl 2>&1
grep 'HTTP/2 201'  /tmp/$$_curl > /dev/null 2>&1
[ $? -ne 0 ] && DBG=`cat /tmp/rafay_edge` && echo -e " !! Detected failure adding cluster ${cluster_data} ${DBG}!! Exiting  " && exit -1
rm /tmp/$$_curl
echo "[+] Successfully added cluster ${cluster_data}"
EDGE_ID=`cat /tmp/rafay_edge |jq '.id'|cut -d'"' -f2`
EDGE_STATUS=`cat /tmp/rafay_edge |jq '.status'|cut -d'"' -f2`
CLUSTER_HEALTH=`cat /tmp/rafay_edge |jq '.health'|cut -d'"' -f2`
rm /tmp/rafay_edge
EDGE_STATUS_ITERATIONS=1
CLUSTER_HEALTH_ITERATIONS=1
while [ "$EDGE_STATUS" != "READY" ]
do
  #echo "Iteration-${EDGE_STATUS_ITERATIONS} : Waiting 60 seconds for cluster to be provisioned..."
  sleep 60
  if [ $EDGE_STATUS_ITERATIONS -ge 50 ];
  then
    break
  fi
  EDGE_STATUS_ITERATIONS=$((EDGE_STATUS_ITERATIONS+1))
  EDGE_STATUS=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/ | jq '.status' |cut -d'"' -f2`
  if [ $EDGE_STATUS == "PROVISION_FAILED" ];
  then
    echo -e " !! Cluster provision failed !!  "
    PROVISION_SUMMARY=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/provision/log/ | jq '.[].provisionSummary' |cut -d'"' -f2`
    PROVISION_FAILURE_SUMMARY=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/provision/log/ | jq '.[].provisionFailureSummary'`
    echo -e "$PROVISION_SUMMARY"
    echo -e "$PROVISION_FAILURE_SUMMARY"
    echo -e " !! Exiting !!  " && exit -1
  fi

  if [ $EDGE_STATUS == "PRETEST_FAILED" ];
  then
    echo -e " !! Cluster provision failed !!  "
    PRETEST_SUMMARY=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/provision/log/ | jq '.[].pretestSummary' |cut -d'"' -f2`
    PRETEST_FAILURE_SUMMARY=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/provision/log/ | jq '.[].pretestFailureSummary'`
    echo -e "$PRETEST_SUMMARY"
    echo -e "$PRETEST_FAILURE_SUMMARY"
    echo -e " !! Exiting !!  " && exit -1
  fi

  PROVISION_STATUS=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/ | jq '.provision.status' |cut -d'"' -f2`

  if [ $PROVISION_STATUS == "INFRA_CREATION_FAILED" ];
  then
    echo -e " !! Cluster provision failed !!  "
    COMMENTS=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/ | jq '.provision.comments' |cut -d'"' -f2`
    echo -e "$COMMENTS"
    echo -e " !! Exiting !!  " && exit -1
  fi

  PROVISION_STATE=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/provision/progress | jq '.stateName' |cut -d'"' -f2`
  if [ "$PROVISION_STATE" == "null" ];
  then
    echo "Provisioning Node"
  else
    echo "Provisioning ${PROVISION_STATE}"
  fi
done
if [ $EDGE_STATUS != "READY" ];
then
    echo -e " !! Cluster provision failed !!  "
    echo -e " !! Exiting !!  " && exit -1
fi
if [ $EDGE_STATUS == "READY" ];
then
    echo "[+] Cluster Provisioned Successfully waiting for it to be healthy"
    CLUSTER_HEALTH=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/ | jq '.health' |cut -d'"' -f2`
    while [ "$CLUSTER_HEALTH" != 1 ]
    do
      echo "Iteration-${CLUSTER_HEALTH_ITERATIONS} : Waiting 60 seconds for cluster to be healthy..."
      sleep 60
      if [ $CLUSTER_HEALTH_ITERATIONS -ge 15 ];
      then
        break
      fi
      CLUSTER_HEALTH_ITERATIONS=$((CLUSTER_HEALTH_ITERATIONS+1))
      CLUSTER_HEALTH=`curl -k -s -H "x-rafay-partner: rx28oml" -H "x-csrftoken: ${csrf_token}" -H "cookie: partnerID=rx28oml; csrftoken=${csrf_token}; rsid=${rsid}" https://${OPS_HOST}/edge/v1/projects/${PROJECT_ID}/edges/${EDGE_ID}/ | jq '.health' |cut -d'"' -f2`
    done
fi

if [[ $CLUSTER_HEALTH == 0 ]];
then
    echo -e " !! Cluster is not healthy !!  "
    echo -e " !! Exiting !!  " && exit -1
fi
if [[ $CLUSTER_HEALTH == 1 ]];
then
    echo "[+] Cluster Provisioned Successfully and is Healthy"
fi