
Provisioning

The following sequence diagram describes the high-level steps carried out during the provisioning process. Customers can optionally automate the entire sequence using the RCTL CLI, REST APIs, or automation tools.

Provisioning Workflow


Demo Video

Watch a video of provisioning of a "Multi Master" upstream Kubernetes cluster on "CentOS" with only storage for persistent volumes.


STEP 1: Select Cluster Configuration

Review the supported cluster configurations and select your desired cluster configuration. This will determine the number of nodes you need to prepare to initiate cluster provisioning.

Type                      | Number of Initial Nodes
Converged, Single Master  | 1 Node (1 Master/Worker)
Dedicated, Single Master  | 2 Nodes (1 Master + 1 Worker)
Converged, Multi Master   | 3 Nodes (3 Masters/Workers)
Dedicated, Multi Master   | 4 Nodes (3 Masters + 1 Worker)
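
If you script your node preparation, the table above can be captured as a small lookup. This is a minimal sketch; the function name and its argument keywords are hypothetical and are not part of RCTL or the Console.

```shell
# initial_nodes <converged|dedicated> <single|multi>
# Prints the number of initial nodes listed in the table above.
initial_nodes() {
  case "$1-$2" in
    converged-single) echo 1 ;;
    dedicated-single) echo 2 ;;
    converged-multi)  echo 3 ;;
    dedicated-multi)  echo 4 ;;
    *) echo "unknown configuration: $1 $2" >&2; return 1 ;;
  esac
}

initial_nodes dedicated multi   # prints 4
```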

STEP 2: Prepare Nodes

Create VMs or bare metal instances compatible with the infrastructure requirements. Ensure that you have SSH access to all the instances/VMs.

Important

Ensure you have the exact number of nodes for initial provisioning as per the cluster configuration from the previous step. Additional worker nodes can be added once the cluster is successfully provisioned.
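
Both requirements in this step (exact node count and SSH access) can be checked from a workstation before you continue. The sketch below assumes a hypothetical inventory file with one node address per line; the helper names are illustrative only.

```shell
# count_nodes FILE: count non-empty, non-comment lines (one node address per line).
count_nodes() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | wc -l | tr -d ' '
}

# check_node_count FILE EXPECTED: succeed only if FILE lists exactly EXPECTED nodes.
check_node_count() {
  actual=$(count_nodes "$1")
  if [ "$actual" -ne "$2" ]; then
    echo "expected $2 nodes, found $actual" >&2
    return 1
  fi
}

# check_ssh USER FILE: try a non-interactive SSH login to every node in FILE.
# -n keeps ssh from consuming the loop's input; BatchMode disables password prompts.
check_ssh() {
  while read -r host; do
    case "$host" in ''|'#'*) continue ;; esac
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1@$host" true \
      || echo "no SSH access: $host" >&2
  done < "$2"
}
```

For a Dedicated, Multi Master cluster you might run `check_node_count nodes.txt 4` followed by `check_ssh ubuntu nodes.txt`, where `nodes.txt` and the `ubuntu` user are placeholders.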


STEP 3: Create a Cluster

  • Log in to the Console
  • Navigate to the project where you would like the cluster provisioned.
  • Click New Cluster
  • Select Create a New Cluster and click Continue
  • Select the Environment Data center / Edge
  • Select Package Type "Linux Installer"
  • Select Kubernetes Distribution "Upstream Kubernetes"
  • Enter the Cluster Name and click Continue

General

All the defaults are automatically selected and presented. Several options are available for customization.

Location

  • Specify a location for the cluster (for multi-cluster workflows)

Blueprint

  • Select the cluster blueprint and version. Users can select the default-upstream blueprint if it suits their requirements. The default-upstream blueprint requires a storage node

Note: During Day 1 operations, you can select any blueprint other than default-upstream if a storage node is not available.

Kubernetes Version

  • Select the Kubernetes version to deploy. For ARM worker nodes, k8s version 1.20 or later is supported, and the supported operating system is Ubuntu

Operating System

  • Select the OS and Version you used for the nodes

General Settings

Advanced

Integrated Storage

  1. Users can set the storage provider details during cluster creation only when selecting a blueprint other than default-upstream, because no storage providers are integrated into those blueprints.

    • If distributed storage is required, select GlusterFS
    • If you have selected multiple storage types, select the default storage class

Storage GlusterFS

Important

For Windows worker nodes, deselect the GlusterFS storage provider.

  2. Managed Storage is available as an add-on with the default-upstream blueprint, so users cannot edit or change the default storage class, rook-ceph, at cluster creation/provisioning time.

    • To provision an HA cluster using rook-ceph, it is recommended to have a minimum of three (3) storage nodes
    • To provision a non-HA cluster using rook-ceph, it is recommended to have a minimum of one (1) storage node

Storage Rook/

Security

By default, as a security precaution, nodes need to be approved before joining a cluster. Auto Approval of nodes is available, and this can help streamline the cluster provisioning and expansion workflows.

  • Enable Approve Nodes Automatically if you do not require an approval gate for nodes to join the cluster

Kubernetes Masters

  • To provision an HA cluster using rook-ceph storage, enable High Availability (Multi Master)
  • Select Dedicated Master if the k8s masters should be tainted so that workload pods are not scheduled on them

Advanced Settings

HTTP Proxy

  • Select Enable Proxy if the infrastructure being used to provision the cluster is behind a forward proxy.

  • Configure the http proxy with the proxy information (ex: http://proxy.example.com:8080)

  • Configure the https proxy with the proxy information (ex: http://proxy.example.com:8080)
  • Configure No Proxy with a comma-separated list of hosts that need connectivity without the proxy
  • Configure the Root CA certificate of the proxy if the proxy is terminating non-mTLS traffic
  • Enable "TLS Termination Proxy" if the proxy is terminating non-mTLS traffic and cannot provide the Root CA certificate of the proxy

Important

Proxy configuration cannot be changed once the cluster is created.
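
Because the proxy settings are fixed at creation time, it can help to assemble the No Proxy value ahead of time. The helper below is a minimal sketch that joins host entries into the comma-separated form described above; the function name and the sample hosts are hypothetical.

```shell
# join_no_proxy HOST...: join the arguments with commas.
# The body runs in a subshell, so the IFS change does not leak to the caller.
join_no_proxy() (
  IFS=,
  printf '%s\n' "$*"
)

join_no_proxy localhost 127.0.0.1 .internal.example.com
# prints localhost,127.0.0.1,.internal.example.com
```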

Forward Proxy

Cluster Networking

The default subnet used for pod networking is "10.244.0.0/16". The default subnet used for k8s services is "10.96.0.0/12".

If you want to customize the subnets used for Pod Networking and K8s Services:

  • Select the CNI provider from the drop-down, either Calico or Flannel. When Flannel is selected, Windows Support is enabled by default. To provision a Windows worker node for Upstream K8s, enable this Windows Support option

Note: To provision a Windows worker node, a Linux master node (Control Plane) must already exist.

Refer to Add Worker Nodes for more information on adding a Windows worker node to the upstream cluster.

  • Configure the "Pod Subnet" with the subnet that you want to use
  • Configure the "Service Subnet" with the subnet that you want to use

Important

Cluster Networking cannot be changed once the cluster is created.
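
Since these subnets cannot be changed later, it is worth confirming that a custom Pod Subnet and Service Subnet do not overlap, which Kubernetes requires. The following is a minimal sketch for IPv4 CIDRs in plain shell arithmetic; the function names are hypothetical and no input validation is performed.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# cidr_range A.B.C.D/N: print the first and last address of the block as integers.
cidr_range() (
  ip=${1%/*}
  prefix=${1#*/}
  base=$(ip_to_int "$ip")
  mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
  start=$(( base & mask ))
  end=$(( start | (4294967295 - mask) ))
  echo "$start $end"
)

# cidrs_overlap CIDR1 CIDR2: succeed (exit 0) if the two blocks share any address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# The defaults do not overlap.
if cidrs_overlap 10.244.0.0/16 10.96.0.0/12; then
  echo "subnets overlap"
else
  echo "subnets are disjoint"
fi
```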

Custom CIDR


STEP 4: Download Conjurer and Secrets

  • Review the Node Installation Instructions section on the Console
  • Download the cluster bootstrap binary (i.e. Conjurer)
  • Select Download Linux amd64 (x86-64) Installer for Linux systems with a 64-bit x86 (AMD64) processor
  • Select Download Linux arm64 Installer for Linux systems with a 64-bit ARM processor
  • Select Download Windows amd64 (x86-64) Installer for Windows systems with a 64-bit x86 (AMD64) processor
  • Download the cluster activation secrets, i.e., the Passphrase and Credential files
  • SCP the three (3) files to the nodes you prepared in STEP 2

Important

The activation secrets (passphrase and credentials) are unique per cluster. You cannot reuse them for other clusters.

Download Conjurer and Secrets

An illustrative example is provided below. This assumes that you have the three downloaded files in the current working directory. The three files will be securely uploaded to the “/tmp” folder on the instance.

$ scp -i <keypairfile.pem> * ubuntu@<Node's External IP Address>:/tmp
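
When there are several nodes, the same copy has to be repeated for each one. The sketch below only prints the scp command for each node so that you can review the commands before running them. The function name is hypothetical, and the three file names are assumptions based on the conjurer example later in this guide.

```shell
# print_scp_commands KEYFILE USER HOST...: print one scp command per node (dry run).
print_scp_commands() {
  key=$1; user=$2; shift 2
  for host in "$@"; do
    echo "scp -i $key conjurer onpremcluster-passphrase.txt onpremcluster.pem $user@$host:/tmp"
  done
}

print_scp_commands keypairfile.pem ubuntu 192.168.0.20 192.168.0.21
```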

STEP 5: Preflight Checks

It is strongly recommended to perform automated preflight tests on every node to ensure that it has "compatible" hardware, software, and configuration. View the detailed list of preflight checks.

  • SSH into the node and run the Linux amd64 or arm64 installer, as appropriate, using the provided passphrase and credentials
  • For a Windows node, SSH into the node and run the Windows amd64 installer using the provided passphrase and credentials
  • From the node installation instructions, copy the preflight check command and run it
  • If there are no errors, proceed to the next step
  • If there are warnings or errors, fix the issues and rerun the preflight check before proceeding to the next step

Preflight Checks


STEP 6: Run Conjurer

  • From the node installation instructions, copy the provided command to run the conjurer binary
  • SSH into the nodes and run the installer using the provided passphrase and credentials.

An illustrative example is provided below

sudo ./conjurer -edge-name="onpremcluster" -passphrase-file="onpremcluster-passphrase.txt" -creds-file="onpremcluster.pem" -t

[+]  Initiating edge node install

[+] Provisioning node
      [+] Step 1. Installing node-agent
      [+] Step 2. Setting hostname to node-72djl2g-192-168-0-20-onpremcluster
      [+] Step 3. Installing credentials on node
      [+] Step 4. Configuring node-agent
      [+] Step 5. Starting node-agent

[+] Successfully provisioned node

Important

ARM nodes are supported only as worker nodes.

Conjurer is a “cluster bootstrap agent” that connects and registers the nodes with the Controller. Information about the Controller and authentication credentials for registration is available in the activation secrets files.

Note: After the conjurer binary runs successfully on a Windows node, reboot the Windows node.

Step 6.1: Salt Minion coexists with customer's salt minion

The conjurer binary provides a multi-minion option that installs the Rafay salt-minion so that it can run alongside other salt-minion(s) already installed on the node by the customer.

  • Install multi-minion: Add -m (compatible with pre-existing salt minions) to the run conjurer command, as shown in the example below:
sudo ./conjurer -m -edge-name="onpremcluster" -passphrase-file="onpremcluster-passphrase.txt" -creds-file="onpremcluster.pem" -t
  • Uninstall multi-minion: Use -m -d, as shown in the example below, to remove the salt-minion software from the customer node without deleting the pre-existing salt-minion’s default configuration or logs:
sudo ./conjurer -m -d -edge-name="onpremcluster" -passphrase-file="onpremcluster-passphrase.txt" -creds-file="onpremcluster.pem" -t

Once the run conjurer step is complete, the node will show up on the Web Console with the status as DISCOVERED.
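
The conjurer invocations in this step differ only in the cluster name and the multi-minion flags. The sketch below prints the matching command so it can be reviewed or pasted on a node. It uses only the flags shown in this guide; the function name, the mode keywords, and the `<cluster>-passphrase.txt` / `<cluster>.pem` naming pattern are assumptions inferred from the examples. Always prefer the exact command shown in the node installation instructions.

```shell
# conjurer_cmd CLUSTER [multi|multi-uninstall]: print the conjurer command line.
conjurer_cmd() {
  case "${2:-}" in
    '')              extra='' ;;
    multi)           extra='-m ' ;;
    multi-uninstall) extra='-m -d ' ;;
    *) echo "unknown mode: $2" >&2; return 1 ;;
  esac
  echo "sudo ./conjurer ${extra}-edge-name=\"$1\" -passphrase-file=\"$1-passphrase.txt\" -creds-file=\"$1.pem\" -t"
}

conjurer_cmd onpremcluster multi
```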

Run Conjurer


STEP 7: Approve Node

This is an optional approval step that acts as a security control to ensure that administrators can inspect and approve a node before it can become part of the cluster.

  • Click the Approve button to approve the node to this cluster
  • In a few seconds, the status of the node changes to APPROVED in the Web Console
  • Once the node is approved, it is automatically probed and all information about the node is presented to the administrator on the Web Console

Approve Node


STEP 8: Configure Node

This is a mandatory configuration step that allows the infrastructure administrator to specify the “role” for the node.

Important

Without the configuration step, cluster provisioning cannot be initiated.

  • Select the cluster and click the Nodes tab
  • Click Configure
  • If the node is meant to be a k8s master, select the "master" role
  • If the node is meant to handle storage, select the storage location from the automatically detected list
  • Select the network interface that will be used for cluster networking from the automatically detected list of interfaces
  • Click Save

STEP 9: Provisioning

All the necessary configurations are provided, and the Controller can start provisioning Kubernetes with all required software add-ons. These will be automatically provisioned and configured to operationalize the cluster.

  • Click Provision. A message appears to confirm the provisioning process with a note "It can take 5-10 mins for the provision to complete"
  • Click Yes to proceed, and the status changes to Provisioning

Successful Provisioning

Important

The end-to-end provisioning process can take time and is dependent on the number of nodes you are provisioning and the Internet bandwidth available to your nodes to download the necessary software.

Below is an example of an upstream cluster with a Windows worker node.

Successful Provisioning

Provisioning Successful

Once all the steps are complete and provisioning has succeeded, you should see details about the cluster on the Web Console.

Successful Provisioning

After the upstream cluster is provisioned successfully, users can view the detailed operations and workflow of the cluster by clicking the Operation Status Ready icon. The screen shows all the stages that occurred during cluster deployment.

Successful Provisioning

Users can view the status and health of the nodes on this page.

Successful Provisioning


Reset Cluster

Post-provisioning, users are allowed to RESET the upstream cluster to reuse the same cluster object in the console while a master node is reinstalled/re-provisioned.

  • Click Reset Cluster

Successful Provisioning

The message below appears to confirm the deletion.

  • Click OK to proceed with the node deletion

Successful Provisioning

On successful cluster reset, the nodes are deleted and the screen below shows no nodes.

Successful Provisioning


Troubleshooting

Once conjurer has successfully installed the "minion/node agent" on the node and the agent is registered with the controller, it establishes a "long running" web socket with the controller and provides "continuous updates" about progress and status. This information is then presented to authorized administrators via the Console for insights. Optionally, administrators can also view the logs generated by the minion/node agent for detailed visibility.

tail -f /var/log/salt/minion
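
When the log is long, filtering for failures is often quicker than following it live. This is a minimal sketch that assumes the log lines carry the level names ERROR or CRITICAL, as Salt logs normally do; the function name is hypothetical.

```shell
# minion_errors [LOGFILE]: show ERROR and CRITICAL lines from the minion log.
# Defaults to the log path used in the tail command above.
minion_errors() {
  grep -E 'ERROR|CRITICAL' "${1:-/var/log/salt/minion}"
}
```

Run `minion_errors` on the node (with sudo if the log is not readable by your user), or pass the path of a copied log file as the first argument.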

View Cluster Configuration

Administrators can view the provisioned cluster's configuration by clicking on the cluster and selecting the "Configuration" tab.

Provisioned Cluster Config


Common Issues

Although "conjurer" provides a built-in battery of "preflight tests" that can be used to verify the environment and configuration, there are some scenarios where provisioning can fail.

Host Firewall

If your instances (for the nodes) have a host firewall such as firewalld or iptables rules, it may silently drop all packets destined for the Controller. This will result in provisioning failure. Ensure that the host firewall is configured to allow outbound communications on tcp/443 to the controller.
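
A quick way to test for this from a node is to attempt a TCP connection to the controller on port 443. The sketch below assumes the `nc` (netcat) utility is installed on the node; the function name is hypothetical.

```shell
# can_reach HOST [PORT]: succeed if a TCP connection to HOST:PORT
# (default port 443) opens within 5 seconds.
can_reach() {
  nc -z -w 5 "$1" "${2:-443}" >/dev/null 2>&1
}
```

For example, `can_reach controller.example.com && echo reachable || echo blocked`, where `controller.example.com` is a placeholder for your Controller's hostname. A failure indicates that something on the path is blocking the connection, but not necessarily the host firewall.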

MTU

The Maximum Transmission Unit (MTU) is the largest block of data that can be handled at Layer-3 (IP). MTU usually refers to the maximum size a packet can be. Certain MTU/MSS settings can result in fragmentation related issues with mTLS connections between the agents and the controller.
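
One common way to look for MTU problems is to send a ping that is not allowed to fragment and is sized to fill a given MTU exactly. The sketch below assumes Linux iputils `ping` (for the `-M do` option) and a target that answers ICMP echo; the function names are hypothetical.

```shell
# icmp_payload_for_mtu MTU: largest ICMP echo payload that fits in one IPv4
# packet of size MTU (MTU minus 20 bytes IPv4 header and 8 bytes ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU: send one ping with fragmentation prohibited, sized for MTU.
probe_mtu() {
  ping -M do -c 1 -s "$(icmp_payload_for_mtu "$2")" "$1"
}

icmp_payload_for_mtu 1500   # prints 1472
```

If `probe_mtu <host> 1500` fails while a smaller value such as 1400 succeeds, the path MTU is likely below 1500.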

Unstable Network

Provisioning clusters in remote, low-bandwidth locations with unstable or unreliable network connectivity can be very challenging. Review how the retry and backoff mechanisms work by default and how they can be customized to suit your requirements.
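
For your own scripts around provisioning (for example, copying files to nodes over a flaky link), a generic retry with exponential backoff can help. The sketch below is not the product's built-in retry mechanism; it is an illustrative shell helper with a hypothetical name.

```shell
# retry MAX_ATTEMPTS BASE_DELAY COMMAND...: rerun COMMAND until it succeeds,
# doubling the delay (in seconds) after each failure. Fails once MAX_ATTEMPTS
# attempts have been made.
retry() {
  max=$1; delay=$2; shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry 5 2 scp -i keypairfile.pem conjurer ubuntu@192.168.0.20:/tmp` tries up to five times, waiting 2, 4, 8, and 16 seconds between attempts; the key file and address are placeholders.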