For purposes of automation, it is strongly recommended that users create "version controlled" declarative cluster specification files to provision and manage the lifecycle of Kubernetes clusters.
Important
Only a single command (rctl apply -f cluster_spec.yaml) is needed for both provisioning and ongoing lifecycle operations. The controller automatically determines the required changes and maps them to the associated actions (e.g., add nodes, remove nodes, upgrade Kubernetes, update blueprint).
You can create an Upstream k8s cluster based on a version controlled cluster spec that you can manage in a Git repository. This enables users to develop automation for reproducible infrastructure.
./rctl apply -f <cluster_filename.yaml>
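For automation, such as a CI job that runs against a spec kept in Git, the command can be built programmatically. The following is a minimal Python sketch, not part of RCTL: the helper name `build_apply_cmd` and the path `clusters/mks-prod.yaml` are hypothetical. The flags are the ones documented on this page (--v3, --dry-run, --force); combining several of them in a single invocation is an assumption.

```python
import shlex
from typing import List


def build_apply_cmd(spec_path: str, rctl: str = "./rctl", v3: bool = False,
                    dry_run: bool = False, force: bool = False) -> List[str]:
    """Return the argument list for `rctl apply`, using only flags from this page."""
    cmd = [rctl, "apply", "-f", spec_path]
    if v3:
        cmd.append("--v3")       # extended MKS config spec (system sync)
    if dry_run:
        cmd.append("--dry-run")  # preview changes without applying them
    if force:
        cmd.append("--force")    # force node deletion, bypassing validation
    return cmd


if __name__ == "__main__":
    # Hypothetical spec path inside a Git checkout.
    print(shlex.join(build_apply_cmd("clusters/mks-prod.yaml")))
```

The list form can be passed directly to `subprocess.run(..., check=True)`, which avoids shell-quoting issues with file names.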
An illustrative example of the split cluster spec YAML file for MKS is shown below
systemComponentsPlacement
ssh: SSH credentials used by the GitOps Agent to perform actions on the cluster nodes. The cluster-level SSH section provides common settings for all nodes, and node-level SSH settings can override them.
All fields can be overridden at the node level, which is useful when each node uses a different SSH key. If no IP address is provided for SSH, the node's private IP is used to connect to the node and run the installer.
SSH configurations are primarily intended for use with RCTL.
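The override rules above can be sketched as a merge in which node-level values win and the private IP is the fallback address. This is an illustrative Python sketch only: the field names `ipAddress`, `port`, `privateKeyPath`, `username`, `privateIP`, and `ssh` are assumptions about the spec layout, so check them against your own cluster spec.

```python
from typing import Dict, Optional

# Assumed field names for the ssh block; check them against your cluster spec.
SSH_FIELDS = ("ipAddress", "port", "privateKeyPath", "username")


def resolve_ssh(cluster_ssh: Dict[str, str], node: Dict) -> Dict[str, Optional[str]]:
    """Merge cluster-level SSH settings with a node's own ssh block.

    Node-level values take precedence. If no SSH IP address is set at
    either level, fall back to the node's private IP (assumed key: privateIP).
    """
    node_ssh = node.get("ssh") or {}
    resolved = {field: node_ssh.get(field, cluster_ssh.get(field)) for field in SSH_FIELDS}
    if not resolved["ipAddress"]:
        resolved["ipAddress"] = node.get("privateIP")
    return resolved


if __name__ == "__main__":
    cluster = {"username": "ubuntu", "port": "22", "privateKeyPath": "/keys/default.pem"}
    node = {"hostname": "node-1", "privateIP": "10.0.0.5",
            "ssh": {"privateKeyPath": "/keys/node-1.pem"}}
    print(resolve_ssh(cluster, node))
```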
Extended MKS Configuration Spec for System Sync Operation
Below is an extended MKS config specification that includes additional parameters to facilitate system synchronization for MKS clusters.
To create a cluster using this extended config spec, run the following command:
./rctl apply -f <cluster_filename.yaml> --v3
Conjurer Changes for rctl apply
The following change applies only to users who provision clusters through the rctl apply path.
Previous Behavior:
Conjurer binaries were stored in the user's home directory.
This could lead to file corruption when multiple nodes attempted to write to the same file during rctl apply in environments using NFS volumes.
New Behavior:
Conjurer binaries are now stored in /usr/bin.
This ensures that concurrent writes by multiple nodes no longer cause corruption.
Passphrase and Certificate Storage:
The passphrase (txt) and certificate (PEM file) are now stored in /tmp.
Changes Required:
If your automation scripts for rctl apply reference the old home directory path, update them to use /usr/bin when executing the Conjurer to provision a cluster or remove cluster nodes. This change is necessary to ensure correct execution of the Conjurer binary.
Note: The passphrase and PEM file are stored in the /tmp folder, which will be automatically removed on reboot.
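For scripts kept as plain text, the path update can be automated. The following Python sketch is illustrative only: it assumes the binary is named `conjurer` and that scripts referenced it as `~/conjurer`, `$HOME/conjurer`, or `${HOME}/conjurer`. These patterns are hypothetical, so adjust them to what your scripts actually contain. It rewrites only the binary path; references to the passphrase and PEM files (now in /tmp) are left unchanged.

```python
import re

# Hypothetical spellings of the old home-directory Conjurer path.
# The negative lookahead avoids rewriting e.g. "~/conjurer.tar.bz2".
OLD_CONJURER_PATH = re.compile(r"(?:~|\$HOME|\$\{HOME\})/conjurer(?![\w.\-])")
NEW_CONJURER_PATH = "/usr/bin/conjurer"


def migrate_script(text: str) -> str:
    """Point references to the old home-directory Conjurer binary at /usr/bin."""
    return OLD_CONJURER_PATH.sub(NEW_CONJURER_PATH, text)


if __name__ == "__main__":
    # $CONJURER_ARGS is a hypothetical placeholder for your existing arguments.
    print(migrate_script("sudo ~/conjurer $CONJURER_ARGS"))
```

Review the rewritten script before committing it, since the patterns may not cover every way a path was written.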
Once the rctl create command is executed successfully, the following actions are performed:
Create cluster on the controller
Download conjurer & credentials
SCP conjurer & credentials to node
Run conjurer
Configure role, interface
Start provision
Note
At this time, only SSH key-based authentication is supported for copying files to the nodes with scp.
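Because only key-based authentication is supported, any script that copies files to nodes itself should pass an identity file and disable password fallback. The following Python sketch builds such an scp command using standard OpenSSH options (-i, -P, -o). The file names, user, and destination directory are hypothetical; this is not how the controller performs the copy internally.

```python
from typing import List


def build_scp_cmd(files: List[str], user: str, host: str, key_path: str,
                  port: int = 22, dest: str = "/tmp") -> List[str]:
    """Build an scp command that authenticates with an SSH private key only.

    `dest` is a hypothetical target directory; choose one that suits your nodes.
    """
    return [
        "scp",
        "-i", key_path,                      # SSH key-based authentication
        "-P", str(port),                     # scp uses uppercase -P for the port
        "-o", "PasswordAuthentication=no",   # never fall back to passwords
        *files,
        f"{user}@{host}:{dest}",
    ]


if __name__ == "__main__":
    print(build_scp_cmd(["credentials.pem"], "ubuntu", "10.0.0.5", "/keys/node.pem"))
```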
The MKS Bulk Node Deletion feature allows multiple nodes to be deleted simultaneously from an MKS environment using RCTL. It is recommended to delete no more than 100 nodes at a time.
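When more than 100 nodes must go, one option is to split them into batches and remove each batch from the spec in a separate apply. The following Python sketch only does the batching; applying batches one after another is an assumption about how you might honor the 100-node recommendation, not a documented RCTL mode.

```python
from typing import Iterator, List

MAX_NODES_PER_APPLY = 100  # recommended upper bound for one bulk deletion


def deletion_batches(hostnames: List[str],
                     size: int = MAX_NODES_PER_APPLY) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` hostnames to delete."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(hostnames), size):
        yield hostnames[start:start + size]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(250)]  # hypothetical hostnames
    print([len(batch) for batch in deletion_batches(nodes)])
```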
Below is an example of a cluster with 30 nodes:
To remove these nodes from the MKS cluster using RCTL, remove them from the cluster specification file and run the command ./rctl apply -f <cluster_filename.yaml>. The output of the deletion process is shown below:
{
"taskset_id": "dk64021",
"operations": [
{
"operation": "BulkNodeDelete",
"resource_name": "mks-scale-aug",
"status": "PROVISION_TASK_STATUS_INPROGRESS"
}
],
"comments": "Configuration is being applied to the cluster",
"status": "PROVISION_TASKSET_STATUS_INPROGRESS"
}
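A script usually needs the taskset_id from this output to track progress. The following Python sketch parses the JSON shape shown above; the helper name is hypothetical, and it tolerates missing keys rather than assuming every field is always present.

```python
import json
from typing import Dict, List, Tuple


def parse_apply_output(raw: str) -> Dict:
    """Extract the taskset ID, overall status, and per-operation details."""
    data = json.loads(raw)
    ops: List[Dict] = data.get("operations") or []
    details: List[Tuple] = [
        (op.get("operation"), op.get("resource_name"), op.get("status")) for op in ops
    ]
    return {
        "taskset_id": data.get("taskset_id"),
        "status": data.get("status"),
        "operations": details,
    }
```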
The status of all nodes being removed now changes to Deleting.
To check the status of the node deletion operation, run the following command with the taskset ID returned by apply: ./rctl status apply <taskset_id>
Once node deletion is complete, the status is shown as Deleted.
Note: It takes approximately 12 minutes to complete the deletion of 100 nodes.
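For unattended runs, the status command can be polled until the taskset finishes. The following Python sketch assumes that `./rctl status apply <taskset_id>` prints JSON with the same top-level `status` field as the apply output above. Terminal status values are not listed on this page, so the loop simply waits until the status is no longer PROVISION_TASKSET_STATUS_INPROGRESS. The 30-minute default timeout is an arbitrary choice with headroom over the roughly 12 minutes quoted for 100 nodes.

```python
import json
import subprocess
import time
from typing import Callable, Dict, List

IN_PROGRESS = "PROVISION_TASKSET_STATUS_INPROGRESS"


def run_rctl(args: List[str]) -> str:
    """Run rctl and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def wait_for_taskset(taskset_id: str,
                     runner: Callable[[List[str]], str] = run_rctl,
                     interval_s: float = 30.0,
                     timeout_s: float = 30 * 60) -> Dict:
    """Poll `rctl status apply <taskset_id>` until the taskset leaves INPROGRESS.

    Assumes the status command prints JSON with a top-level "status" field.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = json.loads(runner(["./rctl", "status", "apply", taskset_id]))
        if status.get("status") != IN_PROGRESS:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"taskset {taskset_id} still in progress")
        time.sleep(interval_s)
```

The `runner` parameter lets the loop be exercised without a live cluster by passing a function that returns canned output.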
To force delete nodes, remove them from the cluster configuration and apply the updated configuration with the --force flag:
./rctl apply -f <cluster_filename.yaml> --force
This deletes the nodes immediately, bypassing any validation or blocking errors.
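Removing nodes from the configuration can also be scripted. The following Python sketch works on a spec already loaded into a dict (for example with a YAML library such as PyYAML, not shown). It assumes the node list lives at `spec.config.nodes` and that each node has a `hostname` key; both are assumptions about the spec layout. It refuses to continue if a requested hostname is not in the spec, so a typo does not silently turn into a no-op apply.

```python
import copy
from typing import Dict, Iterable


def remove_nodes(spec: Dict, hostnames: Iterable[str]) -> Dict:
    """Return a copy of the cluster spec without the named nodes.

    Assumes the node list is at spec["spec"]["config"]["nodes"] and that
    each node entry carries a "hostname" key.
    """
    doomed = set(hostnames)
    new_spec = copy.deepcopy(spec)
    nodes = new_spec["spec"]["config"]["nodes"]
    missing = doomed - {node.get("hostname") for node in nodes}
    if missing:
        raise KeyError(f"nodes not in spec: {sorted(missing)}")
    new_spec["spec"]["config"]["nodes"] = [
        node for node in nodes if node.get("hostname") not in doomed
    ]
    return new_spec
```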
To upgrade the nodes, add the upgrade strategy parameters (concurrent or sequential) to the specification. Here is an illustrative configuration file with these parameters included.
For the Concurrent strategy, set a value for workerConcurrency; for the Sequential strategy, workerConcurrency is not required. Refer to the K8s Upgrade page for more information.
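The rule above can be checked before running apply. The following Python sketch validates an upgrade block loaded into a dict. The key names `strategy` and `params.workerConcurrency` and the lowercase values `concurrent` and `sequential` are assumptions about the spec layout; the K8s Upgrade page is authoritative.

```python
from typing import Dict


def validate_upgrade_strategy(upgrade: Dict) -> None:
    """Check the strategy/workerConcurrency combination described above.

    Assumed layout: {"strategy": "concurrent", "params": {"workerConcurrency": ...}}.
    """
    strategy = upgrade.get("strategy")
    concurrency = (upgrade.get("params") or {}).get("workerConcurrency")
    if strategy == "concurrent":
        if not concurrency:
            raise ValueError("concurrent strategy requires params.workerConcurrency")
    elif strategy == "sequential":
        pass  # workerConcurrency is not required for sequential upgrades
    else:
        raise ValueError(f"unknown upgrade strategy: {strategy!r}")
```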
The dry run option is supported for operations such as cluster provisioning, K8s upgrades, blueprint upgrades, and node operations (e.g., node addition, node deletion, labels, taints). It previews the changes before they are executed, so you can assess potential modifications and catch issues before implementation. Whether you are provisioning a new cluster or managing updates, running a dry run first makes infrastructure changes more predictable and reliable.
./rctl apply -f <cluster_filename.yaml> --dry-run
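In automation, the dry run can act as a gate in front of the real apply. The following Python sketch assumes that rctl exits with a non-zero code when the dry run reports an error, such as the unsupported-version example below; verify this with your RCTL version before relying on it. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, List


def run(args: List[str]) -> int:
    """Run a command, letting its output stream to the console; return the exit code."""
    return subprocess.run(args).returncode


def apply_with_preview(spec_path: str,
                       runner: Callable[[List[str]], int] = run) -> bool:
    """Run `rctl apply --dry-run` first; apply for real only if the preview succeeds.

    Assumes a failed dry run is signalled by a non-zero exit code.
    """
    base = ["./rctl", "apply", "-f", spec_path]
    if runner(base + ["--dry-run"]) != 0:
        return False  # preview failed: leave the cluster untouched
    return runner(base) == 0
```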
Example
Node Addition - Day 2 operation
Below is an example of the output from the dry run command when a user tries to add a node on Day 2:
./rctl apply -f cluster_file.yaml --dry-run
Running echo $HOSTNAME on 34.211.224.152
Running PATH=$PATH:/usr/sbin ip -f inet addr show on 34.211.224.152
Running command -v bzip2 on 34.211.224.152 to verify if bzip2 is present on node
Running command -v wget on 34.211.224.152 to verify if wget is present on node
{
"operations": [
{
"operation": "NodeAddition",
"resource_name": "ip-172-31-27-101"
}
]
}
Dry Run Output
The output indicates a successful dry run
The operation specified is "NodeAddition," indicating the intent to add a node to the cluster
The resource_name is "ip-172-31-27-101," representing the hostname or identifier for the node being added
Incorrect K8s version - Day 2 operation
Below is an example of the output from the dry run command when a user specifies an unsupported K8s version:
./rctl apply -f check1.yaml --dry-run
Error: Error performing apply on cluster "mks-test-1-calico": server error [return code: 400]: {"operations":null,"error":{"type":"Processing Error","status":400,"title":"Processing request failed","detail":{"Message":"Kubernetes version v1.28.3 not supported\n"}}}
Dry Run Output
The output indicates that an error occurred while applying changes to the cluster named "mks-test-1-calico": the server returned HTTP status code 400 (Bad Request) because Kubernetes version v1.28.3 is not supported.
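To surface such failures in CI logs, the error line can be parsed into its parts. The following Python sketch matches only the error format shown above (cluster name, return code, JSON body with error.detail.Message); other error formats raise ValueError, so treat it as a starting point rather than a complete parser.

```python
import json
import re
from typing import Dict

ERROR_RE = re.compile(
    r'cluster "(?P<cluster>[^"]+)": server error '
    r'\[return code: (?P<code>\d+)\]: (?P<body>\{.*\})\s*$'
)


def parse_apply_error(line: str) -> Dict:
    """Extract cluster name, HTTP return code, and message from an rctl apply error."""
    match = ERROR_RE.search(line)
    if match is None:
        raise ValueError("unrecognized rctl error format")
    body = json.loads(match.group("body"))
    detail = (body.get("error") or {}).get("detail") or {}
    return {
        "cluster": match.group("cluster"),
        "code": int(match.group("code")),
        "message": str(detail.get("Message", "")).strip(),
    }
```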