
Design Guidelines

Environment template

  • Agent Management: Associating an agent or a pool of agents with an environment template simplifies management compared to specifying agents at the resource template level. This can be overridden at environment launch by the end user if needed
  • Dependency and Parallelism: Leverage resource template dependency ordering and parallelism to align with your application deployment requirements
  • Variable Management: Prefer using vanilla variables in the environment template over an external config context unless the config settings need to be shared across multiple environment templates
  • Inline Config Context: Use inline config context at the environment template level only when files need to be included
  • Input Collection: Define input variables at the environment template level and chain or alias them to corresponding variables in resource templates using selectors. This approach allows you to collect inputs from users once and pass them to multiple resource templates
  • User Control and Restriction: Gain better control over user permissions, such as restricting values, hiding variables, and defining overrides. Customizing variables at the environment template level (a singleton) is also simpler than doing so across multiple resource templates
  • JSON Variable Type: Variables of type JSON are highly flexible, supporting other variable types and allowing the inclusion of expressions within JSON
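As a hedged illustration of this flexibility, a JSON-typed variable might look like the following sketch. The schema mirrors the driver sample later in this document (name/value/valueType); the variable name, the "json" valueType, and the embedded expression are illustrative assumptions, not taken from any real template.

```json
{
  "variables": [
    {
      "name": "cluster_settings",
      "valueType": "json",
      "value": "{\"region\": \"us-west-2\", \"owner\": \"$(current.input.owner)$\", \"node_count\": 3}"
    }
  ]
}
```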

Resource template

  • Isolate Functionality: It is recommended to isolate functionality for each resource template and link one resource template to another at the environment template level. Avoid creating a monolithic resource template that handles multiple tasks
  • Input Variables: Expose input variables as mandatory at the resource template level without restrictions. This allows higher-level environment templates to handle customization, such as restricting values or pre-setting defaults
  • Variable Management: Prefer using vanilla variables in resource templates over an external config context
  • Flexible Value Types: Use JSON as the value type for its flexibility, as it supports all other types and allows expressions to be included
  • Underlying Code:
      • Use Terraform/OpenTofu code for Infrastructure as Code (IaC), referenced by the resource template through repository associations
      • If a specific version or engine of Terraform/OpenTofu is required, use a custom packaged driver in place of Rafay’s packaged driver
      • If Terraform/OpenTofu is not a suitable option, use a packaged driver as a custom provider to execute the required logic as tasks
  • Output Variables: Expose output variables from the resource template so that module values become available as they are populated during execution. For example:
output "public_subnets" {
  value = module.vpc.public_subnets
}

Drivers/Workflow Handlers

  • For shorter tasks, opt for a container-based driver
  • For long-running, repeat-polling tasks, utilize function drivers
  • Use the HTTP driver when the task involves simply making HTTP calls to check status

Agents

  • It is recommended to define multiple agents for high availability, either at the environment template level or during environment launch/runtime. If agents are defined at both levels, the agent specified at the runtime level takes precedence
  • Prefer using an agent running within the cluster if IAM Roles for Service Accounts (IRSA) can be leveraged for provisioning resources in the environment

IaC code

Variables

  • Every variable should have:
      • a name
      • a description
      • a default value (if optional)
  • Every required (non-optional) variable should have validation, for example:
variable "eks_devworkspace_cluster_routing_suffix" {
  type        = string
  description = "Required: supports multi-tenancy; a URL is generated with the provided suffix. For example, with suffix devworkspace.dev.rafay-edge.net and name tem1, each workspace gets <workspaceid>.<name>-devworkspace.dev.rafay-edge.net"
  validation {
    condition     = length(var.eks_devworkspace_cluster_routing_suffix) > 0
    error_message = "Value is required and a hosted zone should be created."
  }
}
  • Declare all derived names in a locals block at the start of your main.tf, applying whatever logic your naming conventions require, for example:
locals {
  name              = "${replace(var.name, "_", "-")}-eks-cluster"
  namespace         = "${local.name}-ns"
  aws_public_subnet = "${local.name}-public-subnet"
  tags = merge(var.tags, {
    ManagedBy = "Rafay Eaas"
    Resource  = local.name
  })
}
  • Where a resource imposes naming restrictions (e.g., RDS Oracle accepts only 8 characters), generate a random suffix, append it to a shortened name, and preserve the original name in tags:
locals {
  rds_oracle_name = "${substr(local.name, 0, 4)}${random_id.name_randomizer.hex}"
  tags = merge(var.tags, {
    OriginalName = local.name
  })
}

resource "random_id" "name_randomizer" {
  keepers = {
    # Generate a new id each time we switch to a new name
    name = local.name
  }
  byte_length = 2 # 2 bytes -> 4 hex characters, keeping the combined name within 8 characters
}
  • Variable names should be logical, conveying the purpose of the variable:
      • string:  <resource_name>_<purpose>
      • boolean: <resource_name>_<purpose>_enabled
      • numeric: <resource_name>_<purpose>_<count|max|min>

  • Guidelines for changing variables:
      • Adding a new variable:
          • Declare the new variable as optional by providing a default value to ensure backward compatibility
          • If the variable is optional for the resource, a null default can be provided
      • Modifying existing variables:
          • Do not change the name or data type of an existing variable
          • Instead of modifying a variable, add a new one and retain the old variable to maintain backward compatibility
          • Manage all variables in locals and map the old variable to the new local variable for a seamless transition
      • Deleting variables:
          • Avoid deleting existing variables from the variable list
          • Instead, mark the variable as deprecated in its description, update the README to explain the behavioral changes, and continue to maintain backward compatibility
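The "map the old variable to the new local variable" step above can be sketched as follows (the variable names are hypothetical):

```hcl
# Old variable: retained and marked deprecated for backward compatibility
variable "subnet_id" {
  type        = string
  description = "DEPRECATED: use vpc_private_subnet_id instead"
  default     = null
}

# New variable: optional, so existing callers keep working
variable "vpc_private_subnet_id" {
  type        = string
  description = "ID of the private subnet to deploy into"
  default     = null
}

locals {
  # Prefer the new variable; fall back to the old one for a seamless transition
  private_subnet_id = coalesce(var.vpc_private_subnet_id, var.subnet_id)
}
```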

Guidelines for Naming Outputs

  1. Descriptive Naming:
     The name of an output should clearly describe the property it represents and be more structured than free-form naming
  2. Recommended Structure:
     Use the format {name}_{type}_{attribute}, where:
       • {name}: Represents the resource or data source name
           • Example for data: aws_subnet "private" → private
           • Example for resource: aws_vpc_endpoint_policy "test" → test
       • {type}: Represents the resource or data source type without the provider prefix
           • Example for data: aws_subnet "private" → subnet
           • Example for resource: aws_vpc_endpoint_policy "test" → vpc_endpoint_policy
       • {attribute}: Represents the specific attribute returned by the output
  3. Generic Naming for Complex Outputs:
     For outputs returning values derived from multiple resources or interpolation functions, {name} and {type} should be as generic as possible. Avoid including unnecessary prefixes
  4. Plural Names for Lists:
     If the output is a list, use a plural form in the name
  5. Include Descriptions:
     Always provide a description for outputs, even if the purpose seems self-evident
  6. Handling Sensitive Outputs:
     Avoid marking outputs as sensitive unless you have complete control over their usage across all modules
  7. Formatting Recommendations:
     Use hyphens (-) in argument values and in contexts where the value will be human-readable, such as DNS names for RDS instances
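Applying the {name}_{type}_{attribute} structure to the data source aws_subnet "private" used in the examples above might look like this sketch (the description text is illustrative):

```hcl
# {name} = private, {type} = subnet, {attribute} = id
output "private_subnet_id" {
  description = "ID of the private subnet"
  value       = data.aws_subnet.private.id
}

# Plural form for an output that returns a list (assuming the data source uses count)
output "private_subnet_ids" {
  description = "IDs of all private subnets"
  value       = data.aws_subnet.private[*].id
}
```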

Resource

  • Resources and data sources from the provider should be declared in the following format
resource "aws_route_table" "<resource_name>_public" {
}
resource "aws_route_table" "<resource_name>_private" {
}
  • Resource naming should reflect the logical context of the resource. For example, if the resource represents a service for JupyterHub, it can be named "jupyterhub"
  • Do not repeat the resource type in the resource name (neither partially nor completely)
    Use:
    resource "aws_route_table" "public" {}
    Do not use:
    resource "aws_route_table" "public_route_table" {}
    resource "aws_route_table" "public_aws_route_table" {}
  • A resource should be named using its type if no more descriptive or general name is applicable, or if the resource module creates only one resource of that type. For example, in an AWS VPC module with a single aws_nat_gateway resource and multiple aws_route_table resources, the aws_nat_gateway can be named directly, while the aws_route_table resources should have more descriptive names, such as private, public, or database. Always use singular nouns for resource names
  • Place the count or for_each argument at the top of the resource or data source block as the first argument, followed by a newline for separation
  • Include the tags argument (if supported by the resource) as the last core argument, followed by depends_on and lifecycle if needed. Each of these should be separated by a single empty line for clarity
  • When setting conditions for count or for_each, use boolean values instead of expressions like length or similar constructs
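The argument-ordering guidance above can be sketched in a single resource block (the resource, variable, and dependency names are illustrative):

```hcl
resource "aws_route_table" "private" {
  # count first, separated by a newline; condition is a boolean, not a length() expression
  count = var.private_route_table_enabled ? 1 : 0

  vpc_id = var.vpc_id

  # tags last among the core arguments
  tags = merge(var.tags, { Name = "private" })

  depends_on = [aws_internet_gateway.this]

  lifecycle {
    create_before_destroy = true
  }
}
```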

Helm Charts driven environment templates

Helm Provider Compatibility

  • Ensure that the Helm provider version in Terraform is compatible with both the Helm version and your Kubernetes version
  • Verify any breaking changes between different versions of the Helm provider in Terraform, such as those between v1.x and v2.x
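Pinning the Helm provider to a known-compatible series can be sketched as follows (the version constraint is an illustrative assumption; choose one verified against your Helm and Kubernetes versions):

```hcl
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      # Constrain to a provider series tested with your Helm/Kubernetes versions
      version = "~> 2.12"
    }
  }
}
```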

Values File Management

  • Value Changes: If the Helm chart values have been modified in a new version (e.g., renamed parameters, added, or removed settings), update the values block in your Terraform configuration to reflect these changes
  • Dynamic Values: If your Helm charts depend on dynamic values or Terraform variables, ensure these variables are properly updated and consistent across all environments

Dry-Run Upgrade

  • Helm provides a --dry-run option, which is useful to simulate the upgrade without applying changes
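A typical invocation might look like the following (release, repository, chart, and version are placeholders):

```shell
# Simulate the upgrade without applying changes; --debug prints the rendered manifests
helm upgrade my-release my-repo/my-chart --version 1.2.3 --dry-run --debug
```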

Chart Versioning

  • Version Pinning: Make sure to pin the version of Helm charts in your Terraform configuration using the version attribute. This prevents Terraform from automatically upgrading to a new chart version unless explicitly changed
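A sketch of version pinning with the Helm provider's helm_release resource (the chart shown, ingress-nginx, and its version are illustrative):

```hcl
resource "helm_release" "ingress_nginx" {
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  # Pinned: Terraform will not move to a newer chart unless this value changes
  version    = "4.10.0"

  values = [file("${path.module}/values.yaml")]
}
```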

Helm Dependencies

Subcharts: If your Helm chart includes dependencies (subcharts), verify whether those dependencies are also being upgraded and check for any breaking changes they might introduce. Avoid using subcharts when possible, as they may not be accessible in self-hosted environments.

Drivers Inputs and Outputs

Inputs

Dynamic inputs (expressions) can be configured for all fields within the driver. Users can supply values for these inputs during environment deployment, define them as variables within the config context attached to the driver, or specify them inline. These expressions are evaluated in the execution context during workflow execution.

Expression Format:
The standard format for expressions is:
$(current.input.<variable-name>)$
Here, <variable-name> represents the name of the variable whose value is being referenced.

Example:
If a variable is named api_key, it can be referenced in the configuration as:
$(current.input.api_key)$

Sample Input File:

{
  "kind": "Driver",
  "metadata": {
    "name": "driver-test"
  },
  "spec": {
    "config": {
      "type": "container",
      "container": {
        "image": "jira:latest",
        "arguments": [
          "-ticket_id",
          "$(current.input.ticket_id)$"
        ],
        "env_vars": {
          "API_KEY": "$(current.input.api_key)$"
        }
      }
    },
    "inputs": [
      {
        "data": {
          "variables": [
            {
              "name": "api_key",
              "value": "********",
              "valueType": "text"
            },
            {
              "name": "ticket_id",
              "value": "1234",
              "valueType": "text"
            }
          ]
        }
      }
    ]
  }
}

Outputs

Driver outputs can be used as inputs for subsequent drivers within a workflow.

  • For the HTTP driver, the response payload is saved as output.
  • For the Container driver, the output.json file is uploaded to the designated endpoint, and its content becomes available as output.

These outputs can be referenced using expressions tailored to specific hooks.

Example Expression:
$(resource.test_resource.hook.onInit.create_ticket.output.ticket_id)$

This expression retrieves the ticket_id value generated by the create_ticket hook during the onInit stage of the test_resource resource template. The ticket_id can then be used in subsequent workflow steps.

Detailed Example:
A resource template named test_resource includes two hooks:

  1. onInit Hook:
      • Creates a Jira ticket
      • Uploads an output.json file containing: {"ticket_id": "1234"}
  2. onCompletion Hook:
      • References the data from the onInit hook using the output expression:
        $(resource.test_resource.hook.onInit.create_ticket.output.ticket_id)$

This allows the ticket_id created during the onInit stage to be reused during the onCompletion stage or in other steps of the workflow.

{
  "kind": "ResourceTemplate",
  "metadata": {
    "name": "test_resource"
  },
  "spec": {
    "hooks": {
      "onInit": [
        {
          "name": "create_ticket",
          "type": "driver",
          "driver": {
            "data": {
              "config": {
                "type": "container",
                "container": {
                  "image": "jira:custom",
                  "commands": ["create"]
                }
              }
            }
          }
        }
      ],
      "onCompletion": [
        {
          "name": "send_ticket",
          "type": "driver",
          "driver": {
            "data": {
              "config": {
                "type": "container",
                "container": {
                  "image": "jira:custom",
                  "arguments": [
                    "--ticket-id",
                    "$(resource.test_resource.hook.onInit.create_ticket.output.ticket_id)$"
                  ]
                }
              }
            }
          }
        }
      ]
    }
  }
}

Resource Template Lifecycle Hooks

(resource.resource_name.hook.onInit.hook_name.output)
(resource.resource_name.hook.onSuccess.hook_name.output)
(resource.resource_name.hook.onFailure.hook_name.output)
(resource.resource_name.hook.onCompletion.hook_name.output)

Terraform Lifecycle Hooks

Deploy Hooks

(resource.template_name.hook.deploy.init.before.hook_name.output)
(resource.template_name.hook.deploy.init.after.hook_name.output)
(resource.template_name.hook.deploy.plan.before.hook_name.output)
(resource.template_name.hook.deploy.plan.after.hook_name.output)
(resource.template_name.hook.deploy.apply.before.hook_name.output)
(resource.template_name.hook.deploy.apply.after.hook_name.output)
(resource.template_name.hook.deploy.output.before.hook_name.output)
(resource.template_name.hook.deploy.output.after.hook_name.output)

Destroy Hooks

(resource.template_name.hook.destroy.init.before.hook_name.output)
(resource.template_name.hook.destroy.init.after.hook_name.output)
(resource.template_name.hook.destroy.plan.before.hook_name.output)
(resource.template_name.hook.destroy.plan.after.hook_name.output)
(resource.template_name.hook.destroy.destroy.before.hook_name.output)
(resource.template_name.hook.destroy.destroy.after.hook_name.output)

Environment Template Lifecycle Hooks

(environment.hook.onInit.hook_name.output)
(environment.hook.onSuccess.hook_name.output)
(environment.hook.onFailure.hook_name.output)
(environment.hook.onCompletion.hook_name.output)

The expression (resource.resource_name.task.task_name.output) is used to access the output of a specific task within a resource template. Here's a breakdown of its components:

  • resource.resource_name: Refers to the resource template containing the task.
  • task.task_name: Specifies the particular task within that resource template.
  • output: Refers to the data produced by the task.

This expression enables subsequent tasks or components to utilize the output of a previous task, facilitating data flow and dependency management within the workflow.


Loading and Launching Templates

Environment Manager provides the following interfaces for loading environment templates into the Rafay Platform and launching them.

  • Pipeline
  • Swagger API
  • UI
  • RCTL (Rafay CLI)
  • Terraform Provider

Best Practices

  1. Centralized Storage:
    Store all your templates and related configurations in a private GitHub repository to establish a single source of truth
  2. Automated Loading:
    Use a unified pipeline configuration to automate the process of loading changes from Git to your system, ensuring seamless updates to your project
  3. Leveraging Swagger APIs:
    If you are building a platform or marketplace for custom templates outside of Rafay and need to load them into your Rafay organization’s project, Swagger APIs provide an efficient and scalable solution