
Agents

A Rafay Agent is a service you run in your local network or VPC to connect your artifacts, infrastructure, collaboration, verification, and other providers. After the Rafay Agent is installed, you connect your Rafay Org to private and third-party resources.

Important

The Rafay Agent performs all operations, including deployments and integrations. The agent accesses private repositories and workload artifacts in Helm or Git repositories, runs infrastructure-provisioning stages using Terraform, and performs system sync stages.

  • Admins with Organization Admin and Project Admin roles can create, read, update, upgrade, and delete agents in their Orgs
  • The agent connects back to the Rafay Controller on port 443, outbound only

The image below shows a typical deployment, with the Rafay Agent performing workflow orchestration inside the customer's private network using instructions received from the Rafay Controller. The controller itself can be either Rafay's SaaS or an air-gapped deployment self-hosted by the customer.

Rafay Agent Architecture

Click here to read additional details about the Rafay Agent if you are using it for Environment Manager or PaaS.


Supported Environments

The Rafay Agent can be deployed on either Docker or a Kubernetes cluster. See the support matrix for additional details.


Pre-Conditions

A healthy cluster in a Ready state is mandatory.


New Agent Creation

Perform the steps below to create one or more agents:

  • Click GitOps and select GitOps Agents in the Controller
  • Click New Agent
  • Provide a name and select a Deployment Type from the drop-down: Docker or Kubernetes
  • Click Create

New Agent

  • In the General Details section, select the cluster where the agent should be installed from the drop-down

Note: Cluster selection is not applicable for Docker Type

New Agent

  • Use the Resources section to configure limits for your GitOps Agent deployment, including concurrency, CPU, memory, and number of workers.
    • Concurrency Limits (concurrency: 50): Supports processing up to 50 tasks simultaneously to improve parallel operations
    • CPU Limits (cpu: 0.50 or cpu: 500m): Allows the agent to consume up to 0.5 CPU cores
    • Memory Limits (memory: 50M and memory: 3Gi): Sets a memory usage cap of 50MB in Docker environments and 3GB in Kubernetes
    • Number of Workers (numWorkers: 15): Runs 15 workers to handle Environment Manager activities, increasing processing capacity. The default is 10; this example shows an override value of 15
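
As a sketch, the resource settings described above could be expressed like this. The field names (`concurrency`, `cpu`, `memory`, `numWorkers`) mirror the examples in the bullets; the exact schema of your agent spec may differ, so treat this as illustrative only:

```yaml
# Illustrative GitOps Agent resource settings (field names from the
# examples above; verify against your actual agent spec schema)
resources:
  concurrency: 50    # up to 50 tasks processed simultaneously
  cpu: "500m"        # 0.5 CPU cores (use "0.50" in Docker environments)
  memory: "3Gi"      # 3GB cap on Kubernetes (use "50M" for Docker)
  numWorkers: 15     # Environment Manager workers; default is 10
```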

New Agent

  • On the Configuration page, provide node selector, toleration, and node affinity settings to define how the GitOps Agent is scheduled within the Kubernetes cluster. Configure key-value pairs for node selectors, specify tolerations to allow scheduling on tainted nodes, and set required or preferred node affinity rules to control agent placement according to infrastructure needs.
    • Node Selector (demo: selector1): The agent will only run on nodes that have a label demo: selector1
    • Tolerations (tolerationSeconds: 100): The agent is allowed to run on certain restricted nodes and can stay there for 100 seconds even if the node has restrictions
    • Affinity (Node Selection Rules): The agent prefers to run on nodes with specific labels like name=val or demo=selector1/selector2, ensuring better placement
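
The scheduling settings above map to standard Kubernetes pod scheduling fields. A minimal sketch follows; the taint key `dedicated` is a hypothetical example, and `tolerationSeconds` only applies with the `NoExecute` effect:

```yaml
# Standard Kubernetes scheduling fields matching the examples above
nodeSelector:
  demo: selector1                # run only on nodes labeled demo=selector1
tolerations:
  - key: "dedicated"             # hypothetical taint key
    operator: "Equal"
    value: "gitops"
    effect: "NoExecute"
    tolerationSeconds: 100       # may remain 100s after the taint is applied
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: demo
              operator: In
              values: ["selector1", "selector2"]
```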

New Agent

  • Provide required or preferred pod affinity and pod anti-affinity settings to influence how the GitOps Agent is scheduled relative to other pods in the cluster. Configure these rules to ensure the agent is placed alongside specific pods or kept apart from certain pods based on workload distribution, availability, and fault tolerance requirements.
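
In standard Kubernetes terms, these rules look like the sketch below. The labels `app: artifact-cache` and `app: gitops-agent` are hypothetical placeholders for a pod the agent should be co-located with and for the agent's own replicas, respectively:

```yaml
# Illustrative pod affinity/anti-affinity for the GitOps Agent
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: artifact-cache       # hypothetical co-located workload
        topologyKey: kubernetes.io/hostname
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: gitops-agent       # hypothetical agent pod label
          topologyKey: kubernetes.io/hostname
```

Pod anti-affinity like this spreads replicas across nodes for fault tolerance, while pod affinity keeps the agent close to workloads it interacts with frequently.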

New Agent

  • Click Save

The initial status of a newly created agent is Unknown; within approximately one (1) minute, the status changes to Healthy or Unhealthy based on the health criteria.


Upgrade Agent

Users can upgrade an agent whenever a newer agent version is available.

Kubernetes Type

To upgrade a Kubernetes type Agent, perform the below steps:

  • Click the Upgrade available blue banner notification; the Progress Tracker pane appears
  • Read the Note section in the Progress Tracker to learn more about the upgrade process
  • Click Upgrade to proceed

New Agent

The status changes to Upgrade in Progress, and you can view the progress as shown in the example below

New Agent

Once the upgrade completes, the notification banner disappears and the status shows as Healthy

New Agent

Docker Type

To upgrade a Docker type Agent, perform the below steps:

  • Click the Upgrade available blue banner; the Progress Tracker pane appears, along with a Readme for users

  • Click Upgrade to proceed

New Agent

On initiating the agent upgrade, the status changes to Upgrade in Progress, as shown in the example below

New Agent

  • Once Agent Draining is complete, the Progress Tracker pane provides a different set of instructions, as shown below. Perform these steps to complete the agent upgrade process

New Agent

Once the upgrade completes, the notification banner disappears and the status shows as Healthy

New Agent

Note: The status changes to Deactivated once the agent upgrade is initiated, and changes back to Activated on a successful upgrade


Agent Options

  • An existing GitOps Agent can be shared with All, Specific, or None of the projects. This allows the configured agent spec to be reused in a new pipeline, or its resources to be modified, without recreating the agent

Agents Status

Make the required selection to share the agent and click Save

Agents Status

Note: Users cannot share or delete agents inherited from other projects

  • Users can Edit, Activate/Deactivate, or Delete existing agents using the respective icons. An information icon is available for Docker agents, providing instructions to deploy the agent and upgrade it to the latest version

Agents Status

Note: Activating an agent is not allowed while an upgrade is in progress


Failed Scenarios

A Kubernetes agent upgrade can fail in the following scenarios:

  • The cluster is unhealthy or not reachable
  • The agent is not reachable
  • The cluster's node group does not have enough CPU capacity to accommodate the new agent

A Docker agent upgrade can fail in the following scenarios:

  • The agent is not reachable
  • The upgrade instructions provided in the Progress Tracker pane are not followed within 8-10 minutes of agent draining completing

Below is an example where the agent is not reachable and the draining process is incomplete, so the upgrade failed

Agents Status


Centralized Configuration for GitOps Agents

Managing resource configurations for GitOps agents manually across multiple deployments can be challenging, especially for large-scale operations. This enhancement enables centralized configuration, allowing users to define resource limits, worker counts, and scheduling policies for better scalability and efficiency.

For example, if agents had a fixed resource quota with predefined CPU and memory allocations, users relying on a single agent for multiple concurrent operations would have to manually update cluster configurations to accommodate increased workloads. With this enhancement, users can now configure parameters such as the number of workers for the engine agent, CPU limits, memory limits, concurrency limits for CD agent RPC calls, tolerations, node selectors, and affinity settings centrally from the controller, simplifying management and improving efficiency.

This feature is supported via API, Terraform, and RCTL.

Parameters Supported on Both Docker and Kubernetes Agents

  • CPU Limits: Defines the CPU resources allocated to the CD agent pod, ensuring optimal performance based on workload demands
  • Memory Limits: Specifies the memory allocation for the CD agent pod to prevent resource exhaustion and maintain stability
  • Number of Workers for the Engine Agent: Applies specifically to the Environment Manager. Each agent provisions up to 10 workers (pods) by default to handle environment management activities. Users can configure this value beyond 10 to scale operations efficiently, but the minimum remains 10
  • Concurrency Limits for CD Agent RPC Calls: Adjusts the number of concurrent RPC calls the CD agent can handle. Increasing this limit can enhance performance when processing multiple requests simultaneously

Parameters Supported on Kubernetes Agents Only

  • Tolerations: Enables scheduling on tainted nodes
  • Node Selector: Limits scheduling to specific labeled nodes
  • Affinity: Controls pod placement on preferred nodes
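
Pulling the parameters from both lists together, a centrally managed agent configuration might look like the following sketch. The field names are taken from the examples earlier on this page (`cpu`, `memory`, `numWorkers`, `concurrency`) plus standard Kubernetes scheduling fields; the actual schema used by the API, Terraform provider, or RCTL may differ, so verify against the relevant reference before use:

```yaml
# Hypothetical centralized agent configuration; field names are
# illustrative and should be checked against the actual API/RCTL schema
spec:
  resources:
    cpu: "500m"          # CPU limit for the CD agent pod
    memory: "3Gi"        # memory limit for the CD agent pod
    numWorkers: 12       # Environment Manager workers (minimum 10)
    concurrency: 50      # concurrent RPC calls for the CD agent
  # Kubernetes-only scheduling parameters
  nodeSelector:
    demo: selector1
  tolerations:
    - key: "dedicated"   # hypothetical taint key
      operator: "Exists"
      effect: "NoSchedule"
```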

⚠️ Important: Agent-Level Overrides

If Kubernetes options such as node selectors, tolerations, or affinity are defined at the agent level, they override the corresponding cluster-level system component settings. By default, system component overrides at the cluster level determine scheduling rules for all system resources, including CD agents. If an agent does not have specific node selectors, it follows these cluster-wide rules. However, if node selectors or other scheduling parameters are set for the agent, they take precedence, ensuring the agent is deployed based on its custom configuration.