
Agents

A Rafay Agent is a service you run in your local network or VPC to connect your artifacts, infrastructure, collaboration, verification, and other providers. After the Rafay Agent is installed, you connect your Rafay Org to private and third-party resources.

Important

The Rafay Agent performs all operations, including deployments and integrations. The agent accesses private repositories and workload artifacts in Helm or Git repositories, runs infrastructure-provisioning stages using Terraform, and performs system sync stages.

  • Admins with Organization Admin and Project Admin roles can create, read, update, upgrade, and delete agents in their Orgs
  • The agent connects back to the Rafay Controller on port 443, outbound only

The image below shows a typical deployment, with the Rafay Agent performing workflow orchestration inside the customer's private network using instructions received from the Rafay Controller. The controller itself can be either Rafay's SaaS or an air-gapped deployment self-hosted by the customer.

Rafay Agent Architecture

Click here to read additional details about the Rafay Agent if you are using it for Environment Manager or PaaS.


Supported Environments

The Rafay Agent can be deployed on either Docker or a Kubernetes cluster. See the support matrix for additional details.


Pre-Conditions

A healthy cluster in a Ready state is mandatory.


New Agent Creation

Perform the steps below to create one or more agents:

  • Click GitOps and select GitOps Agents in the Controller
  • Click New Agent
  • Provide a name and select a Deployment Type from the drop-down: Docker or Kubernetes
  • Click Create

New Agent

  • In the General Details section, select the cluster where the agent should be installed from the drop-down

Note: Cluster selection is not applicable for Docker Type

New Agent

  • Use the Resources section to configure limits for your GitOps Agent deployment, including concurrency, CPU, memory, and number of workers.
    • Concurrency Limits (concurrency: 50): Supports processing up to 50 tasks simultaneously to improve parallel operations
    • CPU Limits (cpu: 0.50 or cpu: 500m): Allows the agent to consume up to 0.5 CPU cores
    • Memory Limits (memory: 50M and memory: 3Gi): Sets a memory usage cap of 50MB in Docker environments and 3GB in Kubernetes
    • Number of Workers (numWorkers: 15): Runs 15 workers to handle Environment Manager activities, increasing processing capacity. The default is 10; this example shows an override value of 15
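
As a sketch, the resource settings described above could be expressed like this. The field names (`concurrency`, `cpu`, `memory`, `numWorkers`) mirror the examples in the bullets; the exact schema of your agent spec may differ, so treat this as illustrative only:

```yaml
# Illustrative GitOps Agent resource settings (field names from the
# examples above; verify against your actual agent spec schema)
resources:
  concurrency: 50    # up to 50 tasks processed simultaneously
  cpu: "500m"        # 0.5 CPU cores (use "0.50" in Docker environments)
  memory: "3Gi"      # 3GB cap on Kubernetes (use "50M" for Docker)
  numWorkers: 15     # Environment Manager workers; default is 10
```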

New Agent

  • On the Configuration page, provide node selector, toleration, and node affinity settings to define how the GitOps Agent is scheduled within the Kubernetes cluster. Configure key-value pairs for node selectors, specify tolerations to allow scheduling on tainted nodes, and set required or preferred node affinity rules to control agent placement according to infrastructure needs.
    • Node Selector (demo: selector1): The agent will only run on nodes that have a label demo: selector1
    • Tolerations (tolerationSeconds: 100): The agent is allowed to run on certain restricted nodes and can stay there for 100 seconds even if the node has restrictions
    • Affinity (Node Selection Rules): The agent prefers to run on nodes with specific labels like name=val or demo=selector1/selector2, ensuring better placement
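
The scheduling settings above map to standard Kubernetes pod scheduling fields. A minimal sketch follows; the taint key `dedicated` is a hypothetical example, and `tolerationSeconds` only applies with the `NoExecute` effect:

```yaml
# Standard Kubernetes scheduling fields matching the examples above
nodeSelector:
  demo: selector1                # run only on nodes labeled demo=selector1
tolerations:
  - key: "dedicated"             # hypothetical taint key
    operator: "Equal"
    value: "gitops"
    effect: "NoExecute"
    tolerationSeconds: 100       # may remain 100s after the taint is applied
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: demo
              operator: In
              values: ["selector1", "selector2"]
```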

New Agent

  • Provide required or preferred pod affinity and pod anti-affinity settings to influence how the GitOps Agent is scheduled relative to other pods in the cluster. Configure these rules to ensure the agent is placed alongside specific pods or kept apart from certain pods based on workload distribution, availability, and fault tolerance requirements.
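
In standard Kubernetes terms, these rules look like the sketch below. The labels `app: artifact-cache` and `app: gitops-agent` are hypothetical placeholders for a pod the agent should be co-located with and for the agent's own replicas, respectively:

```yaml
# Illustrative pod affinity/anti-affinity for the GitOps Agent
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: artifact-cache       # hypothetical co-located workload
        topologyKey: kubernetes.io/hostname
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: gitops-agent       # hypothetical agent pod label
          topologyKey: kubernetes.io/hostname
```

Pod anti-affinity like this spreads replicas across nodes for fault tolerance, while pod affinity keeps the agent close to workloads it interacts with frequently.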

New Agent

  • Click Save

The initial status of a newly created agent is Unknown; within approximately one (1) minute, the status changes to Healthy or Unhealthy based on the health criteria.


Upgrade Agent

Users can upgrade an agent whenever a newer agent version is available.

Kubernetes Type

To upgrade a Kubernetes type Agent, perform the below steps:

  • Click the Upgrade available blue banner notification; the Progress Tracker pane appears
  • Read the Note section in the Progress Tracker to learn more about the upgrade process
  • Click Upgrade to proceed

New Agent

The status changes to Upgrade in Progress, and you can view the progress as shown in the example below

New Agent

Once the upgrade completes, the notification banner disappears and the status shows as Healthy

New Agent

Docker Type

To upgrade a Docker type Agent, perform the below steps:

  • Click the Upgrade available blue banner; the Progress Tracker pane appears, along with a Readme for users

  • Click Upgrade to proceed

New Agent

On initiating the agent upgrade, the status changes to Upgrade in Progress, as shown in the example below

New Agent

  • Once Agent Draining is complete, the Progress Tracker pane provides a different set of instructions, as shown below. Perform these steps to complete the agent upgrade process

New Agent

Once the upgrade completes, the notification banner disappears and the status shows as Healthy

New Agent

Note: The status changes to Deactivated once the agent upgrade is initiated, and changes back to Activated on a successful upgrade


Agent Options

  • An existing GitOps Agent can be shared with All, Specific, or None of the projects. This allows the configured agent spec to be reused in a new pipeline, or its resources to be modified, without recreating the agent

Agents Status

Make the required selection to share the agent and click Save

Agents Status

Note: Users cannot share or delete agents inherited from other projects

  • Users can Edit, Activate/Deactivate, or Delete existing agents using the respective icons. An information icon is available for Docker agents, providing instructions to deploy the agent and upgrade it to the latest version

Agents Status

Note: Activating an agent is not allowed while an upgrade is in progress


Failed Scenarios

A Kubernetes agent upgrade can fail in the following scenarios:

  • The cluster is unhealthy or not reachable
  • The agent is not reachable
  • The cluster's node group does not have enough CPU capacity to accommodate the new agent

A Docker agent upgrade can fail in the following scenarios:

  • The agent is not reachable
  • The upgrade instructions provided in the Progress Tracker pane are not followed within 8-10 minutes of agent draining completing

Below is an example where the agent is not reachable and the draining process is incomplete, so the upgrade failed

Agents Status


Centralized Configuration for GitOps Agents

Managing resource configurations for GitOps agents manually across multiple deployments can be challenging, especially for large-scale operations. This enhancement enables centralized configuration, allowing users to define resource limits, worker counts, and scheduling policies for better scalability and efficiency.

For example, if agents had a fixed resource quota with predefined CPU and memory allocations, users relying on a single agent for multiple concurrent operations would have to manually update cluster configurations to accommodate increased workloads. With this enhancement, users can now configure parameters such as the number of workers for the engine agent, CPU limits, memory limits, concurrency limits for CD agent RPC calls, tolerations, node selectors, and affinity settings centrally from the controller, simplifying management and improving efficiency.

This feature is supported via API, Terraform, and RCTL.

Parameters Supported on Both Docker and Kubernetes Agents

  • CPU Limits: Defines the CPU resources allocated to the CD agent pod, ensuring optimal performance based on workload demands
  • Memory Limits: Specifies the memory allocation for the CD agent pod to prevent resource exhaustion and maintain stability
  • Number of Workers for the Engine Agent: Applies specifically to the Environment Manager. Each agent provisions up to 10 workers (pods) by default to handle environment management activities. Users can configure this value beyond 10 to scale operations efficiently, but the minimum remains 10
  • Concurrency Limits for CD Agent RPC Calls: Adjusts the number of concurrent RPC calls the CD agent can handle. Increasing this limit can enhance performance when processing multiple requests simultaneously

Parameters Supported on Kubernetes Agents Only

  • Tolerations: Enables scheduling on tainted nodes
  • Node Selector: Limits scheduling to specific labeled nodes
  • Affinity: Controls pod placement on preferred nodes
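
Pulling the parameters from both lists together, a centrally managed agent configuration might look like the following sketch. The field names are taken from the examples earlier on this page (`cpu`, `memory`, `numWorkers`, `concurrency`) plus standard Kubernetes scheduling fields; the actual schema used by the API, Terraform provider, or RCTL may differ, so verify against the relevant reference before use:

```yaml
# Hypothetical centralized agent configuration; field names are
# illustrative and should be checked against the actual API/RCTL schema
spec:
  resources:
    cpu: "500m"          # CPU limit for the CD agent pod
    memory: "3Gi"        # memory limit for the CD agent pod
    numWorkers: 12       # Environment Manager workers (minimum 10)
    concurrency: 50      # concurrent RPC calls for the CD agent
  # Kubernetes-only scheduling parameters
  nodeSelector:
    demo: selector1
  tolerations:
    - key: "dedicated"   # hypothetical taint key
      operator: "Exists"
      effect: "NoSchedule"
```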

⚠️ Important: Agent-Level Overrides

If Kubernetes options such as node selectors, tolerations, or affinity are defined at the agent level, they override the corresponding cluster-level system component settings. By default, system component overrides at the cluster level determine scheduling rules for all system resources, including CD agents. If an agent does not have specific node selectors, it follows these cluster-wide rules. However, if node selectors or other scheduling parameters are set for the agent, they take precedence, ensuring the agent is deployed based on its custom configuration.