
Best Practices - Environment Manager

Environment Execution and Worker Behavior

The environment provisioning and deprovisioning lifecycle can be divided into three high-level stages:

  • Activity Generation: The Controller determines the number and types of activities required

  • Agent Interaction: Agents continually poll the Controller for executable activities. The Controller filters eligible activities and assigns them based on the agent-template association and the configured worker limit

  • Agent Execution: The agent spawns a new pod for each activity. Activities run sequentially or in parallel, as defined in the template. When an activity completes, its pod is terminated and its resources are released. The agent keeps polling and queues additional activities as needed

flowchart TD
    A[Environment Deploy/Delete] --> B[Activity Generation]
    B --> C[Controller determines number & type of activities]
    C --> D[Agent Polls Controller]
    D --> E[Controller matches activities to agent]
    E --> G[Agent spawns pod per activity]
    G --> I[Execute activities sequentially or in parallel based on template configuration]
    I --> K[Terminate Pods & Release Resources]

The GitOps agent provisions workers (pods) to execute environment-related activities. By default, each agent can spin up as many as 10 workers as needed.

  • Each worker handles one activity at a time
  • A single environment deployment may generate multiple parallel activities
  • Agents scale workers dynamically based on load, launching them only when needed
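The worker limit behaves like a bounded pool. The sketch below is a hypothetical Python model, not the agent's implementation: `run_activities` and `MAX_WORKERS` are illustrative names, each activity gets its own worker (standing in for a pod), an `asyncio.Semaphore` caps how many run at once, and the function reports the peak concurrency it observed.

```python
import asyncio

MAX_WORKERS = 10  # default per-agent worker limit described above


async def run_activities(num_activities: int, max_workers: int = MAX_WORKERS) -> int:
    """Model an agent running activities with a bounded worker pool.

    Each activity gets its own worker, but at most max_workers run at once;
    the rest wait for a free slot. Returns the peak concurrency observed.
    """
    slots = asyncio.Semaphore(max_workers)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with slots:              # wait until a worker slot is free
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for the real activity
            running -= 1               # activity done; slot is released

    await asyncio.gather(*(worker() for _ in range(num_activities)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_activities(25)))  # 10
```

With 25 pending activities and the default limit, 10 run immediately and the remaining 15 wait for a free worker.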

Sizing

Default Resource Requests and Limits per Worker:

| Worker Type     | CPU  | Memory |
|-----------------|------|--------|
| Git Worker      | 250m | 512Mi  |
| OpenTofu Worker | 500m | 1Gi    |

Requests and limits are identical for each worker type.

  • For sizing purposes, use the default OpenTofu worker specification (500m CPU, 1Gi memory) as the per-worker baseline
  • If custom drivers are used, worker pod resource sizing is determined by the driver configuration

Capacity Planning & Scaling

Cluster/Infrastructure Capacity

  • Ensure adequate resources in the Kubernetes cluster or Docker container where the agent is running
  • Add a 10–20% buffer on top of the calculated requirement
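As a rough worked example, the sketch below multiplies a worker count by the default OpenTofu worker sizing and adds the recommended buffer. `cluster_capacity` is a hypothetical helper, it covers worker pods only (not the agent pod itself), and it assumes every worker uses the OpenTofu defaults.

```python
# Default OpenTofu worker sizing from the table above (requests == limits).
WORKER_CPU_MILLICORES = 500  # 500m
WORKER_MEMORY_MIB = 1024     # 1Gi


def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up (avoids float rounding issues)."""
    return -(-a // b)


def cluster_capacity(workers: int, buffer_pct: int = 20) -> tuple[int, int]:
    """Return (CPU in millicores, memory in MiB) needed for `workers`
    OpenTofu workers, including a buffer of buffer_pct percent."""
    scale = 100 + buffer_pct
    cpu_m = ceil_div(workers * WORKER_CPU_MILLICORES * scale, 100)
    mem_mi = ceil_div(workers * WORKER_MEMORY_MIB * scale, 100)
    return cpu_m, mem_mi


print(cluster_capacity(50))  # (30000, 61440)
```

For 50 workers with a 20% buffer, this gives 30000m (30 CPU cores) and 61440Mi (60Gi) of memory.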

Scaling Recommendations

  • Configure a relatively high maximum worker count per agent, based on an analysis of current environment activity
  • Enable autoscaling at the infrastructure level (e.g., Karpenter, the Kubernetes Cluster Autoscaler, or Amazon EC2 Auto Scaling)
  • Deploy agents in a dedicated node group or a management cluster
  • Use separate agents for:
    • GitOps System Sync and add-on artifact pulls
    • Environment Manager activity execution
  • For high-usage teams, provision dedicated agents
  • Plan for one pair of agents per 100 parallel activities expected at peak
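The "one pair of agents per 100 parallel activities" rule is a round-up calculation. `agents_for_peak` is a hypothetical helper; it rounds up to whole pairs and assumes at least one pair, in line with the high-availability guidance of at least two agents.

```python
def agents_for_peak(peak_parallel_activities: int, activities_per_pair: int = 100) -> int:
    """Number of agents under the rule of thumb of one pair of agents
    per 100 parallel activities at peak (rounded up to whole pairs)."""
    pairs = max(1, -(-peak_parallel_activities // activities_per_pair))  # ceil division
    return 2 * pairs


print(agents_for_peak(250))  # 6
```

For example, a peak of 250 parallel activities rounds up to 3 pairs, or 6 agents.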

Sizing Calculations

To execute activities efficiently during environment provisioning and deprovisioning, size your agents appropriately. The following approach estimates the required worker capacity and the number of agents, with built-in buffers and scheduling considerations.

flowchart TD
    A[Start Agent Sizing Process] --> B[Step 1: Analyze Templates]
    B --> C[Determine max parallel activities]
    C --> D[Step 2: Estimate concurrent environments]
    D --> E[Calculate total required workers]
    E --> F[Step 3: Add buffer and determine agents]
    F --> G[Optionally add extra agent for redundancy]
    G --> H[Step 4: Evaluate republish frequency/stagger deployments]
    H --> L[Step 5: Define autoscaling and placement]   
    L --> M[Finalize agent capacity plan]

Step 1: Evaluate Maximum Parallel Activities per Template

  • For each Environment Template (ET), assess how many Resource Templates (RTs) can be executed in parallel
  • For example, if template ET1 has 5 RTs (RT1, RT2, RT3, RT4, RT5) and RT5 depends on RT1, then RT1–RT4 can execute in parallel, so ET1 has a maximum of 4 parallel activities
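For templates with more complex dependencies, the maximum number of parallel activities is the largest set of RTs where none depends, directly or transitively, on another (the "width" of the dependency graph). The sketch below is one way to compute it, assuming dependencies are acyclic and given as a mapping from each RT to the RTs it depends on; `max_parallel_activities` is a hypothetical helper. It uses Dilworth's theorem: width = number of RTs minus a maximum matching over the transitive dependency relation.

```python
from typing import Dict, List, Set


def max_parallel_activities(deps: Dict[str, List[str]]) -> int:
    """Worst-case number of RTs that can run at once (assumes an acyclic graph)."""
    nodes = sorted(set(deps) | {d for ds in deps.values() for d in ds})
    ancestors: Dict[str, Set[str]] = {}

    def collect(rt: str) -> Set[str]:
        # All RTs that must finish before `rt` can start (transitive closure).
        if rt not in ancestors:
            found: Set[str] = set()
            for dep in deps.get(rt, []):
                found.add(dep)
                found |= collect(dep)
            ancestors[rt] = found
        return ancestors[rt]

    # later[u] lists every RT that transitively depends on u.
    later: Dict[str, List[str]] = {u: [] for u in nodes}
    for v in nodes:
        for u in collect(v):
            later[u].append(v)

    match: Dict[str, str] = {}  # dependent RT -> RT matched to it

    def augment(u: str, seen: Set[str]) -> bool:
        # Kuhn's algorithm: look for an augmenting path starting at u.
        for v in later[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    matching = sum(1 for u in nodes if augment(u, set()))
    return len(nodes) - matching


# Step 1 example: ET1 has RT1..RT5, and RT5 depends on RT1.
et1 = {"RT1": [], "RT2": [], "RT3": [], "RT4": [], "RT5": ["RT1"]}
print(max_parallel_activities(et1))  # 4
```

Unlike counting RTs level by level, this gives the worst case regardless of how long each activity takes.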

Step 2: Determine Total Parallel Activities Across Environments

  • Estimate how many environments will run concurrently per template
  • Multiply the number of parallel activities per template by the number of concurrent environments
  • For example, ET1: 3 parallel activities per environment × 10 environments = 30 activities; ET2: 1 sequential activity per environment × 10 environments = 10 activities; Total Required Workers = 30 + 10 = 40 workers
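Step 2 is a sum of products. In this hypothetical helper, each template maps to a pair of (parallel activities per environment, concurrent environments); the function name and input shape are illustrative.

```python
def total_required_workers(templates: dict[str, tuple[int, int]]) -> int:
    """Sum of (parallel activities per environment x concurrent environments)
    across all templates: the number of workers needed at peak."""
    return sum(parallel * envs for parallel, envs in templates.values())


# Step 2 example: ET1 = 3 x 10, ET2 = 1 x 10
print(total_required_workers({"ET1": (3, 10), "ET2": (1, 10)}))  # 40
```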

Step 3: Plan Agent Count and Buffers

  • Add a buffer margin (recommended: 25%) for resilience
  • Divide total worker requirement among agents
  • For example, 40 required workers plus a 25% buffer = 50 workers. Deploy 2 agents with 25 workers each (above the default of 10 workers per agent), and optionally deploy a 3rd agent for redundancy to provide fault tolerance and avoid bottlenecks
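The Step 3 arithmetic can be sketched with integer ceiling division so fractional results round up to whole workers and agents. `plan_agents` and its parameters are hypothetical; `workers_per_agent` stands for the maximum worker count configured on each agent.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up."""
    return -(-a // b)


def plan_agents(required_workers: int, workers_per_agent: int,
                buffer_pct: int = 25, redundant_agents: int = 0) -> tuple[int, int]:
    """Return (buffered worker total, number of agents to deploy)."""
    buffered = ceil_div(required_workers * (100 + buffer_pct), 100)
    agents = ceil_div(buffered, workers_per_agent) + redundant_agents
    return buffered, agents


print(plan_agents(40, 25))                      # (50, 2)
print(plan_agents(40, 25, redundant_agents=1))  # (50, 3)
```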

Step 4: Consider Deployment Frequency and Staggering

  • Frequent re-publishing of environments can create load spikes
  • To mitigate this, use environment scheduling “jitter” to randomize deployment times or configure custom cron expressions to stagger deployments and avoid simultaneous activity bursts
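One way to stagger deployments with custom cron expressions is to space environments a fixed number of minutes apart. The sketch assumes standard five-field cron syntax (minute, hour, day of month, month, day of week) and daily schedules; `staggered_crons` and the environment names are hypothetical, and the expression format your schedule configuration accepts may differ.

```python
def staggered_crons(env_names: list[str], start_hour: int, start_minute: int,
                    spacing_minutes: int) -> dict[str, str]:
    """Give each environment a daily cron expression (assumed five-field
    syntax), spaced spacing_minutes apart from start_hour:start_minute."""
    schedules = {}
    for i, name in enumerate(env_names):
        total = start_hour * 60 + start_minute + i * spacing_minutes
        hour, minute = divmod(total % (24 * 60), 60)  # wrap past midnight
        schedules[name] = f"{minute} {hour} * * *"
    return schedules


print(staggered_crons(["dev-a", "dev-b", "dev-c"], 2, 0, 15))
```

Here the three hypothetical environments are scheduled at 02:00, 02:15, and 02:30 instead of all at once.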

Step 5: Define Auto-scaling Strategy and Pod Placement Policy

  • Determine the right autoscaling strategy at the infrastructure level (e.g., Karpenter, the Kubernetes Cluster Autoscaler, or Amazon EC2 Auto Scaling)
  • Configure the Pod Placement Policy accordingly

High Availability & Redundancy

To maintain reliability and ensure failover:

  • Configure at least two agents per environment or environment template
  • If one agent is at full capacity or becomes unavailable, the other automatically takes over and continues processing resources
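Whether the remaining agents can absorb the load after a failure is simple capacity arithmetic. `survives_agent_loss` is a hypothetical helper that checks worker capacity after losing the largest agent; it does not model how the Controller actually reassigns activities.

```python
def survives_agent_loss(agent_capacities: list[int], required_workers: int) -> bool:
    """True if the agents left after losing the single largest one can
    still supply required_workers workers."""
    if len(agent_capacities) < 2:
        return False  # a single agent has no failover
    return sum(agent_capacities) - max(agent_capacities) >= required_workers


# Step 3 example: 50 buffered workers, 25 workers per agent.
print(survives_agent_loss([25, 25], 50))      # False
print(survives_agent_loss([25, 25, 25], 50))  # True
```

This is why the optional third agent in the Step 3 example matters: with only two agents at 25 workers each, losing one halves the available capacity.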

Pod Placement & Scheduling Behavior

Agent and Worker Pod Placement:

  • By default, GitOps agents inherit the System Components Placement settings that are part of the cluster configuration (e.g., node selectors, tolerations, affinity)

  • If placement settings are defined at the agent level, they override cluster-level configurations. This allows precise control over where agents and their workers are scheduled

  • If a custom driver is used, its configuration takes precedence in determining worker pod placement and resource sizing


Agent Configuration Precedence

Agents can be configured at multiple levels:

  • Within the resource template
  • Within the environment template
  • During environment provisioning

When multiple configurations are defined, the following precedence order applies:

  1. Agent specified in the resource template (highest priority)
  2. Agent defined in the environment template
  3. Agent set at the environment level (lowest priority)
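The precedence order amounts to taking the first agent that is set, starting from the resource template. `resolve_agent` and the agent names in the usage example are hypothetical; the sketch treats an empty or missing value as "not set".

```python
from typing import Optional


def resolve_agent(resource_template_agent: Optional[str],
                  environment_template_agent: Optional[str],
                  environment_agent: Optional[str]) -> Optional[str]:
    """Return the applicable agent: resource template first, then
    environment template, then environment (lowest priority)."""
    for agent in (resource_template_agent, environment_template_agent, environment_agent):
        if agent:
            return agent
    return None


print(resolve_agent(None, "et-agent", "env-agent"))  # et-agent
```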