Best Practices - Environment Manager¶

Environment Execution and Worker Behavior¶

The environment provisioning and deprovisioning lifecycle can be divided into three high-level stages:

Activity Generation: The Controller identifies the required types and number of activities
Agent Interaction: Agents continually poll the Controller for executable activities. The Controller filters and assigns eligible activities based on the agent-template association and configured worker limit
Agent Execution: The agent spawns a new pod per activity. Activities execute sequentially or in parallel, as defined in the template. Once an activity completes, its pod is terminated and its resources are released. The agent continues polling and picks up queued activities as worker capacity frees up

flowchart TD
    A[Environment Deploy/Delete] --> B[Activity Generation]
    B --> C[Controller determines number & type of activities]
    C --> D[Agent Polls Controller]
    D --> E[Controller matches activities to agent]
    E --> G[Agent spawns pod per activity]
    G --> I[Execute activities sequentially or in parallel based on template configuration]
    I --> K[Terminate Pods & Release Resources]

The GitOps agent provisions workers (pods) to execute environment-related activities. By default, each agent can run up to 10 workers at a time.

Each worker handles one activity at a time
A single environment deployment may generate multiple parallel activities
Agents scale workers dynamically, launching them only when needed, based on load
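
The interaction between polling, template association, and the worker limit can be sketched as follows. This is an illustrative model only, not the Controller's actual implementation: the `Agent` class, the `poll` function, and the `(activity_id, template)` tuple shape are hypothetical names chosen for this example. Only the default limit of 10 workers and the one-activity-per-worker rule come from the text above.

```python
from dataclasses import dataclass, field

DEFAULT_WORKER_LIMIT = 10  # default per-agent worker cap stated above


@dataclass
class Agent:
    """Hypothetical model of an agent; not the product's real data structure."""
    name: str
    templates: set                      # templates this agent is associated with
    worker_limit: int = DEFAULT_WORKER_LIMIT
    running: list = field(default_factory=list)  # activities that hold a worker pod

    def free_slots(self) -> int:
        return self.worker_limit - len(self.running)


def poll(agent: Agent, pending: list) -> list:
    """Assign eligible activities to the agent, one worker each.

    `pending` holds (activity_id, template) tuples. Assigned items are removed
    from it; everything else stays queued for a later poll.
    """
    assigned = []
    for activity in list(pending):      # iterate over a copy while mutating
        if agent.free_slots() == 0:
            break                       # worker limit reached
        _, template = activity
        if template in agent.templates:  # agent-template association filter
            pending.remove(activity)
            agent.running.append(activity)
            assigned.append(activity)
    return assigned
```

In this model, an activity that is skipped because of the template filter or the worker limit simply remains in `pending` until a later poll, which mirrors the queueing behavior described above.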

Sizing¶

Default Resource Requests and Limits per Worker:¶

Worker Type	CPU	Memory
Git Worker	250m	512Mi
OpenTofu Worker	500m	1Gi

Requests and limits are identical for each worker type.

For sizing guidance, use the default OpenTofu worker specification (500m CPU, 1Gi memory per worker) as the baseline
If custom drivers are used, worker pod resource sizing will be determined by the driver configuration

Capacity Planning & Scaling¶

Cluster/Infrastructure Capacity¶

Ensure adequate resources in the Kubernetes cluster or Docker container where the agent is running
Include an additional 10–20% of capacity as a buffer
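
As a rough worked example, the capacity needed for worker pods can be estimated from the default per-worker values in the Sizing table plus the buffer. The function name and its parameters are assumptions made for this sketch. It covers worker pods only, so the agent's own footprint and any custom driver sizing would need to be added separately.

```python
# Default per-worker requests/limits from the Sizing table
# (CPU in millicores, memory in MiB).
WORKER_SPECS = {
    "git": {"cpu_m": 250, "mem_mi": 512},
    "opentofu": {"cpu_m": 500, "mem_mi": 1024},
}


def worker_capacity(workers: int, worker_type: str = "opentofu",
                    buffer_pct: int = 20) -> dict:
    """Estimate CPU and memory for `workers` concurrent pods plus a buffer.

    Integer arithmetic is used so the result is exact; values are rounded up.
    """
    spec = WORKER_SPECS[worker_type]
    factor = 100 + buffer_pct

    def with_buffer(per_worker: int) -> int:
        # Ceiling division: -(-a // b) rounds a / b up for positive integers.
        return -(-workers * per_worker * factor // 100)

    return {"cpu_m": with_buffer(spec["cpu_m"]),
            "mem_mi": with_buffer(spec["mem_mi"])}
```

For instance, `worker_capacity(10)` sizes ten OpenTofu workers with a 20% buffer and returns 6000 millicores and 12288 MiB.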

Scaling Recommendations¶

Configure a relatively high maximum worker count, based on analysis of current environment activity
Enable autoscaling at the infrastructure level (e.g., Karpenter or Kubernetes auto-scaler, EC2 autoscaling)
Deploy agents in a dedicated node group or a management cluster
Use separate agents for:
- GitOps System Sync and add-on artifact pulls
- Environment Manager activity execution
For high-usage teams, provision dedicated agents
Plan for one pair of agents per 100 parallel activities expected at peak
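
The last recommendation reduces to a ceiling division. A minimal sketch, assuming that at least one pair is always deployed (the function name is hypothetical):

```python
import math


def agents_for_peak(peak_parallel_activities: int,
                    activities_per_pair: int = 100) -> int:
    """Return the agent count for a peak load, at one pair per 100 activities."""
    pairs = math.ceil(peak_parallel_activities / activities_per_pair)
    return 2 * max(pairs, 1)  # assumption: never fewer than one pair
```

With this rule, a peak of 250 parallel activities needs 3 pairs, or 6 agents.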

Sizing Calculations¶

To ensure efficient execution of activities during environment provisioning and deprovisioning, it's critical to size your agents appropriately. Here's a recommended approach to estimate agent capacity and determine the number of agents required, with built-in buffers and scheduling considerations.

flowchart TD
    A[Start Agent Sizing Process] --> B[Step 1: Analyze Templates]
    B --> C[Determine max parallel activities]
    C --> D[Step 2: Estimate concurrent environments]
    D --> E[Calculate total required workers]
    E --> F[Step 3: Add buffer and determine agents]
    F --> G[Optionally add extra agent for redundancy]
    G --> H[Step 4: Evaluate republish frequency/stagger deployments]
    H --> L[Step 5: Define autoscaling and placement]
    L --> M[Finalize agent capacity plan]

Step 1: Evaluate Maximum Parallel Activities per Template¶

For each Environment Template (ET), assess how many Resource Templates (RTs) can be executed in parallel
For example, if template ET1 has 5 RTs (RT1, RT2, RT3, RT4, RT5) and RT5 depends on RT1, then RT1 through RT4 can execute in parallel while RT5 waits for RT1 to finish. ET1 therefore has a maximum of 4 parallel activities
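
One way to estimate this number from a template's dependency graph is to group RTs into dependency levels and take the widest level. This is a simplified sketch, not how the product computes anything: the dictionary format and function name are assumptions, and because activities at different levels can overlap at runtime, the result is an approximation that is best rounded up when in doubt.

```python
def max_parallel_activities(dependencies: dict) -> int:
    """Return the size of the widest dependency level.

    `dependencies` maps every RT name to the set of RT names it depends on.
    RTs with no dependencies are level 0; otherwise the level is one more
    than the deepest dependency.
    """
    level = {}

    def depth(rt, visiting=()):
        if rt in level:
            return level[rt]
        if rt in visiting:
            raise ValueError(f"dependency cycle at {rt}")
        deps = dependencies.get(rt, set())
        level[rt] = 1 + max((depth(d, visiting + (rt,)) for d in deps),
                            default=-1)
        return level[rt]

    widths = {}
    for rt in dependencies:
        d = depth(rt)
        widths[d] = widths.get(d, 0) + 1
    return max(widths.values(), default=0)


# The ET1 example from the text: RT5 depends on RT1.
et1 = {"RT1": set(), "RT2": set(), "RT3": set(), "RT4": set(), "RT5": {"RT1"}}
```

Calling `max_parallel_activities(et1)` returns 4, matching the example above.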

Step 2: Determine Total Parallel Activities Across Environments¶

Estimate how many environments will run concurrently per template
Multiply the number of parallel activities per template by the number of concurrent environments
For example, ET1: 3 parallel activities per environment × 10 environments = 30 activities. ET2: 1 sequential activity per environment × 10 environments = 10 activities. Total required workers = 30 + 10 = 40
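
The multiplication and sum in this step can be written as a small helper. The `plan` structure below is an assumption made for illustration; the numbers are the ones from the example.

```python
def total_required_workers(plan: dict) -> int:
    """Sum parallel activities x concurrent environments over all templates.

    `plan` maps a template name to a tuple of
    (parallel activities per environment, concurrent environments).
    """
    return sum(parallel * envs for parallel, envs in plan.values())


# Example values from the text.
plan = {"ET1": (3, 10), "ET2": (1, 10)}
```

Here `total_required_workers(plan)` evaluates to 40.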

Step 3: Plan Agent Count and Buffers¶

Add a buffer margin (recommended: 25%) for resilience
Divide total worker requirement among agents
For example, 40 required workers + 25% buffer = 50 workers. Deploy 2 agents with 25 workers each. Optionally deploy a third agent for redundancy to improve fault tolerance and avoid bottlenecks
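
The same arithmetic as a sketch, assuming the worker requirement is split evenly and that any redundant agent is configured with the same worker limit as the others (the text does not specify this). The function and key names are hypothetical.

```python
def plan_agents(required_workers: int, buffer_pct: int = 25,
                agent_count: int = 2, redundant_agents: int = 0) -> dict:
    """Apply the buffer, then split the workers evenly across agents."""
    # Ceiling division keeps the plan from under-provisioning.
    buffered = -(-required_workers * (100 + buffer_pct) // 100)
    per_agent = -(-buffered // agent_count)
    return {
        "buffered_workers": buffered,
        "workers_per_agent": per_agent,
        "agents": agent_count + redundant_agents,
    }
```

For the example above, `plan_agents(40)` yields 50 buffered workers, 25 workers per agent, and 2 agents; passing `redundant_agents=1` raises the agent count to 3 without changing the per-agent limit.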

Step 4: Consider Deployment Frequency and Staggering¶

Frequent re-publishing of environments can create load spikes
To mitigate this, use environment scheduling “jitter” to randomize deployment times or configure custom cron expressions to stagger deployments and avoid simultaneous activity bursts
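
As an illustration of the cron-based option, the helper below spreads a set of environments evenly across a window within one hour. It assumes schedules are written as standard five-field cron expressions (minute, hour, day of month, month, day of week). The function name and environment names are hypothetical, and the product's built-in jitter setting is not modeled here.

```python
def staggered_crons(environments: list, hour: int,
                    window_minutes: int = 60) -> dict:
    """Return a daily cron expression per environment, evenly spaced.

    Start minutes are spread across `window_minutes` beginning at `hour`:00
    so that environments do not all redeploy at the same moment.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if not 1 <= window_minutes <= 60:
        raise ValueError("window_minutes must be between 1 and 60")
    count = len(environments)
    schedules = {}
    for index, env in enumerate(environments):
        minute = (index * window_minutes) // count
        schedules[env] = f"{minute} {hour} * * *"
    return schedules
```

For example, four environments scheduled with `hour=2` start at 02:00, 02:15, 02:30, and 02:45 instead of all at 02:00.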

Step 5: Define Auto-scaling Strategy and Pod Placement Policy¶

Determine the right autoscaling strategy at the infrastructure level (e.g., Karpenter or Kubernetes auto-scaler, EC2 autoscaling)
Configure the Pod Placement Policy accordingly

High Availability & Redundancy¶

To maintain reliability and ensure failover:

Configure at least two agents per environment/environment template
If one agent is at full capacity or becomes unavailable, the other will automatically take over to continue resource processing

Pod Placement & Scheduling Behavior¶

Agent and Worker Pod Placement:¶

By default, GitOps agents inherit the System Components Placement settings that are part of the cluster configuration (e.g., node selectors, tolerations, affinity)
If placement settings are defined at the agent level, they override cluster-level configurations. This allows precise control over where agents and their workers are scheduled
If a custom driver is used, its configuration takes precedence in determining worker pod placement and resource sizing

Agent Configuration Precedence¶

Agents can be configured at multiple levels:

Within the resource template
Within the environment template
During environment provisioning

When multiple configurations are defined, the following precedence order applies:

Agent specified in the resource template (highest priority)
Agent defined in the environment template
Agent set at the environment level (lowest priority)
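
The precedence order can be expressed as a first-match lookup. This is a sketch of the rule as described above, not the product's code; the function and parameter names are hypothetical, and an unset level is represented as `None`.

```python
from typing import Optional


def resolve_agent(resource_template_agent: Optional[str] = None,
                  environment_template_agent: Optional[str] = None,
                  environment_agent: Optional[str] = None) -> Optional[str]:
    """Return the agent that applies, checking the highest priority first."""
    candidates = (
        resource_template_agent,     # highest priority
        environment_template_agent,
        environment_agent,           # lowest priority
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None  # no agent configured at any level
```

For example, `resolve_agent(environment_template_agent="agent-b", environment_agent="agent-c")` returns `"agent-b"`, because no agent is set on the resource template and the environment template outranks the environment.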