Best Practices - Environment Manager¶
Environment Execution and Worker Behavior¶
The environment provisioning and deprovisioning lifecycle can be divided into three high-level stages:
- Activity Generation: The Controller identifies the required types and number of activities
- Agent Interaction: Agents continually poll the Controller for executable activities. The Controller filters and assigns eligible activities based on the agent-template association and the configured worker limit
- Agent Execution: The agent spawns a new pod per activity. Activities execute sequentially or in parallel, per the template definition. Once complete, pods are terminated and resources are released. The agent continues polling and queues additional activities as needed
flowchart TD
A[Environment Deploy/Delete] --> B[Activity Generation]
B --> C[Controller determines number & type of activities]
C --> D[Agent Polls Controller]
D --> E[Controller matches activities to agent]
E --> G[Agent spawns pod per activity]
G --> I[Execute activities sequentially or in parallel based on template configuration]
I --> K[Terminate Pods & Release Resources]
The GitOps agent provisions workers (pods) to execute environment-related activities. By default, each agent can run up to 10 workers at a time.
- Each worker handles one activity at a time
- A single environment deployment may generate multiple parallel activities
- Agents scale workers dynamically, launching them only when needed, based on load
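The polling and assignment behavior described above can be pictured with a small model. This is only an illustrative sketch: the `Agent` class, the `assign_activities` function, and the activity dictionaries are hypothetical names invented here, not the Controller's actual data model or API. It assumes the default limit of 10 workers and one worker per activity.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical model of an agent; not the product's data model."""
    name: str
    templates: set = field(default_factory=set)  # templates associated with this agent
    worker_limit: int = 10                        # default of 10 workers per agent
    busy_workers: int = 0                         # workers currently running an activity


def assign_activities(agent, pending):
    """Return the pending activities this agent may take on one poll."""
    free_slots = max(agent.worker_limit - agent.busy_workers, 0)
    # Only activities whose template is associated with the agent are eligible.
    eligible = [a for a in pending if a["template"] in agent.templates]
    assigned = eligible[:free_slots]
    agent.busy_workers += len(assigned)  # one worker (pod) per activity
    return assigned


agent = Agent(name="agent-1", templates={"ET1"}, busy_workers=8)
pending = [
    {"id": 1, "template": "ET1"},
    {"id": 2, "template": "ET2"},
    {"id": 3, "template": "ET1"},
    {"id": 4, "template": "ET1"},
]
picked = assign_activities(agent, pending)
print([a["id"] for a in picked])
```

In this model, the agent already has 8 busy workers, so only two of its three eligible activities are assigned; the third waits for a later poll.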
Sizing¶
Default Resource Requests and Limits per Worker:¶
| Worker Type | CPU | Memory |
|---|---|---|
| Git Worker | 250m | 512Mi |
| OpenTofu Worker | 500m | 1Gi |
Requests and limits are identical for each worker type.
- For sizing estimates, use the default OpenTofu worker specifications (500m CPU, 1Gi memory), which are the larger of the two defaults
- If custom drivers are used, worker pod resource sizing will be determined by the driver configuration
Capacity Planning & Scaling¶
Cluster/Infrastructure Capacity¶
- Ensure adequate resources in the Kubernetes cluster or Docker container where the agent is running
- Include an additional 10–20% as buffer
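As a rough illustration of the capacity guidance above, the sketch below multiplies the default per-worker values from the Sizing table by a worker count and adds a buffer. The function name `cluster_capacity` and its parameters are hypothetical, and the result covers worker pods only; it does not account for the agent pod itself or for custom drivers.

```python
# Default per-worker requests/limits from the Sizing table
# (CPU in millicores, memory in MiB).
WORKER_DEFAULTS = {
    "git": {"cpu_m": 250, "mem_mi": 512},
    "opentofu": {"cpu_m": 500, "mem_mi": 1024},
}


def cluster_capacity(worker_count, worker_type="opentofu", buffer_pct=20):
    """Estimate CPU and memory needed for worker pods, including a buffer."""
    spec = WORKER_DEFAULTS[worker_type]
    factor = 100 + buffer_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    cpu_m = (worker_count * spec["cpu_m"] * factor + 99) // 100
    mem_mi = (worker_count * spec["mem_mi"] * factor + 99) // 100
    return {"cpu_m": cpu_m, "mem_mi": mem_mi}


# 10 OpenTofu workers with a 20% buffer.
estimate = cluster_capacity(10)
print(estimate)
```

For 10 OpenTofu workers and a 20% buffer, this yields 6000m CPU (6 cores) and 12288Mi (12Gi) of memory.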
Scaling Recommendations¶
- Configure a relatively high maximum worker count, based on analysis of current environment activity
- Enable autoscaling at the infrastructure level (e.g., Karpenter or Kubernetes auto-scaler, EC2 autoscaling)
- Deploy agents in a dedicated node group or a management cluster
- Use separate agents for:
    - GitOps System Sync and add-on artifact pulls
    - Environment Manager activity execution
- For high-usage teams, provision dedicated agents
- Plan for one pair of agents per 100 parallel activities expected at peak
Sizing Calculations¶
To ensure efficient execution of activities during environment provisioning and deprovisioning, it's critical to size your agents appropriately. Here's a recommended approach to estimate agent capacity and determine the number of agents required, with built-in buffers and scheduling considerations.
flowchart TD
A[Start Agent Sizing Process] --> B[Step 1: Analyze Templates]
B --> C[Determine max parallel activities]
C --> D[Step 2: Estimate concurrent environments]
D --> E[Calculate total required workers]
E --> F[Step 3: Add buffer and determine agents]
F --> G[Optionally add extra agent for redundancy]
G --> H[Step 4: Evaluate republish frequency/stagger deployments]
H --> L[Step 5: Define autoscaling and placement]
L --> M[Finalize agent capacity plan]
Step 1: Evaluate Maximum Parallel Activities per Template¶
- For each Environment Template (ET), assess how many Resource Templates (RTs) can be executed in parallel
- For example, if template ET1 has 5 RTs (RT1, RT2, RT3, RT4, RT5) and the only dependency is that RT5 depends on RT1, then RT1-RT4 can execute in parallel. ET1 therefore has a maximum of 4 parallel activities
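One way to estimate this number from a dependency map is to group resource templates into dependency levels and take the widest level. The sketch below is an assumption about how to approximate the figure, not a description of how the Controller schedules activities; `max_parallel_activities` and the dictionary layout are hypothetical.

```python
from collections import Counter


def max_parallel_activities(dependencies):
    """Level-based estimate of peak parallelism.

    `dependencies` maps each resource template to the set of
    resource templates it depends on.
    """
    level = {}

    def depth(rt, stack=()):
        if rt in stack:
            raise ValueError(f"dependency cycle involving {rt}")
        if rt not in level:
            deps = dependencies.get(rt, ())
            # A template with no dependencies sits at level 0.
            level[rt] = 1 + max((depth(d, stack + (rt,)) for d in deps), default=-1)
        return level[rt]

    for rt in dependencies:
        depth(rt)
    return max(Counter(level.values()).values(), default=0)


# ET1 from the example: RT5 depends on RT1; the rest are independent.
et1 = {"RT1": set(), "RT2": set(), "RT3": set(), "RT4": set(), "RT5": {"RT1"}}
print(max_parallel_activities(et1))
```

For ET1, level 0 holds RT1-RT4 and level 1 holds RT5, so the estimate is 4, matching the example.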
Step 2: Determine Total Parallel Activities Across Environments¶
- Estimate how many environments will run concurrently per template
- Multiply the number of parallel activities per template by the number of concurrent environments
- For example, ET1: 3 parallel activities per environment × 10 environments = 30 activities; ET2: 1 sequential activity per environment × 10 environments = 10 activities; Total Required Workers = 30 + 10 = 40 workers
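The arithmetic in this step can be written as a short calculation. The function name and the dictionary keys below are hypothetical; the numbers are the ones from the example.

```python
def total_required_workers(templates):
    """Sum parallel activities x concurrent environments over all templates."""
    return sum(
        t["parallel_activities"] * t["concurrent_environments"]
        for t in templates.values()
    )


plan = {
    "ET1": {"parallel_activities": 3, "concurrent_environments": 10},
    "ET2": {"parallel_activities": 1, "concurrent_environments": 10},
}
required = total_required_workers(plan)
print(required)  # 3*10 + 1*10 = 40
```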
Step 3: Plan Agent Count and Buffers¶
- Add a buffer margin (recommended: 25%) for resilience
- Divide total worker requirement among agents
- For example, 40 required workers + 25% buffer = 50 workers. Deploy 2 agents with 25 workers each, and optionally deploy a third agent for redundancy to provide fault tolerance and avoid bottlenecks
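A minimal sketch of this step, assuming each agent is configured with a maximum of 25 workers as in the example (the default is 10). The function `plan_agents` and its parameters are hypothetical names used only for illustration.

```python
def plan_agents(required_workers, buffer_pct=25, workers_per_agent=25, redundancy=False):
    """Apply a buffer, then split the worker requirement across agents."""
    # Integer ceiling: round the buffered requirement up to a whole worker.
    buffered = (required_workers * (100 + buffer_pct) + 99) // 100
    # Ceiling division: enough agents to host every buffered worker.
    agents = (buffered + workers_per_agent - 1) // workers_per_agent
    if redundancy:
        agents += 1  # optional extra agent for fault tolerance
    return {"buffered_workers": buffered, "agents": agents}


base_plan = plan_agents(40)
redundant_plan = plan_agents(40, redundancy=True)
print(base_plan)
print(redundant_plan)
```

With 40 required workers, this gives 50 buffered workers on 2 agents, or 3 agents when the redundancy option is enabled.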
Step 4: Consider Deployment Frequency and Staggering¶
- Frequent re-publishing of environments can create load spikes
- To mitigate this, use environment scheduling “jitter” to randomize deployment times or configure custom cron expressions to stagger deployments and avoid simultaneous activity bursts
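To illustrate the idea behind jitter, the sketch below assigns each environment a random start offset within a window so that deployments do not all begin at once. It is a conceptual example only: `staggered_offsets` is a hypothetical helper, and it does not use or represent the product's scheduling configuration.

```python
import random


def staggered_offsets(env_names, window_minutes, seed=None):
    """Assign each environment a random start offset (in minutes) within a window."""
    rng = random.Random(seed)  # a seed makes the offsets reproducible
    return {name: rng.randrange(window_minutes) for name in env_names}


envs = [f"env-{i}" for i in range(1, 11)]
offsets = staggered_offsets(envs, window_minutes=30, seed=42)
for name, minutes in offsets.items():
    print(f"{name}: start {minutes} min after the scheduled time")
```

Spreading 10 environments across a 30-minute window lowers the number of activities that start in the same minute, which reduces the peak worker count needed.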
Step 5: Define Auto-scaling Strategy and Pod Placement Policy¶
- Determine the right autoscaling strategy at the infrastructure level (e.g., Karpenter or Kubernetes auto-scaler, EC2 autoscaling)
- Configure the Pod Placement Policy accordingly
High Availability & Redundancy¶
To maintain reliability and ensure failover:
- Configure at least two agents per environment/environment template
- If one agent is at full capacity or becomes unavailable, the other will automatically take over to continue resource processing
Pod Placement & Scheduling Behavior¶
Agent and Worker Pod Placement:¶
- By default, GitOps agents inherit the System Components Placement settings that are part of the cluster configuration (e.g., node selectors, tolerations, affinity)
- If placement settings are defined at the agent level, they override cluster-level configurations. This allows precise control over where agents and their workers are scheduled
- If a custom driver is used, its configuration takes precedence in determining worker pod placement and resource sizing
Agent Configuration Precedence¶
Agents can be configured at multiple levels:
- Within the resource template
- Within the environment template
- During environment provisioning
When multiple configurations are defined, the following precedence order applies:
- Agent specified in the resource template (highest priority)
- Agent defined in the environment template
- Agent set at the environment level (lowest priority)
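The precedence order above amounts to picking the first agent that is set, checking the resource template first and the environment last. The sketch below models that rule; `resolve_agent` and its parameter names are hypothetical and are not part of any product API.

```python
def resolve_agent(resource_template_agent=None,
                  environment_template_agent=None,
                  environment_agent=None):
    """Return the agent that takes effect, following the documented precedence."""
    candidates = (
        resource_template_agent,      # highest priority
        environment_template_agent,
        environment_agent,            # lowest priority
    )
    for agent in candidates:
        if agent:
            return agent
    return None  # no agent configured at any level


print(resolve_agent("agent-rt", "agent-et", "agent-env"))
print(resolve_agent(environment_template_agent="agent-et", environment_agent="agent-env"))
print(resolve_agent(environment_agent="agent-env"))
```

The three calls print `agent-rt`, `agent-et`, and `agent-env` respectively, showing each level taking effect only when the levels above it are unset.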