Rafay Product Documentation

Part 2: Provision

Table of contents
- What Will You Do
- Step 1: Cluster Spec
- Cluster Details
- Step 2: Provision Cluster
- Step 3: Verify Cluster
- Step 4: Remove EKS GPU Daemonset
- Recap
    
    
    Mar
    
    Feb
    
    Jan
  - 2022
  - 2021
  - 2020
  - 2019
- Self Hosted Controller
  Self Hosted Controller
  - 2024
    2024
    
    Aug
    
    July
    
    June
    
    Apr
- Preview-SaaS
  Preview-SaaS
  - Overview
  - Upcoming
Blog
Blog
Contact
Contact
- Overview
- Email
- Slack
Open Source
Open Source
- Open Source
Use Cases
Use Cases
- Overview
- Cost Optimization
  Cost Optimization
  - Granular Cost Visibility & Chargebacks
- Environment and Resource Provisioning
  Environment and Resource Provisioning
  - Standardized Resource Creation for Developers
  - Cloud Landing Zone Management
- Kubernetes Lifecycle Management
  Kubernetes Lifecycle Management
- Migration from Other Platforms to Rafay
  Migration from Other Platforms to Rafay
  - Mirantis to Rafay Migration
  - Rancher to Rafay Migration
- Platform-as-a-Service Offerings
  Platform-as-a-Service Offerings
  - Managed Kubernetes Service for Customer Sites
- Multi-Tenant Infrastructure & Tooling
  Multi-Tenant Infrastructure & Tooling
  - Multi-Tenant Self-Service Clusters
- Standardization and Governance
  Standardization and Governance

Part 2: Provision

What Will You Do

In this part of the self-paced exercise, you will provision an Amazon EKS cluster with a GPU node group from a declarative cluster specification.

Step 1: Cluster Spec

Open Terminal (on macOS/Linux) or Command Prompt (on Windows) and navigate to the folder containing your fork of the Git repository.
Navigate to the folder "/getstarted/gpueks/cluster".

The "eks-gpu.yaml" file contains the declarative specification for the Amazon EKS cluster.

Cluster Details

The following values in the specification may need to be updated if you changed them or used alternate names in your environment.

cluster name: demo-gpu-eks
cloud provider: aws-cloud-credential
project: defaultproject
region: us-west-2
ami: ami-0114d85734fee93fb
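
Before applying the spec, it can help to confirm that the values listed above actually appear in the file. The following is a minimal sketch, not part of the original exercise: the helper name check_spec_values is hypothetical, and it only searches for literal strings because the exact layout of "eks-gpu.yaml" is not shown here.

```shell
# Hypothetical helper: report which expected values appear in a spec file.
# Usage: check_spec_values <spec-file> <value>...
# Returns non-zero if any value is missing.
check_spec_values() {
  spec_file="$1"
  shift
  missing=0
  for value in "$@"; do
    # -F treats the value as a literal string, not a regular expression
    if grep -qF -- "$value" "$spec_file"; then
      echo "found: $value"
    else
      echo "missing: $value"
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the "cluster" folder:
#   check_spec_values eks-gpu.yaml demo-gpu-eks aws-cloud-credential \
#     defaultproject us-west-2 ami-0114d85734fee93fb
```

A "missing" line is expected for any value you have customized; substitute your own names in the argument list.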

Step 2: Provision Cluster

On your command line, navigate to the "cluster" subfolder.
Type the following command

rctl apply -f eks-gpu.yaml

If there are no errors, the response includes a "taskset_id" that you can use to check progress and status. Note that this step creates infrastructure in your AWS account and can take approximately 20 to 30 minutes to complete.

{
  "taskset_id": "d27l3rk",
  "operations": [
    {
      "operation": "ClusterCreation",
      "resource_name": "demo-gpu-eks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "NodegroupCreation",
      "resource_name": "t3-nodegroup",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "NodegroupCreation",
      "resource_name": "gpu-nodegroup",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "BlueprintSync",
      "resource_name": "demo-gpu-eks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    }
  ],
  "comments": "The status of the operations can be fetched using taskset_id",
  "status": "PROVISION_TASKSET_STATUS_PENDING"
}
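
If you save the response shown above to a file, the task ID and the number of pending operations can be pulled out with standard tools. The following is a minimal sketch: the function names and the file name apply-response.json are hypothetical, and the text matching assumes the response is pretty-printed as shown above (one field per line).

```shell
# Hypothetical helper: print the taskset_id from an rctl apply response on stdin.
extract_taskset_id() {
  sed -n 's/.*"taskset_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: count operations still pending in a response on stdin.
# The pattern matches per-operation status lines only, not the overall
# PROVISION_TASKSET_STATUS line.
count_pending_operations() {
  grep -c '"status": "PROVISION_TASK_STATUS_PENDING"' || true
}

# Example:
#   rctl apply -f eks-gpu.yaml | tee apply-response.json
#   extract_taskset_id < apply-response.json
#   count_pending_operations < apply-response.json
```

The extracted ID can then be passed to whatever status command your RCTL version provides; depending on the version, this may be "rctl status apply <taskset_id>", which is an assumption to verify against the RCTL help output.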

Navigate to the project in your Org
Click on Infrastructure -> Clusters. You should see something like the following

Provisioning in Process

Click on the cluster name to monitor progress

Provisioning in Process

Step 3: Verify Cluster

Once provisioning is complete, you should see a healthy cluster in the web console.

Provisioned Cluster

Click on the kubectl link and type the following command

kubectl get nodes -o wide

You should see something like the following

NAME                                            STATUS   ROLES    AGE   VERSION                INTERNAL-IP       EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION                  CONTAINER-RUNTIME
ip-192-168-109-113.us-west-2.compute.internal   Ready    <none>   16m   v1.24.11-eks-a59e1f0   192.168.109.113   <none>          Amazon Linux 2       5.10.178-162.673.amzn2.x86_64   containerd://1.6.19
ip-192-168-52-184.us-west-2.compute.internal    Ready    <none>   15m   v1.24.10               192.168.52.184    54.193.37.206   Ubuntu 20.04.6 LTS   5.15.0-1034-aws                 containerd://1.6.12
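
To check the node status without reading the table by eye, the output of the command above can be piped through a small filter. The following is a minimal sketch; the function name count_ready_nodes is hypothetical, and it relies only on STATUS being the second column, as in the sample output.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready"
# in `kubectl get nodes` output read from stdin. The header row is skipped.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example:
#   kubectl get nodes -o wide | count_ready_nodes
```

The number to expect depends on the node group sizes in your cluster spec; the sample output above shows two nodes.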

Step 4: Remove EKS GPU Daemonset

You will now remove the Nvidia daemonset that EKS installs on the cluster. This daemonset installs the GPU drivers, but you will use the Nvidia GPU Operator instead, which installs the needed drivers itself.

Navigate to Infrastructure -> Clusters
Click on Resources on the cluster card
Click on DaemonSets on the left hand side of the page
Find the daemonset with the name nvidia-device-plugin-daemonset
Click on the actions button next to the previously located daemonset
Click Delete
Click Yes to confirm the deletion

Delete Daemonset
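
If you prefer the command line to the console steps above, the same daemonset can be located and removed with kubectl. The following is a minimal sketch: the helper name is hypothetical, and because the namespace of the daemonset is not stated in this exercise, the helper reads it from the daemonset listing rather than assuming one. It prints the delete command instead of running it so that you can review it first.

```shell
# Hypothetical helper: given `kubectl get daemonsets --all-namespaces` output
# on stdin (NAMESPACE in column 1, NAME in column 2), print the kubectl
# command that would delete the named daemonset.
print_daemonset_delete_command() {
  name="$1"
  awk -v name="$name" \
    'NR > 1 && $2 == name { printf "kubectl delete daemonset %s -n %s\n", $2, $1 }'
}

# Example:
#   kubectl get daemonsets --all-namespaces \
#     | print_daemonset_delete_command nvidia-device-plugin-daemonset
```

If nothing is printed, no daemonset with that name was found in the listing.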

Recap

Congratulations! At this point, you have successfully configured and provisioned an Amazon EKS cluster with a GPU node group in your AWS account using the RCTL CLI. You are now ready to move on to the next step, where you will create and deploy a custom cluster blueprint that contains the GPU Operator as an addon.