Endpoint

Overview

An endpoint represents the access point through which GenAI model inference requests are served. Endpoints route incoming traffic to deployed models running on a GPU-enabled compute cluster.

Each endpoint is associated with:

  • A hostname
  • A target compute cluster
  • TLS certificates (full chain and private key)

Endpoints must be created before deploying any model.


General Information

  • Navigate to Operations Console → GenAI → Endpoints → New Endpoint
  • Enter general details for the endpoint:
    • Name and an optional Description

Deployment Details

Additional deployment parameters include:

  • Host Name: The DNS hostname used to access the endpoint (e.g., demo.genai.paas.example.com). This hostname must be covered by the TLS certificate.
  • Compute Cluster: Select the compute cluster created and initialized in the Compute Cluster step. Only supported clusters appear in this list.

Certificate Details

TLS configuration is required to secure traffic to the endpoint:

  • Certificate: Full certificate chain in PEM format
  • Private Key: The private key corresponding to the certificate
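Before pasting the certificate and key into the console, it can help to confirm that the two PEM files actually belong together and that the certificate covers the endpoint hostname. The sketch below is illustrative (the file names, and the throwaway self-signed pair it generates for demonstration, are assumptions, not part of the product):

```shell
# Generate a throwaway self-signed pair just for this demonstration
# (in practice you would already have fullchain.pem and privkey.pem):
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=demo.genai.paas.example.com" \
  -keyout privkey.pem -out fullchain.pem 2>/dev/null

# The public key derived from the certificate and from the private key
# must be identical; otherwise the pair will not work together.
cert_pub=$(openssl x509 -in fullchain.pem -noout -pubkey)
key_pub=$(openssl pkey -in privkey.pem -pubout 2>/dev/null)
if [ "$cert_pub" = "$key_pub" ]; then
  echo "key matches certificate"
fi

# Inspect the subject to confirm it covers the endpoint hostname:
openssl x509 -in fullchain.pem -noout -subject
```

The same comparison works for any key type supported by `openssl pkey`, since it compares the derived public keys rather than RSA-specific fields.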

[Screenshot: Endpoint General Info]

Select Save Changes to create the endpoint.

After the endpoint is saved, the system automatically begins initializing it on the associated compute cluster.

Endpoint Initialization in the Cluster

  • The system deploys the required AI Gateway Controller and Envoy Gateway components in the gaap-controller namespace.
  • Pods are created with names based on the endpoint, following patterns such as:
    ai-gateway-controller-<generated-id>
    envoy-gaap-controller-<endpoint-name>-<generated-id>

Example observed in the cluster:

NAMESPACE         NAME                                                              READY   STATUS    AGE
gaap-controller   ai-gateway-controller-66d48fdd5b-4h2g5                            1/1     Running   23h
gaap-controller   envoy-gaap-controller-ethan-global-endpoint-d55ada0a-9b86dsvrcj   3/3     Running   23h
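A pod name can be checked against the documented naming patterns with a simple regular expression. This is an illustrative sketch only; the pod names are taken from the example above, and the exact pattern (ID lengths, character classes) is an assumption rather than a guarantee from the platform:

```shell
# Hypothetical pattern check for the two pod roles described above.
# The ID segment formats are assumptions based on the example names.
classify_pod() {
  if echo "$1" | grep -Eq '^ai-gateway-controller-[a-z0-9]+-[a-z0-9]+$'; then
    echo "ai-gateway-controller"
  elif echo "$1" | grep -Eq '^envoy-gaap-controller-.+-[0-9a-f]{8}-[a-z0-9]+$'; then
    echo "envoy-gateway"
  else
    echo "unknown"
  fi
}

classify_pod "ai-gateway-controller-66d48fdd5b-4h2g5"
classify_pod "envoy-gaap-controller-ethan-global-endpoint-d55ada0a-9b86dsvrcj"
```

In a live cluster, the names themselves come from `kubectl get pods -n gaap-controller`.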

[Screenshot: Endpoint Pod]

Endpoint Readiness

When the controller and gateway pods reach the Running state, the endpoint is fully initialized on the compute cluster and becomes active, ready for model deployment. With both the compute cluster and the endpoint in place, the required infrastructure for GenAI inference is complete.
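The readiness condition above (every pod Running, with all of its containers up) can be checked mechanically from the pod listing. A minimal sketch, using the sample output shown earlier as input; in a live cluster the listing would come from `kubectl get pods -n gaap-controller` instead of the hardcoded string:

```shell
# Sample pod listing in the format shown above (namespace, name,
# ready-count, status, age); hardcoded here for demonstration.
pods='gaap-controller ai-gateway-controller-66d48fdd5b-4h2g5 1/1 Running 23h
gaap-controller envoy-gaap-controller-ethan-global-endpoint-d55ada0a-9b86dsvrcj 3/3 Running 23h'

# A pod counts as ready only when its status is Running and the
# ready column (e.g. 3/3) shows all containers up.
echo "$pods" | awk '
  { split($3, r, "/"); if ($4 != "Running" || r[1] != r[2]) notready = 1 }
  END { print (notready ? "endpoint not ready" : "endpoint ready") }'
```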