Endpoint
Overview¶
An endpoint represents the access point through which GenAI model inference requests are served. Endpoints route incoming traffic to deployed models running on a GPU-enabled compute cluster.
Each endpoint is associated with:
- A hostname
- A target compute cluster
- TLS certificates (full chain and private key)
Endpoints must be created before deploying any model.
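For orientation only, the sketch below shows what an inference call through such an endpoint might look like once a model has been deployed behind it. The hostname is the example used later on this page; the URL path, headers, and request body are hypothetical and depend entirely on the API of the deployed model.

```bash
# Purely illustrative: the path and payload are hypothetical and depend on the
# model served behind the endpoint; only the hostname pattern comes from this page.
curl -s https://demo.genai.paas.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'
```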
General Information¶
- Navigate to Operations Console → GenAI → Endpoints → New Endpoint
- Enter general details for the endpoint:
- Name and an optional Description
Deployment Details¶
Additional deployment parameters include:
- Host Name: The DNS hostname used to access the endpoint (e.g., demo.genai.paas.example.com). This must match the TLS certificate (a quick check is sketched below).
- Compute Cluster: Select the compute cluster created and initialized in the Compute Cluster step. Only supported clusters appear in this list.
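Assuming OpenSSL is available and the full-chain certificate is saved locally (the file name below is illustrative), you can confirm that the hostname is covered by the certificate before entering it here:

```bash
# fullchain.pem is an illustrative name for the certificate chain to be uploaded.
# The endpoint hostname must appear in the Subject Alternative Names.
openssl x509 -in fullchain.pem -noout -ext subjectAltName
openssl x509 -in fullchain.pem -noout -checkhost demo.genai.paas.example.com
```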
Certificate Details¶
TLS configuration is required to secure traffic to the endpoint:
- Certificate: Full certificate chain in PEM format
- Private Key: Corresponding private key
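Before uploading, it can be worth confirming that the private key actually matches the leaf certificate in the chain. A minimal sketch with OpenSSL, assuming the illustrative file names fullchain.pem and privkey.pem:

```bash
# The two SHA-256 digests of the public key must be identical; a mismatch means
# the private key does not belong to this certificate.
openssl x509 -in fullchain.pem -noout -pubkey | openssl sha256
openssl pkey -in privkey.pem -pubout | openssl sha256
```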
Select Save Changes to create the endpoint.
After the endpoint is saved, the system automatically begins initializing it on the associated compute cluster.
Endpoint Initialization in the Cluster¶
- The system deploys the required AI Gateway Controller and Envoy Gateway components in the gaap-controller namespace.
- Pods are created with names based on the endpoint, following patterns such as:
ai-gateway-controller-<generated-id>
envoy-gaap-controller-<endpoint-name>-<generated-id>
Example observed in the cluster:
gaap-controller ai-gateway-controller-66d48fdd5b-4h2g5 1/1 Running 23h
gaap-controller envoy-gaap-controller-ethan-global-endpoint-d55ada0a-9b86dsvrcj 3/3 Running 23h
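Assuming kubectl access to the compute cluster, the same pods can be listed directly (the example above also shows the namespace column, as produced by a cluster-wide listing):

```bash
# List the AI Gateway Controller and Envoy Gateway pods created for the endpoint;
# add -w to watch them progress toward the Running state.
kubectl get pods -n gaap-controller
```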
Endpoint Readiness¶
When these controller and gateway pods reach the Running state, the endpoint is fully initialized and becomes active, ready for model deployment. With both the compute cluster and the endpoint in place, the required infrastructure for GenAI inference is complete.
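Readiness can also be checked non-interactively; a minimal sketch, assuming kubectl access to the cluster and that all pods in the gaap-controller namespace belong to this endpoint's gateway components:

```bash
# Block until every pod in the namespace reports the Ready condition,
# failing after five minutes if initialization does not complete.
kubectl wait --for=condition=Ready pod --all -n gaap-controller --timeout=300s
```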

