Endpoint
An endpoint is a URL through which end users and applications access one or more LLMs via an OpenAI-compatible API. An endpoint can be either multi-tenant or dedicated to a single tenant.
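Because the endpoint speaks an OpenAI-compatible API, clients can target it simply by pointing their base URL at the endpoint's host name. A minimal sketch using only the Python standard library (the host name, API key, and model name below are placeholders, not values defined by this product):

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build (but do not send) an OpenAI-compatible /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key is a placeholder
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Point at the host name configured for the endpoint in the Ops Console.
req = build_chat_request("https://api.inference.com", "MY_KEY", "my-model", "Hello")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) returns a standard chat-completions JSON response.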
New Endpoint
- In the Ops Console, click GenAI, then Endpoint.
- Click New Endpoint to start the workflow.
General Section
Provide a unique name for the endpoint and an optional description.
Deployment Section
- Enter the host name for the endpoint (e.g. https://api.inference.com).
- Specify how the endpoint service will be exposed to users: External -> Load Balancer, Internal -> ClusterIP, or Kubernetes NodePort.
- Select the compute cluster that will power the inference service from the dropdown.
- Optionally, specify the number of replicas and the resources to allocate to each replica.
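The console provisions the underlying Kubernetes Service for you; for context, the three exposure options map to the standard Kubernetes Service types. A hand-written equivalent might look like this (names and ports are illustrative, not generated by this product):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-endpoint   # illustrative name
spec:
  # External -> LoadBalancer, Internal -> ClusterIP, or NodePort
  type: LoadBalancer
  selector:
    app: inference           # illustrative selector
  ports:
    - port: 443
      targetPort: 8443
```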
Certificate Section
Users and applications that access the inference service's API will expect it to be secured with server-side TLS. Upload the server certificate (chain) and private key in PEM format.
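Before uploading, it can help to sanity-check that the files are actually PEM-formatted. The sketch below only checks PEM structure (certificate blocks and a private-key block), not cryptographic validity; `openssl` or the console's own validation remains authoritative:

```python
import re

# Match PEM-armored blocks: certificates and (optionally RSA/EC-tagged) private keys.
CERT_RE = re.compile(r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.S)
KEY_RE = re.compile(
    r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----.+?-----END (?:RSA |EC )?PRIVATE KEY-----",
    re.S,
)


def check_pem(cert_pem: str, key_pem: str) -> dict:
    """Rough structural check of the certificate (chain) and key before upload.

    The chain should list the server (leaf) certificate first, followed by any
    intermediate certificates.
    """
    return {
        "certificates": len(CERT_RE.findall(cert_pem)),
        "has_private_key": bool(KEY_RE.search(key_pem)),
    }
```

A result of zero certificates or a missing private key indicates the files are not in PEM format (they may be DER-encoded and need conversion first).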
List All Endpoints
In the Ops Console, click GenAI, then Endpoint. This displays the list of configured endpoints, along with their status and metadata, for the administrator.
View Endpoint Details
In the Ops Console, click GenAI, then Endpoint, and select an endpoint to view its details.
Endpoint Metrics
Click the Metrics tab on an endpoint to view critical metrics. Summaries and trends are available to operators for the following:
- Upstream
- Downstream
- Success
- Errors
Operators can apply filters such as date range, start and end time, and granularity.
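These filters translate naturally into a small query structure. The sketch below is illustrative only; the parameter names and granularity values are assumptions, not a documented API of this product:

```python
from datetime import datetime, timedelta


def metrics_filter(start: datetime, end: datetime, granularity: str) -> dict:
    """Assemble a metrics filter like the one offered in the Metrics tab.

    Granularity values here ("1m", "5m", "1h", "1d") are illustrative.
    """
    assert granularity in {"1m", "5m", "1h", "1d"}, "unsupported granularity"
    assert start < end, "start must precede end"
    return {
        "start": start.isoformat(),
        "end": end.isoformat(),
        "granularity": granularity,
    }


# Example: the last 24 hours at hourly granularity.
end = datetime(2024, 1, 2)
f = metrics_filter(end - timedelta(days=1), end, "1h")
```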
Delete Endpoint
To delete an endpoint, click the ellipsis (three dots) under Actions for the selected endpoint.
Important
This action is not reversible. Admins will need to recreate the endpoint in case of accidental deletion.