Skip to content

Endpoint

An endpoint is a URL where end user/application can access one/many LLMs via the OpenAI compatible API. An endpoint can be either multitenant or dedicated to a single tenant.


New Endpoint

  • In the Ops Console, click on GenAI and then Endpoint.
  • Now, click on "New Endpoint" to initiate the workflow

General Section

Provide a unique name for the endpoint and an optional description.

New Endpoint

Deployment Section

  • Enter the "host name" for the endpoint (e.g. https://api.inference.com)
  • Specify how the endpoint service will be exposed to users (External->Load Balancer, Internal->ClusterIP, k8s NodePort)
  • Select the compute cluster from the dropdown that will be used to power the inference service.
  • Optionally, specify how many replicas and resources you would like to allocate to each replica.

New Endpoint


Certificate Section

Users and applications that will access the Inference service's API endpoint will expect the service to be secured using server side TLS. Upload the server certificate (chain) and private key in PEM format.

New Endpoint


List All Endpoints

In the Ops Console, click on GenAI and then Endpoint. This will display the list of configured endpoints, their status and some metadata for the administrator.

List All Endpoints


View Endpoint Details

In the Ops Console, click on GenAI and then Endpoint. This will display details about the endpoint.

View Endpoint Details

Endpoint Metrics Filters


Endpoint Metrics

Click the Metrics tab in the Endpoint to view critical metrics. Summary and Trends for the following are available to operators.

  • Upstream
  • Downstream
  • Success
  • Errors

View Endpoint Metrics

Operators can specify filters such as "Date Range", "Start and End Time" and "Granularity".

Endpoint Metric Filters


Delete Endpoint

To delete an endpoint, click on the ellipses (3 dots) under actions for the selected endpoint.

Important

This action is not reversible. Admins will need to recreate the endpoint in case of accidental deletion.