Overview
The reference designs for AI and Generative AI include both documentation and code and are primarily intended for platform teams. Using them, platform teams can offer application teams/developers a self-service experience for the infrastructure required for AI and Generative AI.
The reference designs assume a simple two-step process:
Step 1
The platform team imports the provided environment template(s) into their Rafay Org, configures them with the required credentials (e.g., for AWS), and shares them with downstream projects that developers and data scientists can access.
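As a rough illustration of this step, the Python sketch below drives the same workflow from a workstation: it clones the reference design repository and imports the environment template declaratively. The repository URL, the file path, and the use of the rctl CLI here are assumptions for illustration only; the import can also be performed through the Rafay console.

```python
# Illustrative sketch of Step 1, assuming the Rafay rctl CLI is installed and
# initialized for your Org. The repository URL and file path are placeholders,
# not the actual reference-design locations.
import subprocess

REPO_URL = "https://github.com/example-org/genai-reference-designs.git"  # placeholder
TEMPLATE_SPEC = "genai-reference-designs/templates/environment-template.yaml"  # placeholder


def run(cmd: list[str]) -> None:
    """Run a command and fail loudly if it does not succeed."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Clone the Git repo that ships with the reference design.
run(["git", "clone", REPO_URL])

# 2. Import the environment template into the Rafay Org (declarative apply).
#    Credentials for AWS and the LLM provider are configured against the
#    template in the Org before it is shared with downstream projects.
run(["rctl", "apply", "-f", TEMPLATE_SPEC])
```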
Step 2
The developer logs in and creates an environment based on the published environment template for AI or Generative AI.
The diagram below showcases the high-level steps.
sequenceDiagram
autonumber
participant admin as Platform Team
participant rafay as Rafay
participant user as Developer
rect rgb(191, 223, 255)
Note over admin,rafay: Setup Environment Template <br> for AI/Generative AI
admin->>admin: Clone Git Repo
admin->>rafay: Setup Environment Template
admin->>rafay: Provide Credentials <br>(Infrastructure & LLM)
end
rect rgb(191, 223, 255)
Note over rafay,user: Provision <br> AI/Generative AI Environment
user->>rafay: Create Environment <br> based on Environment Template
user->>rafay: Use Environment
user->>rafay: Destroy Environment
end
Infrastructure Options
The sample Generative AI applications we currently provide are containerized, and the reference designs/templates are based on Amazon ECS and Amazon EKS for infrastructure.
Based on Amazon ECS
Provisioning the Amazon ECS based environment takes approximately 5-10 minutes. This makes it practical to give application developers a complete self-service experience in which they provision single-tenant ECS clusters on demand, with the Generative AI application deployed on them as an ECS task.
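For context on what the environment template automates, here is a minimal boto3 sketch of deploying a containerized application as an ECS task on Fargate. The cluster name, image, role ARN, subnets, security groups, CPU/memory sizes, and port are placeholders and assumptions; in the reference design the environment template performs the equivalent provisioning for you.

```python
# Illustrative sketch only: launching a containerized Generative AI app as an
# ECS task with boto3. All names, ARNs, and network IDs are placeholders.
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")  # region is an assumption

# Register a Fargate task definition for the containerized app.
task_def = ecs.register_task_definition(
    family="genai-app",  # placeholder
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="512",
    memory="1024",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[{
        "name": "genai-app",
        "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/genai-app:latest",  # placeholder
        "essential": True,
        "portMappings": [{"containerPort": 8080}],
    }],
)

# Run the task on the single-tenant ECS cluster provisioned for the developer.
ecs.run_task(
    cluster="developer-genai-cluster",  # placeholder
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    launchType="FARGATE",
    count=1,
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-0123456789abcdef0"],        # placeholder
        "securityGroups": ["sg-0123456789abcdef0"],     # placeholder
        "assignPublicIp": "ENABLED",
    }},
)
```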
Based on Amazon EKS
Provisioning an Amazon EKS cluster based environment can take approximately 30-40 minutes. Kubernetes clusters are extremely well suited for multi-tenancy, so rather than provisioning a cluster per user, we recommend that the platform engineer provision a Kubernetes namespace for every user/developer and create an IAM Role for Service Accounts (IRSA) in it. The IRSA ensures that the Generative AI application the developer deploys to the namespace has the required permissions to programmatically access the LLMs on Amazon Bedrock.
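The sketch below illustrates this per-developer pattern using the official Kubernetes Python client and boto3. The namespace name, IAM role ARN, region, and model ID are placeholders and assumptions; the final Bedrock call represents what the application itself would do once running in a pod under the annotated service account. In practice these resources are created through the environment template rather than hand-written code.

```python
# Illustrative sketch only: a per-developer namespace with an IRSA-annotated
# service account. All names and ARNs are placeholders.
import json

import boto3
from kubernetes import client, config

config.load_kube_config()  # assumes kubeconfig access to the EKS cluster
core = client.CoreV1Api()

NAMESPACE = "dev-alice"  # one namespace per user/developer (placeholder name)
ROLE_ARN = "arn:aws:iam::123456789012:role/dev-alice-bedrock-access"  # placeholder

# Namespace dedicated to the developer.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=NAMESPACE))
)

# Service account annotated for IRSA; pods that use it receive temporary AWS
# credentials for ROLE_ARN, which should grant Bedrock invoke permissions.
core.create_namespaced_service_account(
    namespace=NAMESPACE,
    body=client.V1ServiceAccount(
        metadata=client.V1ObjectMeta(
            name="genai-app",
            annotations={"eks.amazonaws.com/role-arn": ROLE_ARN},
        )
    ),
)

# Inside a pod running under that service account, the application can call
# Amazon Bedrock with boto3 and no static credentials (model ID is a placeholder).
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
resp = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": "\n\nHuman: Hello\n\nAssistant:", "max_tokens_to_sample": 100}),
)
print(json.loads(resp["body"].read()))
```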
Roadmap
This reference design is an initial version. We plan to progressively enhance the design with additional functionality based on our roadmap and customer feedback. Please watch this space or our product blogs for updates.