Infrastructure Templates for Generative AI on AWS¶
We constantly hear from customers who want their developers to experiment with Generative AI. No organization wants to be left behind, and all of them are looking for ways to empower their developers and application teams to experiment with use cases powered by Generative AI.
According to recent Gartner research, more than 80% of enterprises will have used Generative AI APIs or deployed Generative AI-enabled applications by 2026.
We have been listening to our customers and are happy to announce Rafay's Templates for AI & Generative AI. Platform teams can now provide their developers with a self-service experience for Gen AI infrastructure, enabling them to experiment with new and innovative Generative AI use cases.
Customer Requirements¶
In our conversations with platform teams, developers and key technology partners, a few requirements rose to the top as critical for providing this self-service experience with transparent enforcement of critical controls.
- Self Service
  This was emphasized as the most important requirement. Organizations want a frictionless experience for their developers because they do not want any bottlenecks for experimentation, and platform and Ops teams are keenly aware that they are already swamped supporting other critical priorities.
- Cost
  With potentially hundreds or thousands of active developer environments for Gen AI, it is paramount that the cost of every environment is kept extremely low and that under-utilized environments are deprovisioned to save money.
- Powered by Standards-based IaC
  Organizations have made significant investments in Infrastructure as Code (IaC), and they want these environments to be backed by their preferred IaC tool, such as Terraform.
- Infrastructure Provider
  Most of the organizations that we spoke with were on either AWS or Azure. Many of them have usage commitments and would like to leverage them.
- Access to Multiple Models
  We consistently heard that organizations would like to experiment with different models for different use cases. Given how fast the Generative AI landscape is evolving, it is sensible not to be locked into a provider that can only support a single model.
- Customize the Model
  Organizations mentioned that they need the ability to further tune or train a foundation model with custom data so that it can be optimized for their use case.
- Security
  Organizations said they were uncomfortable using public/open models until they have guarantees and clarity that their data will not be used publicly.
As we looked at these requirements, we decided to prioritize AWS for the first version of the Gen AI templates. We will be releasing a version of the templates for Azure in a few weeks.
Typical Steps for Users¶
Using the Gen AI infrastructure templates is essentially a simple two-step process. In the first step, the platform engineer imports the templates into their Rafay Org. In the second step, the developer or data scientist "consumes" the templates to provision environments and use them. The diagram below shows the high-level steps.
sequenceDiagram
autonumber
participant admin as Platform Team
participant rafay as Rafay
participant user as Developer
rect rgb(191, 223, 255)
Note over admin,rafay: Step 1: Setup Environment Template
admin->>admin: Clone Git Repo
admin->>rafay: Setup Environment Template
admin->>rafay: Provide Credentials <br>(Infrastructure)
end
rect rgb(191, 223, 255)
Note over rafay,user: Step 2: Use Environment Template
user->>rafay: Create Environment <br> based on Environment Template
user->>rafay: Use Environment
user->>rafay: Destroy Environment
end
Gen AI on Amazon ECS¶
This Generative AI template provisions an Amazon ECS cluster inside a VPC and deploys a task with an example Gen AI application. The ECS cluster is automatically configured with IAM policies to make API calls to an LLM in the Amazon Bedrock Generative AI service. The high-level architecture looks like the following image.
This ECS-based environment will cost ~$9/developer/month, making it an extremely affordable development environment.
End-to-end provisioning of an ECS-based environment from this template takes approximately 6 minutes.
Watch a video of the developer experience with this template.
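The template's actual IAM policies are not reproduced in this post. As a rough illustration of what "IAM policies to make API calls to an LLM in Bedrock" can look like, here is a minimal sketch that builds a least-privilege policy document in Python. The function name, the statement ID, and the model ARN are hypothetical placeholders; only the `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream` actions and the policy document layout come from AWS itself.

```python
import json


def bedrock_invoke_policy(model_arns):
    """Build an IAM policy document that allows invoking the given Bedrock models.

    This is an illustrative helper, not the policy shipped with the template.
    """
    if not model_arns:
        raise ValueError("at least one model ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBedrockInvoke",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Restrict access to the listed models instead of using "*".
                "Resource": list(model_arns),
            }
        ],
    }


# Placeholder ARN: substitute the region and model ID that your org has enabled.
EXAMPLE_MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/example-model-id"

if __name__ == "__main__":
    print(json.dumps(bedrock_invoke_policy([EXAMPLE_MODEL_ARN]), indent=2))
```

A policy like this would typically be attached to the ECS task role, so the application in the task can call Bedrock without long-lived credentials.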
Gen AI on Amazon EKS¶
This Generative AI template is built on a shared Kubernetes cluster running on Amazon EKS. Every developer gets access to a Kubernetes namespace on the shared EKS cluster. As part of environment creation, an IAM role for service accounts (IRSA) is automatically set up for the namespace with the necessary policies for applications to make API calls to an LLM in the Amazon Bedrock Generative AI service.
Two sample Generative AI applications are also deployed to the namespace that the developer can use as a starting point. The high-level architecture looks like the following image.
End-to-end provisioning of an EKS-based environment from this template takes approximately 6 minutes.
Watch a video of the developer experience with this template.
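For readers unfamiliar with IRSA, the mechanism binds an IAM role to one Kubernetes service account through the cluster's OIDC provider. The sketch below builds the kind of trust policy this relies on; it is an illustration under stated assumptions, not the template's actual configuration. The function name, account ID, OIDC provider ID, namespace, and service account name are all hypothetical.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy that lets one Kubernetes service account assume a role.

    oidc_provider is the cluster's OIDC issuer without the "https://" prefix.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account pair may assume the role.
                        f"{oidc_provider}:sub": subject,
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    policy = irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789",
        namespace="dev-namespace",
        service_account="genai-app",
    )
    print(json.dumps(policy, indent=2))
```

Because the `sub` condition names a single namespace and service account, each developer's namespace on the shared cluster can be given Bedrock access without exposing the role to workloads in other namespaces.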
Learn More/Try It¶
Are you interested in learning more about Rafay's "Templates for AI and Gen AI"?
- Read through the documentation for the Templates
- Schedule a demo
- Meet us and watch a live demo at upcoming conferences and industry events
- If you would like to try this yourself, you can sign up for a Free Org.