Secure Gen AI on Kubernetes: Multi-AZ RAG Platform
How Tech Stack Playbook designed and built a production-grade, zero-trust Generative AI platform on Amazon EKS for a Fortune 500 SaaS company, turning a fast-moving "vibe-coded" prototype into a multi-AZ, enterprise-scale Retrieval-Augmented Generation system.
Overview
Tech Stack Playbook was engaged by a Fortune 500 SaaS company to re-architect a fast-moving Generative AI prototype into a production-grade, zero-trust, multi-availability-zone platform suitable for enterprise-scale deployment. The prototype was a Retrieval-Augmented Generation (RAG) chatbot in which users upload documents, chat with them, and receive grounded, source-cited answers.
The build delivers Next.js application pods and Python FastAPI server pods running on separate Amazon EKS clusters across two Availability Zones, with Amazon Bedrock Knowledge Base on Amazon OpenSearch Service for vector retrieval, Amazon Cognito and AWS AppSync for federated auth and GraphQL, and a full defense-in-depth perimeter of CloudFront, WAF, Shield, and GuardDuty malware scanning, all provisioned declaratively with AWS CDK.
The engagement produced a 10-step production security playbook that TSP now applies across every Generative AI engagement.
From "Vibe Coding" to Production-Grade AI
Modern Generative AI applications are easier than ever to stand up, but the same tools that make prototyping fast also produce applications that are structurally insecure, operationally fragile, and impossible to run in regulated enterprise environments. The client had a working "talk with your documents" prototype that checked the demo box, but nearly every production concern a real SaaS platform must solve was still open.
- Vibe-coded prototypes pull bloated, outdated, and frequently vulnerable dependencies
- No real authentication or authorization: users can see each other's data
- Single-AZ or non-existent scaling design: a single failure takes everything down
- Files uploaded by users are accepted and processed without malware scanning
- No rate limiting on LLM invocations: one bad actor triggers "Denial-of-Wallet"
- No audit trail, logging, or observability for incident response or compliance
- No CDN, WAF, or DDoS protection at the edge
- Long-lived AWS credentials in CI/CD pipelines
- No automated code review, dependency scanning, or shift-left security
Multi-AZ Kubernetes Platform
The platform addresses every dimension of the prototype-to-production gap: containerization, multi-AZ Kubernetes orchestration, zero-trust identity, RAG retrieval, malware scanning, edge protection, rate limiting, audit logging, and shift-left security automation.
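Of these dimensions, rate limiting on LLM invocations is the easiest to illustrate in isolation. The following is a minimal sketch of a per-user token bucket, assuming an in-process limiter; the source does not describe the platform's actual mechanism, and the class names `TokenBucket` and `PerUserLimiter` are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.tokens = float(capacity)  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """Keeps one bucket per user so a single caller cannot drain the budget."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}

    def allow(self, user_id, cost=1.0, now=None):
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now=now)
            self.buckets[user_id] = bucket
        return bucket.allow(cost=cost, now=now)
```

Because the state lives in one process, a limiter like this only bounds traffic per pod. With pods spread across two Availability Zones, a shared store or an edge-level rule would be needed to enforce a global per-user budget.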
How Data Flows Through the Stack
Edge & Identity
Every request hits Amazon CloudFront first: TLS is terminated at the edge, AWS WAF inspects the request for Layer 7 threats, and AWS Shield absorbs network-layer DDoS traffic. Authenticated users are federated through Amazon Cognito, which issues short-lived JWTs consumed by the application and API layers.
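As an illustration of what "consuming" a Cognito JWT can involve, here is a sketch of the claim checks an API layer might apply. It assumes the token's signature has already been verified against the user pool's JWKS by a JWT library, which is not shown and must not be skipped. The function name and return values are hypothetical; the `iss`, `token_use`, `aud`, `client_id`, and `exp` claims are the ones Cognito places in its tokens.

```python
import time


def check_cognito_claims(claims, region, user_pool_id, app_client_id, now=None):
    """Return None if the (already signature-verified) claims are acceptable,
    otherwise a short reason string."""
    now = time.time() if now is None else now

    # The issuer must be this exact user pool.
    expected_iss = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    if claims.get("iss") != expected_iss:
        return "wrong issuer"

    # ID tokens carry the app client in `aud`; access tokens in `client_id`.
    token_use = claims.get("token_use")
    if token_use == "id":
        audience = claims.get("aud")
    elif token_use == "access":
        audience = claims.get("client_id")
    else:
        return "unexpected token_use"
    if audience != app_client_id:
        return "wrong audience"

    # Short-lived tokens: reject anything at or past its expiry.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return "token expired"

    return None
```

Returning a reason rather than a bare boolean makes rejected requests easier to trace in audit logs.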
Application Tier on EKS
The Next.js front-end runs as Kubernetes pods on the App EKS cluster, distributed across AZ-A and AZ-B in private subnets. Traffic from CloudFront lands on the Classic Elastic Load Balancer in public subnets, which routes to pods in both availability zones. A separate Server EKS cluster runs the Python FastAPI service handling inference orchestration, RAG queries, and feature APIs.
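One common way to keep pods balanced across two Availability Zones is a Kubernetes topology spread constraint on the zone label. The sketch below builds such a Deployment manifest as a Python dictionary. It is an assumption about how the distribution could be expressed, not the client's actual manifest: the name `nextjs-app`, the replica count, and container port 3000 are hypothetical.

```python
import json


def nextjs_deployment(image, replicas=4, app_label="nextjs-app"):
    """Build a Deployment manifest whose pods are spread evenly across zones."""
    labels = {"app": app_label}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_label, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "topologySpreadConstraints": [
                        {
                            # Allow at most one pod of difference between zones.
                            "maxSkew": 1,
                            "topologyKey": "topology.kubernetes.io/zone",
                            # Refuse to schedule rather than pile into one AZ.
                            "whenUnsatisfiable": "DoNotSchedule",
                            "labelSelector": {"matchLabels": labels},
                        }
                    ],
                    "containers": [
                        {
                            "name": app_label,
                            "image": image,
                            "ports": [{"containerPort": 3000}],
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(nextjs_deployment("<ecr-repository-uri>:<tag>"), indent=2))
```

With `DoNotSchedule`, a zone outage leaves some replicas pending instead of concentrating them in the surviving zone; `ScheduleAnyway` makes the opposite trade-off.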
Data & Retrieval
AWS AppSync provides the GraphQL data API backed by Amazon DynamoDB for user-scoped metadata. File uploads flow through S3 presigned URLs into a production bucket guarded by Amazon GuardDuty Malware Protection. Files marked SAFE proceed to Amazon Bedrock Knowledge Base ingestion on Amazon OpenSearch Service; files marked UNSAFE are quarantined and logged. An EventBridge rule fires the Bedrock StartIngestionJob API via Lambda on every S3 write.
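The scan-then-ingest gate can be sketched as a small Lambda-style handler. This sketch assumes the function receives the GuardDuty Malware Protection for S3 scan-result event from EventBridge rather than the raw S3 event; the source does not say which event the rule matches. The field names (`s3ObjectDetails`, `scanResultDetails.scanResultStatus`) and status values should be checked against current AWS documentation, and mapping `NO_THREATS_FOUND` to SAFE and `THREATS_FOUND` to UNSAFE is an assumption. The `bedrock_agent` argument stands in for a boto3 `bedrock-agent` client, and the function names are hypothetical.

```python
def route_scan_result(event):
    """Turn a scan-result event into an ingest / quarantine / hold decision."""
    detail = event.get("detail", {})
    obj = detail.get("s3ObjectDetails", {})
    status = detail.get("scanResultDetails", {}).get("scanResultStatus")

    if status == "NO_THREATS_FOUND":
        action = "ingest"        # treated as SAFE
    elif status == "THREATS_FOUND":
        action = "quarantine"    # treated as UNSAFE
    else:
        action = "hold"          # unsupported, failed, or unknown: do not ingest

    return {
        "action": action,
        "bucket": obj.get("bucketName"),
        "key": obj.get("objectKey"),
        "status": status,
    }


def handle(event, bedrock_agent, knowledge_base_id, data_source_id):
    """Start a Knowledge Base ingestion job only for objects that scanned clean."""
    decision = route_scan_result(event)
    if decision["action"] == "ingest":
        bedrock_agent.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
        )
    return decision
```

Passing the client in as an argument keeps the routing logic testable without AWS credentials, and defaulting unknown statuses to "hold" means a failed scan never reaches the vector index.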
CI/CD & Supply Chain
Every push to the Git repository triggers a webhook-driven GitHub Actions pipeline. The pipeline assumes an AWS IAM role via OIDC, so there are no long-lived AWS keys in CI. Docker images are built, pushed to Amazon ECR, and rolled out to the EKS pods. Amazon Q runs in the pull-request flow for automated security review.
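The OIDC arrangement rests on an IAM role whose trust policy accepts tokens from GitHub's identity provider only for a specific repository and ref. A minimal sketch of such a trust policy follows, generated in Python; the account ID, repository, and branch are placeholders, and the function name is hypothetical.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, ref="refs/heads/main"):
    """Build an IAM trust policy letting one GitHub repo/ref assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must have been minted for AWS STS...
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ...by a workflow running on this exact repo and ref.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:{ref}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(github_oidc_trust_policy("<account-id>", "<org>/<repo>"), indent=2))
```

Pinning the `sub` claim to a single repository and ref is what keeps forks or unrelated repositories from assuming the deployment role.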