Client Engagement SaaS - Fortune 500

Secure Gen AI on Kubernetes: Multi-AZ RAG Platform

How Tech Stack Playbook designed and built a production-grade, zero-trust Generative AI platform on Amazon EKS for a Fortune 500 SaaS company, turning a fast-moving "vibe-coded" prototype into a multi-AZ, enterprise-scale Retrieval-Augmented Generation system.

Multi-AZ: EKS High Availability
Zero Trust: Security Architecture
Fortune 500: Enterprise SaaS Client
10-Step: Production Playbook

Overview

Tech Stack Playbook was engaged by a Fortune 500 SaaS company to re-architect a fast-moving Generative AI prototype into a production-grade, zero-trust, multi-availability-zone platform suitable for enterprise-scale deployment. The prototype was a Retrieval-Augmented Generation (RAG) chatbot in which users upload documents, chat with them, and receive grounded, source-cited answers.

The build delivers Next.js application pods and Python FastAPI server pods running on separate Amazon EKS clusters across two Availability Zones, with Amazon Bedrock Knowledge Base on Amazon OpenSearch Service for vector retrieval, Amazon Cognito and AWS AppSync for federated auth and GraphQL, and a full defense-in-depth perimeter of CloudFront, WAF, Shield, and GuardDuty malware scanning, all provisioned declaratively with AWS CDK.

The engagement produced a 10-step production security playbook that TSP now applies across every Generative AI engagement.

From "Vibe Coding" to Production-Grade AI

Modern Generative AI applications are easier than ever to stand up, but the same tools that make prototyping fast also produce applications that are structurally insecure, operationally fragile, and impossible to run in regulated enterprise environments. The client had a working "talk with your documents" prototype that checked the demo box, but nearly every production concern a real SaaS platform must solve was still open.

  • Vibe-coded prototypes pull bloated, outdated, and frequently vulnerable dependencies
  • No real authentication or authorization, so users can see each other's data
  • Single-AZ or non-existent scaling design, so a single failure takes everything down
  • Files uploaded by users are accepted and processed without malware scanning
  • No rate limiting on LLM invocations, so one bad actor can trigger a "Denial-of-Wallet" attack
  • No audit trail, logging, or observability for incident response or compliance
  • No CDN, WAF, or DDoS protection at the edge
  • Long-lived AWS credentials in CI/CD pipelines
  • No automated code review, dependency scanning, or shift-left security

Multi-AZ Kubernetes Platform

The platform addresses every dimension of the prototype-to-production gap: containerization, multi-AZ Kubernetes orchestration, zero-trust identity, RAG retrieval, malware scanning, edge protection, rate limiting, audit logging, and shift-left security automation.

01
Multi-AZ Amazon EKS Platform
Separate App (Next.js) and Server (FastAPI) EKS clusters, each distributed across two Availability Zones with Elastic Load Balancing, private subnets, and NAT gateways for outbound traffic.
02
AWS CDK Infrastructure as Code
Fully typed Python CDK stacks for VPC, EKS, ECR, CloudFront, Cognito, and observability, with bootstrap, synth, and deploy automated per environment.
03
Cognito + AppSync Auth Layer
Federated identity via Amazon Cognito; AWS AppSync GraphQL API enforcing per-user data isolation and granular RBAC.
04
Bedrock Knowledge Base RAG
Amazon Bedrock Knowledge Base on Amazon OpenSearch Service as the vector store, with EventBridge-triggered ingestion on S3 uploads for automated re-indexing.
05
Malware-Scanned File Uploads
S3 presigned URLs for upload; Amazon GuardDuty Malware Protection for S3 scans every file; quarantine bucket isolates anything flagged UNSAFE.
06
Zero-Trust Edge
Amazon CloudFront with SSL termination, AWS WAF for Layer 7 filtering, and AWS Shield for always-on DDoS protection, with custom domains via Route 53 and ACM.
07
Denial-of-Wallet Protection
Per-user token tracking and daily rate limits in DynamoDB with TTL-based reset, preventing runaway LLM costs from abuse or accidental loops.
08
Audit-Ready Observability
AWS CloudTrail for account-level audit trails, CloudWatch metrics and alarms, and Amazon EventBridge for event-driven upload and ingestion workflows.
09
OIDC-Auth CI/CD Pipelines
GitHub Actions assumes AWS IAM roles via OIDC, so no long-lived access keys exist in the CI environment. The pipeline automates the Docker build, the push to Amazon ECR, and the kubectl rollout.
10
Shift-Left with Amazon Q
Amazon Q performs automated pull-request review, dependency vulnerability scanning, and codebase refactoring, catching insecure patterns before they ship.
The gap between a "vibe-coded" Gen AI prototype and a production-grade enterprise platform isn't model quality or UI polish; it's the identity, scaling, audit, and edge-protection infrastructure around the AI. That infrastructure is the product.
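
The Denial-of-Wallet protection in step 07 can be illustrated with a minimal sketch. It builds the arguments for a single conditional DynamoDB `UpdateItem` call that charges tokens against a per-user, per-day counter and is rejected once the daily limit would be exceeded. The table name `llm_token_usage`, the key attribute `pk`, the attributes `tokens_used` and `expires_at`, and the per-day key layout are all hypothetical choices for illustration; the source does not describe the actual schema.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def build_token_budget_update(user_id: str, tokens: int, now: int,
                              daily_limit: int,
                              table_name: str = "llm_token_usage") -> dict:
    """Build kwargs for a DynamoDB UpdateItem call that atomically charges
    `tokens` against today's budget, or fails if the budget would be exceeded.

    All table and attribute names here are assumptions, not the client's schema.
    """
    if tokens <= 0 or tokens > daily_limit:
        raise ValueError("tokens must be between 1 and daily_limit")

    # One item per user per UTC day: a new day starts from a fresh counter.
    day = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%d")
    # TTL attribute: epoch seconds of the next UTC midnight, after which the
    # item is eligible for background deletion.
    next_midnight = (now // SECONDS_PER_DAY + 1) * SECONDS_PER_DAY

    return {
        "TableName": table_name,
        "Key": {"pk": {"S": f"{user_id}#{day}"}},
        "UpdateExpression": (
            "SET expires_at = if_not_exists(expires_at, :ttl) "
            "ADD tokens_used :t"
        ),
        # Succeeds only if the counter is new or still has room for `tokens`.
        "ConditionExpression": (
            "attribute_not_exists(tokens_used) OR tokens_used <= :max_before"
        ),
        "ExpressionAttributeValues": {
            ":t": {"N": str(tokens)},
            ":ttl": {"N": str(next_midnight)},
            ":max_before": {"N": str(daily_limit - tokens)},
        },
    }
```

With boto3, the result could be passed as `boto3.client("dynamodb").update_item(**params)`; a `ConditionalCheckFailedException` would then mean the user is over budget, and the API could answer with HTTP 429. Keying the item by day is what resets the counter in this sketch, since DynamoDB TTL deletion is a background process and is not immediate; the TTL attribute only cleans up old items.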

How Data Flows Through the Stack

Edge & Identity

Every request hits Amazon CloudFront first: TLS is terminated at the edge, AWS WAF inspects the request for Layer 7 threats, and AWS Shield absorbs network-layer DDoS traffic. Authenticated users are federated through Amazon Cognito, which issues short-lived JWTs consumed by the application and API layers.
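
As a rough illustration of how an API layer might consume those JWTs, the sketch below checks the standard claims of a Cognito token after its signature has been verified. Signature verification against the user pool's JWKS is deliberately out of scope here and must be done first with a JWT library; this function alone is not sufficient to trust a token. The function name and parameters are hypothetical, and the source does not describe how the platform performs this check.

```python
import time
from typing import Optional


def check_cognito_claims(claims: dict, *, region: str, user_pool_id: str,
                         app_client_id: str,
                         now: Optional[float] = None) -> bool:
    """Validate claims of an ALREADY signature-verified Cognito JWT.

    Assumption: the caller verified the signature against the pool's JWKS.
    """
    now = time.time() if now is None else now
    expected_issuer = (
        f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    )

    # The token must come from the expected user pool.
    if claims.get("iss") != expected_issuer:
        return False

    # The token must carry a numeric expiry that is still in the future.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False

    # Access tokens name the app client in `client_id`, ID tokens in `aud`.
    token_use = claims.get("token_use")
    if token_use == "access":
        return claims.get("client_id") == app_client_id
    if token_use == "id":
        return claims.get("aud") == app_client_id
    return False
```

Rejecting any token whose `token_use` is neither `access` nor `id` keeps the check fail-closed, which matches the zero-trust posture described above.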

Application Tier on EKS

The Next.js front-end runs as Kubernetes pods on the App EKS cluster, distributed across AZ-A and AZ-B in private subnets. Traffic from CloudFront lands on the Classic Elastic Load Balancer in public subnets, which routes to pods in both availability zones. A separate Server EKS cluster runs the Python FastAPI service handling inference orchestration, RAG queries, and feature APIs.
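
One common way to keep pods distributed across both Availability Zones is a topology spread constraint on the Deployment. The sketch below generates such a manifest as a Python dictionary. The Deployment name, image reference, and replica count are placeholders, and the source does not state which scheduling mechanism the platform actually uses.

```python
import json


def zone_spread_deployment(name: str, image: str, replicas: int = 4) -> dict:
    """Return a Deployment manifest whose pods are spread evenly across zones.

    Hypothetical example: names and image are placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Allow at most one pod of imbalance between zones, and
                    # refuse to schedule rather than pile up in one zone.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = zone_spread_deployment(
        "nextjs-app", "example.invalid/nextjs-app:latest"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Using `DoNotSchedule` favors even zone placement over scheduling at any cost; `ScheduleAnyway` would be the softer alternative.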

Data & Retrieval

AWS AppSync provides the GraphQL data API backed by Amazon DynamoDB for user-scoped metadata. File uploads flow through S3 presigned URLs into a production bucket guarded by Amazon GuardDuty Malware Protection. Files marked SAFE proceed to Amazon Bedrock Knowledge Base ingestion on Amazon OpenSearch Service; files marked UNSAFE are quarantined and logged. An EventBridge rule fires the Bedrock StartIngestionJob API via Lambda on every S3 write.
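
The SAFE and UNSAFE routing could be sketched as a small handler for the scan-result event. This is an assumption-heavy illustration: the event field names (`s3ObjectDetails`, `scanResultDetails`, `scanResultStatus`) and the status values `NO_THREATS_FOUND` and `THREATS_FOUND` reflect my understanding of the GuardDuty Malware Protection for S3 event shape and should be checked against the AWS documentation. The mapping of those values to the SAFE and UNSAFE labels above, and the injected `start_ingestion` and `quarantine` callables, are hypothetical.

```python
from typing import Callable


def route_scan_result(event: dict,
                      start_ingestion: Callable[[str, str], None],
                      quarantine: Callable[[str, str], None]) -> str:
    """Route an uploaded object based on its malware scan verdict.

    Field names and status values are assumptions about the event shape.
    Returns the action taken: ingested, quarantined, held, or ignored.
    """
    detail = event.get("detail", {})
    s3_details = detail.get("s3ObjectDetails", {})
    bucket = s3_details.get("bucketName")
    key = s3_details.get("objectKey")
    status = detail.get("scanResultDetails", {}).get("scanResultStatus")

    if not bucket or not key:
        # Not an object-level scan result; nothing to route.
        return "ignored"
    if status == "NO_THREATS_FOUND":
        start_ingestion(bucket, key)
        return "ingested"
    if status == "THREATS_FOUND":
        quarantine(bucket, key)
        return "quarantined"
    # Fail closed: failed, unsupported, or unknown verdicts are never ingested.
    return "held"
```

In a Lambda function, `start_ingestion` might wrap the boto3 `bedrock-agent` client's `start_ingestion_job(knowledgeBaseId=..., dataSourceId=...)` call, and `quarantine` might copy the object to the quarantine bucket and delete the original. Holding every verdict other than an explicit clean result keeps unscanned files out of the knowledge base.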

CI/CD & Supply Chain

Every push to the Git repository triggers a webhook-driven GitHub Actions pipeline. The pipeline assumes an AWS IAM role via OIDC, so there are no long-lived AWS keys in CI. Docker images build, push to Amazon ECR, and roll out to EKS pods. Amazon Q runs in the pull-request flow for automated security review.
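
The keyless CI setup rests on an IAM role whose trust policy accepts GitHub's OIDC tokens. A minimal sketch of such a trust policy, generated in Python, is shown below. The account ID, repository, and branch are placeholders, and restricting the role to a single branch is an illustrative choice rather than something the source specifies.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str,
                             branch: str = "main") -> dict:
    """Return an IAM trust policy letting one repo branch assume a role.

    `account_id`, `repo` ("org/name"), and `branch` are placeholder inputs.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Token must be minted for AWS STS...
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # ...and come from this exact repository and branch.
                    f"{GITHUB_OIDC_HOST}:sub":
                        f"repo:{repo}:ref:refs/heads/{branch}",
                },
            },
        }],
    }


if __name__ == "__main__":
    policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
    print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim to one repository and branch is what prevents workflows in other repositories from assuming the role; a trust policy that checks only `aud` would accept tokens from any GitHub repository.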

Outcomes & Impact

Prototype → Production: Working Gen AI demo re-architected into a multi-AZ, zero-trust Kubernetes platform suitable for enterprise-regulated environments.
Multi-AZ by Default: Full stack runs across two availability zones with zero single-AZ dependencies, resilient to AZ-level failures out of the box.
Zero-Trust Architecture: Every request is authenticated, authorized, rate-limited, logged, and scanned as the default posture, not a bolt-on.
Denial-of-Wallet Protected: Per-user token tracking and daily rate limits prevent runaway LLM costs from abuse, buggy clients, or accidental loops.
10-Step Production Playbook: Re-usable security checklist (AI code review, file scanning, rate limits, RBAC, audit logging) now applied across every TSP Gen AI engagement.
Shift-Left Security: Automated pull-request review, dependency scanning, and codebase refactoring catch insecure patterns before they ever ship to production.

Technologies Used

Amazon EKS, Amazon Bedrock, AWS CDK, Next.js, Python, FastAPI, TypeScript, GraphQL, Docker, Amazon ECR, Amazon Cognito, AWS AppSync, Amazon S3, Amazon DynamoDB, Amazon OpenSearch Service, Amazon CloudFront, AWS WAF, AWS Shield, Amazon GuardDuty, AWS CloudTrail, Amazon CloudWatch, Amazon EventBridge, GitHub Actions (OIDC), Amazon Q