Client Engagement SaaS — Fortune 500

Secure Gen AI on Kubernetes: Multi-AZ RAG Platform

How Tech Stack Playbook designed and built a production-grade, zero-trust Generative AI platform on Amazon EKS for a Fortune 500 SaaS — turning a fast-moving “vibe-coded” prototype into a multi-AZ, enterprise-scale Retrieval-Augmented Generation system.

Multi-AZ

EKS High Availability

Zero Trust

Security Architecture

Fortune 500

Enterprise SaaS Client

10-Step

Production Playbook

Executive Summary

Overview

Tech Stack Playbook was engaged by a Fortune 500 SaaS company to re-architect a fast-moving Generative AI prototype into a production-grade, zero-trust, multi-availability-zone platform suitable for enterprise-scale deployment. The prototype was a Retrieval-Augmented Generation (RAG) chatbot where users upload documents, chat with them, and receive grounded, source-cited answers.

The build delivers Next.js application pods and Python FastAPI server pods running on separate Amazon EKS clusters across two Availability Zones, with Amazon Bedrock Knowledge Base on Amazon OpenSearch Service for vector retrieval, Amazon Cognito and AWS AppSync for federated auth and GraphQL, and a full defense-in-depth perimeter of CloudFront, WAF, Shield, and GuardDuty malware scanning — all provisioned declaratively with AWS CDK.

The engagement produced a 10-step production security playbook that TSP now applies across every Generative AI engagement.

The Challenge

From “Vibe Coding” to Production-Grade AI

Modern Generative AI applications are easier than ever to stand up — but the same tools that make prototyping fast also produce applications that are structurally insecure, operationally fragile, and impossible to run in regulated enterprise environments. The client had a working “talk with your documents” prototype that checked the demo box, but nearly every production concern a real SaaS platform must solve was still open.

Vibe-coded prototypes pull bloated, outdated, and frequently vulnerable dependencies
No real authentication or authorization — users can see each other's data
Single-AZ or non-existent scaling design — a single failure takes everything down
Files uploaded by users are accepted and processed without malware scanning
No rate limiting on LLM invocations, so one bad actor can trigger a "Denial-of-Wallet" attack
No audit trail, logging, or observability for incident response or compliance
No CDN, WAF, or DDoS protection at the edge
Long-lived AWS credentials in CI/CD pipelines
No automated code review, dependency scanning, or shift-left security

What We Delivered

Multi-AZ Kubernetes Platform

The platform addresses every dimension of the prototype-to-production gap — containerization, multi-AZ Kubernetes orchestration, zero-trust identity, RAG retrieval, malware scanning, edge protection, rate limiting, audit logging, and shift-left security automation.

Multi-AZ Amazon EKS Platform

Separate App (Next.js) and Server (FastAPI) EKS clusters, each distributed across two Availability Zones with Elastic Load Balancing, private subnets, and NAT gateways for outbound traffic.

AWS CDK Infrastructure as Code

Fully typed Python CDK stacks for VPC, EKS, ECR, CloudFront, Cognito, and observability — bootstrap, synth, and deploy automated per environment.

Cognito + AppSync Auth Layer

Federated identity via Amazon Cognito; AWS AppSync GraphQL API enforcing per-user data isolation and granular RBAC.

Bedrock Knowledge Base RAG

Amazon Bedrock Knowledge Base on Amazon OpenSearch Service as the vector store — EventBridge-triggered ingestion on S3 uploads for automated re-indexing.

Malware-Scanned File Uploads

S3 presigned URLs for upload; Amazon GuardDuty Malware Protection for S3 scans every file; quarantine bucket isolates anything flagged UNSAFE.
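A minimal sketch of how a scan verdict could drive the quarantine step. It assumes the verdict arrives as an EventBridge event carrying `detail.scanResultDetails.scanResultStatus` and `detail.s3ObjectDetails`, and that the SAFE and UNSAFE labels used here correspond to the `NO_THREATS_FOUND` and `THREATS_FOUND` statuses. The bucket name and the `route_scan_result` helper are hypothetical, not taken from the engagement.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: hypothetical quarantine bucket name, not from the case study.
QUARANTINE_BUCKET = "example-quarantine-bucket"


@dataclass(frozen=True)
class Routing:
    action: str                      # "ingest", "quarantine", or "hold"
    source_bucket: str
    key: str
    target_bucket: Optional[str] = None


def route_scan_result(event: dict) -> Routing:
    """Decide what happens to an uploaded object based on its scan status."""
    detail = event.get("detail", {})
    obj = detail.get("s3ObjectDetails", {})
    status = detail.get("scanResultDetails", {}).get("scanResultStatus")
    bucket = obj.get("bucketName", "")
    key = obj.get("objectKey", "")

    if status == "NO_THREATS_FOUND":
        return Routing("ingest", bucket, key)
    if status == "THREATS_FOUND":
        return Routing("quarantine", bucket, key, QUARANTINE_BUCKET)
    # Any other or missing status is treated as unverified and never ingested.
    return Routing("hold", bucket, key)
```

Treating every status other than a clean verdict as "hold" keeps the default closed: a failed or unsupported scan does not silently reach the knowledge base.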

Zero-Trust Edge

Amazon CloudFront with SSL termination, AWS WAF for Layer 7 filtering, and AWS Shield for always-on DDoS protection — custom domains via Route 53 and ACM.

Denial-of-Wallet Protection

Per-user token tracking and daily rate limits in DynamoDB with TTL-based reset — preventing runaway LLM costs from abuse or accidental loops.
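One way such a daily limit could be expressed is a single conditional DynamoDB update per LLM call. The sketch below only builds the `update_item` arguments, so it runs without AWS access. The table layout (`pk`, `tokens_used`, `expires_at`), the 50,000-token limit, and the `build_usage_update` helper are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: hypothetical per-user daily budget.
DAILY_TOKEN_LIMIT = 50_000


def build_usage_update(user_id: str, tokens: int, now: datetime,
                       limit: int = DAILY_TOKEN_LIMIT) -> dict:
    """Build update_item kwargs that add `tokens` only if the user stays within `limit`.

    `now` must be timezone-aware. The partition key embeds the UTC date, so each
    day starts from a fresh counter; the TTL attribute only cleans up old items.
    """
    if tokens <= 0 or tokens > limit:
        raise ValueError("tokens must be between 1 and the daily limit")
    day = now.astimezone(timezone.utc).date()
    next_midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)
    return {
        "Key": {"pk": f"USER#{user_id}#{day.isoformat()}"},
        # Set the expiry once, then atomically add to the running total.
        "UpdateExpression": "SET expires_at = if_not_exists(expires_at, :ttl) ADD tokens_used :n",
        # Reject the write if the new total would exceed the limit.
        "ConditionExpression": "attribute_not_exists(tokens_used) OR tokens_used <= :remaining",
        "ExpressionAttributeValues": {
            ":n": tokens,
            ":ttl": int(next_midnight.timestamp()),
            ":remaining": limit - tokens,
        },
    }
```

Passing the result to a boto3 table resource as `table.update_item(**kwargs)` would fail with a `ConditionalCheckFailedException` error once the user is over budget, which the API layer can translate into an HTTP 429. Keying the counter by date matters because DynamoDB TTL deletion is not immediate.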

Audit-Ready Observability

AWS CloudTrail for account-level audit trails, CloudWatch metrics and alarms, and Amazon EventBridge for event-driven upload and ingestion workflows.

OIDC-Auth CI/CD Pipelines

GitHub Actions assumes AWS IAM roles via OIDC — no long-lived access keys in the CI environment. Automated Docker build, push to Amazon ECR, and kubectl rollout.

Shift-Left with Amazon Q

Amazon Q performs automated pull-request review, dependency vulnerability scanning, and codebase refactoring — catching insecure patterns before they ship.

Key Insight

The gap between a “vibe-coded” Gen AI prototype and a production-grade enterprise platform isn't model quality or UI polish — it's the identity, scaling, audit, and edge-protection infrastructure around the AI. That infrastructure is the product.

Architecture

How Data Flows Through the Stack

Edge & Identity

Every request hits Amazon CloudFront first — TLS is terminated at the edge, AWS WAF inspects the request for Layer 7 threats, and AWS Shield absorbs network-layer DDoS traffic. Authenticated users are federated through Amazon Cognito, which issues short-lived JWTs consumed by the application and API layers.
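As an illustration of what "consuming" a Cognito JWT can involve on the API side, the sketch below checks the claims of an access token after its signature has already been verified against the user pool's JWKS (that step is not shown and must not be skipped). The function name and the parameter values in the usage line are hypothetical.

```python
import time
from typing import Optional


def check_access_token_claims(claims: dict, *, region: str, user_pool_id: str,
                              app_client_id: str, now: Optional[float] = None) -> str:
    """Return the caller's `sub` if the claims are acceptable, else raise ValueError.

    Assumes `claims` came from a token whose signature was already verified.
    """
    issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    current = time.time() if now is None else now

    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    if claims.get("token_use") != "access":
        raise ValueError("not an access token")
    if claims.get("client_id") != app_client_id:
        raise ValueError("unexpected app client")
    if float(claims.get("exp", 0)) <= current:
        raise ValueError("token expired")
    sub = claims.get("sub")
    if not sub:
        raise ValueError("missing subject")
    return sub
```

For example, `check_access_token_claims(claims, region="us-east-1", user_pool_id="us-east-1_EXAMPLE", app_client_id="example-client")` returns the user's `sub`, which downstream code can use as the key for per-user data isolation.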

Application Tier on EKS

The Next.js front-end runs as Kubernetes pods on the App EKS cluster, distributed across AZ-A and AZ-B in private subnets. Traffic from CloudFront lands on a Classic Load Balancer in public subnets, which routes to pods in both Availability Zones. A separate Server EKS cluster runs the Python FastAPI service, which handles inference orchestration, RAG queries, and feature APIs.
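Running nodes in two zones does not by itself guarantee that pods land in both. One common way to enforce it is a Kubernetes topology spread constraint on the zone label. The sketch below builds such a Deployment manifest as a plain dict; the app name, image URI, and replica count are placeholders, and the case study does not say which scheduling mechanism was actually used.

```python
import json


def spread_across_zones(app_label: str, image: str, replicas: int = 4) -> dict:
    """Build a Deployment manifest whose pods are spread evenly across zones."""
    labels = {"app": app_label}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_label, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "topologySpreadConstraints": [{
                        # Allow at most one pod of difference between zones.
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{"name": app_label, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image URI; kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(spread_across_zones("nextjs-app", "example.invalid/nextjs-app:latest"), indent=2))
```

With `whenUnsatisfiable` set to `DoNotSchedule`, a pod stays pending rather than piling into one zone, which surfaces capacity problems instead of hiding a single-AZ dependency.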

Data & Retrieval

AWS AppSync provides the GraphQL data API backed by Amazon DynamoDB for user-scoped metadata. File uploads flow through S3 presigned URLs into a production bucket guarded by Amazon GuardDuty Malware Protection — files marked SAFE proceed to Amazon Bedrock Knowledge Base ingestion on Amazon OpenSearch Service; files marked UNSAFE are quarantined and logged. An EventBridge rule fires the Bedrock StartIngestionJob API via Lambda on every S3 write.
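The ingestion trigger can be sketched as a small Lambda handler that calls `start_ingestion_job` on the boto3 `bedrock-agent` client. The environment variable names and the injectable `client` parameter (added so the handler can be exercised without AWS access) are assumptions, not details from the engagement.

```python
import os


def handler(event, context, client=None):
    """Start a Knowledge Base sync when EventBridge reports a new upload.

    The event payload is not inspected: an ingestion job re-syncs the whole
    data source rather than a single object.
    """
    if client is None:
        import boto3  # imported lazily so the module loads without boto3 installed
        client = boto3.client("bedrock-agent")

    response = client.start_ingestion_job(
        knowledgeBaseId=os.environ["KNOWLEDGE_BASE_ID"],  # hypothetical variable name
        dataSourceId=os.environ["DATA_SOURCE_ID"],        # hypothetical variable name
    )
    job = response["ingestionJob"]
    return {"ingestionJobId": job["ingestionJobId"], "status": job["status"]}
```

Because each job covers the full data source, a burst of uploads may be better served by batching or debouncing events than by starting one job per file.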

CI/CD & Supply Chain

Every push to the Git repository triggers a webhook-driven GitHub Actions pipeline. The pipeline assumes an AWS IAM role via OIDC — there are no long-lived AWS keys in CI. Docker images build, push to Amazon ECR, and roll out to EKS pods. Amazon Q runs in the pull-request flow for automated security review.
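The keyless setup rests on an IAM role whose trust policy accepts GitHub's OIDC tokens only from a specific repository and branch. A sketch of that trust policy, built as a Python dict, is shown below; the account ID, repository name, and branch in the usage line are placeholders.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Build a trust policy letting one repo/branch assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Token must be minted for AWS STS...
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # ...and come from this exact repository and branch.
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository.
    print(json.dumps(github_oidc_trust_policy("123456789012", "example-org/example-repo"), indent=2))
```

Pinning the `sub` claim to one repository and branch is what keeps a workflow in any other repository from assuming the same deployment role.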

Results

Outcomes & Impact

Prototype → Production

Working Gen AI demo re-architected into a multi-AZ, zero-trust Kubernetes platform suitable for enterprise-regulated environments.

Multi-AZ by Default

Full stack runs across two availability zones with zero single-AZ dependencies, making it resilient to AZ-level failures out of the box.

Zero-Trust Architecture

Every request is authenticated, authorized, rate-limited, logged, and scanned as the default posture, not a bolt-on.

Denial-of-Wallet Protected

Per-user token tracking and daily rate limits prevent runaway LLM costs from abuse, buggy clients, or accidental loops.

10-Step Production Playbook

Re-usable security checklist (AI code review, file scanning, rate limits, RBAC, audit logging), now applied across every TSP Gen AI engagement.

Shift-Left Security

Automated pull-request review, dependency scanning, and codebase refactoring catch insecure patterns before they ever ship to production.

Technical Stack

Technologies Used

Amazon EKS, Amazon Bedrock, AWS CDK, Next.js, Python FastAPI, TypeScript, GraphQL, Docker, Amazon ECR, Amazon Cognito, AWS AppSync, Amazon S3, Amazon DynamoDB, Amazon OpenSearch Service, Amazon CloudFront, AWS WAF, AWS Shield, Amazon GuardDuty, AWS CloudTrail, Amazon CloudWatch, Amazon EventBridge, GitHub Actions (OIDC), Amazon Q
