
The Model Is Not the Product: Shipping Multi-Model AI from Notebooks to Production on AWS

◉ AI / ML Engineering · Platform

How we productionized a fine-tuned, multi-model AI pipeline for an elite executive advisory firm — taking proprietary models trapped in Colab and shipping them as a Step Functions-orchestrated, container-based, CDK-defined platform with zero idle compute.

By: TSP Engineering Team · 14 min read · Step Functions · Lambda · Docker · ECR · CDK
multi_model_pipeline.ipynb — Colab research environment

[ IN • 1 ] · manual execution
!pip install torch==1.13.1 transformers==4.28 huggingface-hub==0.14.1  # good luck reproducing this

[ IN • 2 ] · hardcoded path
MODEL_PATH = "/content/drive/MyDrive/models/extraction-v7-final-FINAL-use-this-one.pt"

[ IN • 14 ] · sequential + stateful
output_1 = run_extraction(transcript)
output_2 = run_analysis(output_1)  # hope you ran cell 3 first
output_3 = run_synthesis(output_2)

RuntimeError: session state lost after runtime restart. rerun from cell 1.
Multi-Model Pipeline · $0 Idle Compute · 100% Typed CDK IaC · $100M+ Per-Client Revenue Tier
TL;DR

An executive advisory firm had proprietary, fine-tuned AI models for analyzing coaching session transcripts with seven- through ten-figure entrepreneurs. The models worked. The problem was that they only worked in Colab — manually, by one researcher, with no path to production. Tech Stack Playbook built the production system that made them useful.

Below: the Step Functions state machine, the containerization pattern, why we rejected LLM-generated IaC in favor of typed CDK, and the deterministic-vs-probabilistic tension that defined every architecture call.

01 / THESIS
The Research-to-Production Gap Is the Real AI Engineering Problem

The AI industry talks about model research as if that's the hard part. For the teams we work with, the models are often the easy part. What's hard is everything that happens between "the notebook works on my laptop" and "the platform works for our clients."

This engagement is the archetype: sophisticated, domain-specific fine-tuned models that extracted behavioral signals from private coaching sessions — capable work, valuable output, real IP. All of it locked inside Jupyter. Manually ordered cells. Implicit state. Google Drive–mounted checkpoints. A single researcher who could run it. A pipeline that existed in one Chrome tab at a time.

BEFORE

Colab Research Artifact

  • Models lived in Jupyter with hardcoded Drive paths
  • One researcher could run it. Nobody else.
  • Sequential models, no orchestration, no retries
  • Runtime state evaporated on every kernel restart
  • No IaC, no CI/CD, no environment management
  • Outputs trapped in notebook cells — no persistence
AFTER

Production Platform on AWS

  • Models containerized to ECR — same images dev & prod
  • Any engineer can trigger a run via standard PR flow
  • Step Functions orchestrates the multi-model DAG
  • DynamoDB persists every generation + intermediate state
  • 100% CDK-typed infra, GitHub Actions CI/CD pipeline
  • Application layer surfaces structured reports

That pattern — stakes high, clients paying for institutional-grade analysis, models trapped in notebooks — is not unusual. It's the default state of AI capability at most firms that invest in custom model work. Pair this with the 14-step security lockdown we shipped for the same client and you see the full picture: the infrastructure to run the models has to be as enterprise-grade as the clientele.

02 / ORCHESTRATION
The Multi-Model Pipeline as a Step Functions State Machine

The multi-model pipeline runs as a Step Functions state machine. Each stage is a containerized Lambda function with its own container image, input/output schema, and execution profile.

◉ coaching-intelligence-pipeline arn:aws:states:us-east-1:•••:stateMachine/coaching-intel-pipeline

Start → Extract (λ · container) → Analyze (λ · container) → Synthesize (λ · container) → Persist (DynamoDB) → End

The pipeline transforms a raw coaching transcript through three containerized model stages, then persists structured intelligence to DynamoDB.
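
The Persist step's write can be sketched in plain TypeScript. The key names, prefixes, and single-table layout below are illustrative assumptions, not the client's actual DynamoDB schema:

```typescript
// Hypothetical item shape for the Persist step. Key names and the
// single-table layout are assumptions for illustration only.
interface PipelineRunItem {
  pk: string;            // partition key: session identity
  sk: string;            // sort key: stage + timestamp, so every generation is queryable
  stage: "extract" | "analyze" | "synthesize";
  output: string;        // JSON-serialized structured model output
  executionArn: string;  // links the row back to its Step Functions execution
}

function buildItem(
  sessionId: string,
  stage: PipelineRunItem["stage"],
  output: string,
  executionArn: string,
  timestamp: string
): PipelineRunItem {
  return {
    pk: `SESSION#${sessionId}`,
    sk: `STAGE#${stage}#${timestamp}`,
    stage,
    output,
    executionArn,
  };
}
```

Persisting every intermediate output, not only the final synthesis, is what makes a run auditable and lets a failed pipeline resume from its last completed stage.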

Why Step Functions (and not a Python script calling Lambdas)

A for-loop calling four Lambdas sequentially would work until it didn't — until a transient throttling error in stage three meant the whole pipeline fell on the floor with no retry, no observability, and no resume. Step Functions gives you structured state transitions, per-step retry policies, built-in error handling paths, and an execution history that's queryable for months. The visual console replaces the tribal knowledge of "ask the researcher which notebook cell this was."
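
The retry-and-catch semantics above can be sketched as the Amazon States Language (ASL) a state machine definition compiles down to. The helper below is plain TypeScript; the ARN, retry numbers, and the `RecordFailure` state name are illustrative, not the client's configuration:

```typescript
// Minimal sketch of what Step Functions adds over a bare for-loop: a Task
// state with a per-step retry policy and an explicit error-handling path.
// Retry numbers and the RecordFailure state name are illustrative.
function taskState(functionArn: string, next: string | null) {
  return {
    Type: "Task",
    Resource: functionArn,
    // Transient faults (throttling, timeouts) retry with exponential backoff
    // instead of dropping the whole pipeline run.
    Retry: [
      {
        ErrorEquals: ["Lambda.TooManyRequestsException", "States.Timeout"],
        IntervalSeconds: 2,
        MaxAttempts: 3,
        BackoffRate: 2.0,
      },
    ],
    // Unrecoverable errors route to an explicit failure state for observability.
    Catch: [{ ErrorEquals: ["States.ALL"], Next: "RecordFailure" }],
    ...(next ? { Next: next } : { End: true }),
  };
}

const analyze = taskState(
  "arn:aws:lambda:us-east-1:123456789012:function:analyze", // illustrative ARN
  "Synthesize"
);
```

Every stage carries its own policy, so a throttled third stage retries in place rather than forcing a rerun from the top.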

Why containerized Lambda (and not ECS / SageMaker endpoints)

Three constraints aligned with Lambda's execution model: each model generation finished well inside 15 minutes, sessions were fully self-contained (no state carryover between invocations), and the workload was bursty with long idle gaps. Lambda's container image support means the exact Docker image we run locally ships to production unchanged. ECS would have introduced a 24/7 cost floor for a workload that's idle most of the day. SageMaker endpoints would have added cost and complexity we didn't need for this inference pattern.
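
A back-of-envelope sketch of the cost argument, assuming roughly the memory sizing used in the stack and an illustrative per-GB-second Lambda rate (check current regional pricing before relying on the number):

```typescript
// Illustrative Lambda duration pricing. The rate is an assumption for the
// sketch, not a quoted price; always-on ECS or SageMaker endpoints would add
// a 24/7 cost floor regardless of traffic.
const PRICE_PER_GB_SECOND = 0.0000166667; // assumed us-east-1 x86 rate

function invocationCostUsd(memoryMb: number, durationSeconds: number): number {
  return (memoryMb / 1024) * durationSeconds * PRICE_PER_GB_SECOND;
}

// Three ~3 GB model stages at ~60 s each: fractions of a cent per pipeline
// run, and exactly $0 while the workload sits idle.
const perRun = 3 * invocationCostUsd(3008, 60);
```

For a bursty workload with long idle gaps, the per-second billing model is the whole argument: the bill tracks inference seconds, not wall-clock hours.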

03 / CONTAINERIZATION
Notebook → ECR Image → Lambda

Every model becomes a Docker image deployed to ECR, then invoked as a Lambda function. One definition, one artifact, one path through dev and prod. The container handles dependency pinning, model-weight loading, and the Lambda runtime API contract.

dockerfile · model container
# Base image provides the Lambda runtime API
FROM public.ecr.aws/lambda/python:3.11

# Pin dependencies — no surprises between dev and prod
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install --no-cache-dir -r requirements.txt

# Model weights baked into the image (immutable, reproducible)
COPY model_weights/ ${LAMBDA_TASK_ROOT}/model_weights/
COPY handler.py ${LAMBDA_TASK_ROOT}

# Lambda runtime calls handler.lambda_handler(event, context)
CMD [ "handler.lambda_handler" ]

04 / IAC
Typed CDK, Not LLM-Generated Terraform

This engagement made one decision very early: no LLM-generated infrastructure. Not because LLMs are bad at writing Terraform — they're fine at it. Because the environment serves private strategic data from seven- to ten-figure entrepreneurs, and probabilistic infrastructure definitions are a category error for clients at that tier. Every stack gets the same hardening pattern we documented in our multi-account AWS modernization work.

CDK in TypeScript gave us what the problem needed: real programming constructs (loops, conditionals, abstractions), type safety across every resource definition, and a deterministic synth that produces the same CloudFormation output on every run. Same input, same output. No drift. No "it worked in my agent."

typescript · cdk · pipeline stack
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { DockerImageFunction, DockerImageCode } from 'aws-cdk-lib/aws-lambda';
import { StateMachine, Chain } from 'aws-cdk-lib/aws-stepfunctions';
import { LambdaInvoke } from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class CoachingIntelligenceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Each model is a typed Lambda construct — configuration is code.
    const models = ['extract', 'analyze', 'synthesize'].map(name =>
      new DockerImageFunction(this, `Model-${name}`, {
        code: DockerImageCode.fromImageAsset(`./models/${name}`),
        memorySize: 3008,                   // right-sized per model profile
        timeout: Duration.minutes(15),      // Lambda's maximum timeout
        environment: { LOG_LEVEL: 'INFO' },
      })
    );

    // Chain tasks into the state machine. Retry policy per step.
    const definition = Chain.start(
      new LambdaInvoke(this, 'Extract',    { lambdaFunction: models[0] })
    ).next(
      new LambdaInvoke(this, 'Analyze',    { lambdaFunction: models[1] })
    ).next(
      new LambdaInvoke(this, 'Synthesize', { lambdaFunction: models[2] })
    );

    new StateMachine(this, 'CoachingIntelPipeline', { definition });
  }
}

That same file loops, types, and refactors. Add a fourth model? One line in the array. Change memory sizing? One property. Rename the state machine? TypeScript tells you the 14 downstream references that need updating before you merge. That's the part an LLM generating YAML can't give you — feedback at edit time.
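
The edit-time-feedback claim can be shown without CDK at all. A sketch with a typed stage list as the single source of truth (stage names and sizes mirror the stack above, but the helper is illustrative):

```typescript
// The model stages as one typed array: adding a model is one entry here,
// and renaming a field is a compile-time error at every downstream use.
interface ModelStage {
  name: string;
  memorySize: number; // MB, right-sized per model profile
}

const stages: ModelStage[] = [
  { name: "extract", memorySize: 3008 },
  { name: "analyze", memorySize: 3008 },
  { name: "synthesize", memorySize: 3008 },
];

// Everything downstream derives from the same typed source of truth, so the
// synth is deterministic: same input array, same chain, every run.
function chainOrder(s: ModelStage[]): string {
  return s.map(m => m.name).join(" -> ");
}
```

Because every construct derives from the same array, there is no second place for the pipeline's shape to drift out of sync.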

◉ Key Insight

The AI models produce probabilistic outputs. The infrastructure that runs them must be entirely deterministic. That tension — managing probabilistic AI workloads through typed, auditable infrastructure code — was the design principle behind every call we made.

05 / OUTCOMES
What Shipped

Research → Prod
Models Liberated from Colab

Proprietary fine-tuned models now run as a managed, production-grade pipeline available to the entire team.

Zero
Idle Compute Cost

Lambda-based execution means the firm pays only for actual model inference seconds — no 24/7 endpoint tax.

100%
Typed CDK IaC

Every Lambda, state machine, table, role, and ECR repo defined in deterministic, version-controlled TypeScript.

Any PR
Any Engineer Can Deploy

Single-researcher bottleneck eliminated. GitHub Actions pipeline ships infrastructure and model updates through standard PR flow.

Stack

AWS Lambda (container) · Amazon ECR · AWS Step Functions · Amazon DynamoDB · AWS CDK (TypeScript) · GitHub Actions · Docker · IAM Least-Privilege · CloudWatch · Fine-Tuned Models

06 / TAKEAWAY
If Your AI Only Works in a Notebook, You Don't Have a Product

The gap between a working model and a shippable AI product is where most organizations stall. Containerization, orchestration, state management, typed IaC, CI/CD, application delivery — these aren't afterthoughts to model work. They're the product. The model gets the headline. The platform earns the revenue.

If you're sitting on proprietary AI models that only your researchers can run, the message is unambiguous: the model is not the product. The infrastructure is. That's the gap our AI & ML engagements exist to close.

Have AI models trapped in notebooks?

We partner with AI-first teams to productionize fine-tuned models into governed, observable, scalable platforms on AWS — containerized, orchestrated, typed end-to-end. No probabilistic infrastructure. No single-researcher bottlenecks.

Book a strategy call  
Explore more