
The Model Is Not the Product: Shipping Multi-Model AI from Notebooks to Production on AWS

◉ AI / ML Engineering · Platform

How we productionized a fine-tuned, multi-model AI pipeline for an elite executive advisory firm — taking proprietary models trapped in Colab and shipping them as a Step Functions-orchestrated, container-based, CDK-defined platform with zero idle compute.

By: TSP Engineering Team · 14 min read · Step Functions · Lambda · Docker · ECR · CDK
multi_model_pipeline.ipynb — Colab research environment

[ IN • 1 ] · manual execution
!pip install torch==1.13.1 transformers==4.28 huggingface-hub==0.14.1  # good luck reproducing this

[ IN • 2 ] · hardcoded path
MODEL_PATH = "/content/drive/MyDrive/models/extraction-v7-final-FINAL-use-this-one.pt"

[ IN • 14 ] · sequential + stateful
output_1 = run_extraction(transcript)
output_2 = run_analysis(output_1)  # hope you ran cell 3 first
output_3 = run_synthesis(output_2)

RuntimeError: session state lost after runtime restart. rerun from cell 1.
Multi-Model Pipeline · $0 Idle Compute · 100% Typed CDK IaC · $100M+ Per-Client Revenue Tier
TL;DR

An executive advisory firm had proprietary, fine-tuned AI models for analyzing coaching session transcripts with seven- through ten-figure entrepreneurs. The models worked. The problem was that they only worked in Colab — manually, by one researcher, with no path to production. Tech Stack Playbook built the production system that made them useful.

Below: the Step Functions state machine, the containerization pattern, why we rejected LLM-generated IaC in favor of typed CDK, and the deterministic-vs-probabilistic tension that defined every architecture call.

01 / THESIS
The Research-to-Production Gap Is the Real AI Engineering Problem

The AI industry talks about model research as if that's the hard part. For the teams we work with, the models are often the easy part. What's hard is everything that happens between "the notebook works on my laptop" and "the platform works for our clients."

This engagement is the archetype: sophisticated, domain-specific fine-tuned models that extracted behavioral signals from private coaching sessions — capable work, valuable output, real IP. All of it locked inside Jupyter. Manually ordered cells. Implicit state. Google Drive–mounted checkpoints. A single researcher who could run it. A pipeline that existed in one Chrome tab at a time.

BEFORE

Colab Research Artifact

  • Models lived in Jupyter with hardcoded Drive paths
  • One researcher could run it. Nobody else.
  • Sequential models, no orchestration, no retries
  • Runtime state evaporated on every kernel restart
  • No IaC, no CI/CD, no environment management
  • Outputs trapped in notebook cells — no persistence
AFTER

Production Platform on AWS

  • Models containerized to ECR — same images dev & prod
  • Any engineer can trigger a run via standard PR flow
  • Step Functions orchestrates the multi-model DAG
  • DynamoDB persists every generation + intermediate state
  • 100% CDK-typed infra, GitHub Actions CI/CD pipeline
  • Application layer surfaces structured reports

That pattern — stakes high, clients paying for institutional-grade analysis, models trapped in notebooks — is not unusual. It's the default state of AI capability at most firms that invest in custom model work. Pair this with the 14-step security lockdown we shipped for the same client and you see the full picture: the infrastructure to run the models has to be as enterprise-grade as the clientele.

02 / ORCHESTRATION
The Multi-Model Pipeline as a Step Functions State Machine

The multi-model pipeline runs as a Step Functions state machine. Each stage is a containerized Lambda function with its own container image, input/output schema, and execution profile.

◉ coaching-intelligence-pipeline arn:aws:states:us-east-1:•••:stateMachine/coaching-intel-pipeline

Start → Extract (λ · container) → Analyze (λ · container) → Synthesize (λ · container) → Persist (DynamoDB) → End

The pipeline transforms a raw coaching transcript through three containerized model stages, then persists structured intelligence to DynamoDB.
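
The Persist step's write can be sketched in plain TypeScript. The key names, prefixes, and single-table layout below are illustrative assumptions, not the client's actual DynamoDB schema:

```typescript
// Hypothetical item shape for the Persist step. Key names and the
// single-table layout are assumptions for illustration only.
interface PipelineRunItem {
  pk: string;            // partition key: session identity
  sk: string;            // sort key: stage + timestamp, so every generation is queryable
  stage: "extract" | "analyze" | "synthesize";
  output: string;        // JSON-serialized structured model output
  executionArn: string;  // links the row back to its Step Functions execution
}

function buildItem(
  sessionId: string,
  stage: PipelineRunItem["stage"],
  output: string,
  executionArn: string,
  timestamp: string
): PipelineRunItem {
  return {
    pk: `SESSION#${sessionId}`,
    sk: `STAGE#${stage}#${timestamp}`,
    stage,
    output,
    executionArn,
  };
}
```

Persisting every intermediate output, not only the final synthesis, is what makes a run auditable and lets a failed pipeline resume from its last completed stage.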

Why Step Functions (and not a Python script calling Lambdas)

A for-loop calling four Lambdas sequentially would work until it didn't — until a transient throttling error in stage three meant the whole pipeline fell on the floor with no retry, no observability, and no resume. Step Functions gives you structured state transitions, per-step retry policies, built-in error handling paths, and an execution history that's queryable for months. The visual console replaces the tribal knowledge of "ask the researcher which notebook cell this was."
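
The retry-and-catch semantics above can be sketched as the Amazon States Language (ASL) a state machine definition compiles down to. The helper below is plain TypeScript; the ARN, retry numbers, and the `RecordFailure` state name are illustrative, not the client's configuration:

```typescript
// Minimal sketch of what Step Functions adds over a bare for-loop: a Task
// state with a per-step retry policy and an explicit error-handling path.
// Retry numbers and the RecordFailure state name are illustrative.
function taskState(functionArn: string, next: string | null) {
  return {
    Type: "Task",
    Resource: functionArn,
    // Transient faults (throttling, timeouts) retry with exponential backoff
    // instead of dropping the whole pipeline run.
    Retry: [
      {
        ErrorEquals: ["Lambda.TooManyRequestsException", "States.Timeout"],
        IntervalSeconds: 2,
        MaxAttempts: 3,
        BackoffRate: 2.0,
      },
    ],
    // Unrecoverable errors route to an explicit failure state for observability.
    Catch: [{ ErrorEquals: ["States.ALL"], Next: "RecordFailure" }],
    ...(next ? { Next: next } : { End: true }),
  };
}

const analyze = taskState(
  "arn:aws:lambda:us-east-1:123456789012:function:analyze", // illustrative ARN
  "Synthesize"
);
```

Every stage carries its own policy, so a throttled third stage retries in place rather than forcing a rerun from the top.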

Why containerized Lambda (and not ECS / SageMaker endpoints)

Three constraints aligned with Lambda's execution model: each model generation finished well inside 15 minutes, sessions were fully self-contained (no state carryover between invocations), and the workload was bursty with long idle gaps. Lambda's container image support means the exact Docker image we run locally ships to production unchanged. ECS would have introduced a 24/7 cost floor for a workload that's idle most of the day. SageMaker endpoints would have added cost and complexity we didn't need for this inference pattern.
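
A back-of-envelope sketch of the cost argument, assuming roughly the memory sizing used in the stack and an illustrative per-GB-second Lambda rate (check current regional pricing before relying on the number):

```typescript
// Illustrative Lambda duration pricing. The rate is an assumption for the
// sketch, not a quoted price; always-on ECS or SageMaker endpoints would add
// a 24/7 cost floor regardless of traffic.
const PRICE_PER_GB_SECOND = 0.0000166667; // assumed us-east-1 x86 rate

function invocationCostUsd(memoryMb: number, durationSeconds: number): number {
  return (memoryMb / 1024) * durationSeconds * PRICE_PER_GB_SECOND;
}

// Three ~3 GB model stages at ~60 s each: fractions of a cent per pipeline
// run, and exactly $0 while the workload sits idle.
const perRun = 3 * invocationCostUsd(3008, 60);
```

For a bursty workload with long idle gaps, the per-second billing model is the whole argument: the bill tracks inference seconds, not wall-clock hours.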

03 / CONTAINERIZATION
Notebook → ECR Image → Lambda

Every model becomes a Docker image deployed to ECR, then invoked as a Lambda function. One definition, one artifact, one path through dev and prod. The container handles dependency pinning, model-weight loading, and the Lambda runtime API contract.

dockerfile · model container
# Base image provides the Lambda runtime API
FROM public.ecr.aws/lambda/python:3.11

# Pin dependencies — no surprises between dev and prod
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install --no-cache-dir -r requirements.txt

# Model weights baked into the image (immutable, reproducible)
COPY model_weights/ ${LAMBDA_TASK_ROOT}/model_weights/
COPY handler.py ${LAMBDA_TASK_ROOT}

# Lambda runtime calls handler.lambda_handler(event, context)
CMD [ "handler.lambda_handler" ]

04 / IAC
Typed CDK, Not LLM-Generated Terraform

This engagement made one decision very early: no LLM-generated infrastructure. Not because LLMs are bad at writing Terraform — they're fine at it. Because the environment serves private strategic data from seven- to ten-figure entrepreneurs, and probabilistic infrastructure definitions are a category error for clients at that tier. Every stack gets the same hardening pattern we documented in our multi-account AWS modernization work.

CDK in TypeScript gave us what the problem needed: real programming constructs (loops, conditionals, abstractions), type safety across every resource definition, and a deterministic synth that produces the same CloudFormation output on every run. Same input, same output. No drift. No "it worked in my agent."

typescript · cdk · pipeline stack
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { DockerImageFunction, DockerImageCode } from 'aws-cdk-lib/aws-lambda';
import { StateMachine, Chain } from 'aws-cdk-lib/aws-stepfunctions';
import { LambdaInvoke } from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class CoachingIntelligenceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Each model is a typed Lambda construct — configuration is code.
    const models = ['extract', 'analyze', 'synthesize'].map(name =>
      new DockerImageFunction(this, `Model-${name}`, {
        code: DockerImageCode.fromImageAsset(`./models/${name}`),
        memorySize: 3008,                   // right-sized per model profile
        timeout: Duration.minutes(15),      // Lambda's maximum timeout
        environment: { LOG_LEVEL: 'INFO' },
      })
    );

    // Chain tasks into the state machine. Retry policy per step.
    const definition = Chain.start(
      new LambdaInvoke(this, 'Extract',    { lambdaFunction: models[0] })
    ).next(
      new LambdaInvoke(this, 'Analyze',    { lambdaFunction: models[1] })
    ).next(
      new LambdaInvoke(this, 'Synthesize', { lambdaFunction: models[2] })
    );

    new StateMachine(this, 'CoachingIntelPipeline', { definition });
  }
}

That same file loops, types, and refactors. Add a fourth model? One line in the array. Change memory sizing? One property. Rename the state machine? TypeScript tells you the 14 downstream references that need updating before you merge. That's the part an LLM generating YAML can't give you — feedback at edit time.
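
The edit-time-feedback claim can be shown without CDK at all. A sketch with a typed stage list as the single source of truth (stage names and sizes mirror the stack above, but the helper is illustrative):

```typescript
// The model stages as one typed array: adding a model is one entry here,
// and renaming a field is a compile-time error at every downstream use.
interface ModelStage {
  name: string;
  memorySize: number; // MB, right-sized per model profile
}

const stages: ModelStage[] = [
  { name: "extract", memorySize: 3008 },
  { name: "analyze", memorySize: 3008 },
  { name: "synthesize", memorySize: 3008 },
];

// Everything downstream derives from the same typed source of truth, so the
// synth is deterministic: same input array, same chain, every run.
function chainOrder(s: ModelStage[]): string {
  return s.map(m => m.name).join(" -> ");
}
```

Because every construct derives from the same array, there is no second place for the pipeline's shape to drift out of sync.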

◉ Key Insight

The AI models produce probabilistic outputs. The infrastructure that runs them must be entirely deterministic. That tension — managing probabilistic AI workloads through typed, auditable infrastructure code — was the design principle behind every call we made.

05 / OUTCOMES
What Shipped

Research → Prod
Models Liberated from Colab

Proprietary fine-tuned models now run as a managed, production-grade pipeline available to the entire team.

Zero
Idle Compute Cost

Lambda-based execution means the firm pays only for actual model inference seconds — no 24/7 endpoint tax.

100%
Typed CDK IaC

Every Lambda, state machine, table, role, and ECR repo defined in deterministic, version-controlled TypeScript.

Any PR
Any Engineer Can Deploy

Single-researcher bottleneck eliminated. GitHub Actions pipeline ships infrastructure and model updates through standard PR flow.

Stack

AWS Lambda (container) · Amazon ECR · AWS Step Functions · Amazon DynamoDB · AWS CDK (TypeScript) · GitHub Actions · Docker · IAM Least-Privilege · CloudWatch · Fine-Tuned Models

06 / TAKEAWAY
If Your AI Only Works in a Notebook, You Don't Have a Product

The gap between a working model and a shippable AI product is where most organizations stall. Containerization, orchestration, state management, typed IaC, CI/CD, application delivery — these aren't afterthoughts to model work. They're the product. The model gets the headline. The platform earns the revenue.

If you're sitting on proprietary AI models that only your researchers can run, the message is unambiguous: the model is not the product. The infrastructure is. That's the gap our AI & ML engagements exist to close.

Have AI models trapped in notebooks?

We partner with AI-first teams to productionize fine-tuned models into governed, observable, scalable platforms on AWS — containerized, orchestrated, typed end-to-end. No probabilistic infrastructure. No single-researcher bottlenecks.

Book a strategy call  
Explore more