How we productionized a fine-tuned, multi-model AI pipeline for an elite executive advisory firm — taking proprietary models trapped in Colab and shipping them as a Step Functions-orchestrated, container-based, CDK-defined platform with zero idle compute.
output_2 = run_analysis(output_1)  # hope you ran cell 3 first
output_3 = run_synthesis(output_2)
RuntimeError: session state lost after runtime restart. rerun from cell 1.
An executive advisory firm had proprietary, fine-tuned AI models for analyzing coaching session transcripts with seven- through ten-figure entrepreneurs. The models worked. The problem was that they only worked in Colab — manually, by one researcher, with no path to production. Tech Stack Playbook built the production system that made them useful.
Below: the Step Functions state machine, the containerization pattern, why we rejected LLM-generated IaC in favor of typed CDK, and the deterministic-vs-probabilistic tension that defined every architecture call.
01 / THESIS
The Research-to-Production Gap Is the Real AI Engineering Problem
The AI industry talks about model research as if that's the hard part. For the teams we work with, the models are often the easy part. What's hard is everything that happens between "the notebook works on my laptop" and "the platform works for our clients."
This engagement is the archetype: sophisticated, domain-specific fine-tuned models that extracted behavioral signals from private coaching sessions — capable work, valuable output, real IP. All of it locked inside Jupyter. Manually ordered cells. Implicit state. Google Drive–mounted checkpoints. A single researcher who could run it. A pipeline that existed in one Chrome tab at a time.
Colab Research Artifact
- Models lived in Jupyter with hardcoded Drive paths
- One researcher could run it. Nobody else.
- Sequential models, no orchestration, no retries
- Runtime state evaporated on every kernel restart
- No IaC, no CI/CD, no environment management
- Outputs trapped in notebook cells — no persistence
Production Platform on AWS
- Models containerized to ECR — same images dev & prod
- Any engineer can trigger a run via standard PR flow
- Step Functions orchestrates the multi-model DAG
- DynamoDB persists every generation + intermediate state (see the sketch after this list)
- 100% CDK-typed infra, GitHub Actions CI/CD pipeline
- Application layer surfaces structured reports
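For illustration, a minimal sketch of what that persistence layer can look like in the CDK stack from section 04. The table name, key names, and billing mode here are assumptions for the example, not the client's actual schema:

import { RemovalPolicy } from 'aws-cdk-lib';
import { Table, AttributeType, BillingMode } from 'aws-cdk-lib/aws-dynamodb';

// Inside the stack class: one table keyed by session, with each pipeline
// stage writing its generation as its own item, so intermediate state
// outlives any single Lambda invocation or pipeline run.
const generations = new Table(this, 'Generations', {
  partitionKey: { name: 'sessionId', type: AttributeType.STRING },
  sortKey: { name: 'stageTimestamp', type: AttributeType.STRING },
  billingMode: BillingMode.PAY_PER_REQUEST, // fits the bursty, mostly-idle workload
  removalPolicy: RemovalPolicy.RETAIN,      // never drop client data with the stack
});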
That pattern — stakes high, clients paying for institutional-grade analysis, models trapped in notebooks — is not unusual. It's the default state of AI capability at most firms that invest in custom model work. Pair this with the 14-step security lockdown we shipped for the same client and you see the full picture: the infrastructure to run the models has to be as enterprise-grade as the clientele.
02 / ORCHESTRATION
The Multi-Model Pipeline as a Step Functions State Machine
The multi-model pipeline runs as a Step Functions state machine. Each state is a containerized Lambda stage with its own function, container image, input/output schema, and execution profile, and every transition is recorded in the execution history.
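To make the stage contract concrete, here is a hedged sketch of the payload one state might hand to the next. The interface and field names are illustrative assumptions, not the production schema:

// Hypothetical inter-stage contract (illustrative only).
interface StagePayload {
  sessionId: string;                            // coaching session under analysis
  stage: 'extract' | 'analyze' | 'synthesize';  // which model just ran
  generationId: string;                         // DynamoDB key of the stage's output
  modelVersion: string;                         // container image tag that produced it
}

Each stage reads its predecessor's output from DynamoDB, runs inference, persists its own generation, and returns the updated payload to the state machine.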
Why Step Functions (and not a Python script calling Lambdas)
A for-loop calling four Lambdas sequentially would work until it didn't — until a transient throttling error in stage three meant the whole pipeline fell on the floor with no retry, no observability, and no resume. Step Functions gives you structured state transitions, per-step retry policies, built-in error handling paths, and an execution history that's queryable for months. The visual console replaces the tribal knowledge of "ask the researcher which notebook cell this was."
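A sketch of what those per-step policies look like in CDK, assuming this sits inside the stack class shown in section 04 and that analyzeFn is one of the containerized model functions:

import { Duration } from 'aws-cdk-lib';
import { Pass } from 'aws-cdk-lib/aws-stepfunctions';
import { LambdaInvoke } from 'aws-cdk-lib/aws-stepfunctions-tasks';

// Retry transient Lambda faults with exponential backoff...
const analyze = new LambdaInvoke(this, 'Analyze', { lambdaFunction: analyzeFn });
analyze.addRetry({
  errors: ['Lambda.TooManyRequestsException', 'Lambda.ServiceException'],
  interval: Duration.seconds(5),
  maxAttempts: 3,
  backoffRate: 2,
});
// ...and route hard failures to an explicit state instead of losing the run.
analyze.addCatch(new Pass(this, 'RecordFailure'), { resultPath: '$.error' });

That is the difference in kind: the for-loop version has none of this unless you hand-roll it.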
Why containerized Lambda (and not ECS / SageMaker endpoints)
Three constraints aligned with Lambda's execution model: each model generation finished well inside 15 minutes, sessions were fully self-contained (no state carryover between invocations), and the workload was bursty with long idle gaps. Lambda's container image support means the exact Docker image we run locally ships to production unchanged. ECS would have introduced a 24/7 cost floor for a workload that's idle most of the day. SageMaker endpoints would have added cost and complexity we didn't need for this inference pattern.
03 / CONTAINERIZATION
Notebook → ECR Image → Lambda
Every model becomes a Docker image deployed to ECR, then invoked as a Lambda function. One definition, one artifact, one path through dev and prod. The container handles dependency pinning, model-weight loading, and the Lambda runtime API contract.
# Base image provides the Lambda runtime API
FROM public.ecr.aws/lambda/python:3.11

# Pin dependencies — no surprises between dev and prod
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install --no-cache-dir -r requirements.txt

# Model weights baked into the image (immutable, reproducible)
COPY model_weights/ ${LAMBDA_TASK_ROOT}/model_weights/
COPY handler.py ${LAMBDA_TASK_ROOT}

# Lambda runtime calls handler.lambda_handler(event, context)
CMD [ "handler.lambda_handler" ]
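That directory is the unit CDK deploys. A minimal sketch of the wiring (the full stack appears in section 04; extractFn is an illustrative name):

import { DockerImageFunction, DockerImageCode } from 'aws-cdk-lib/aws-lambda';

// Inside the stack class: CDK builds this Dockerfile at deploy time and
// publishes the image to ECR, so the artifact tested locally is the
// artifact that serves production.
const extractFn = new DockerImageFunction(this, 'Model-extract', {
  code: DockerImageCode.fromImageAsset('./models/extract'),
});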
04 / IAC
Typed CDK, Not LLM-Generated Terraform
This engagement made one decision very early: no LLM-generated infrastructure. Not because LLMs are bad at writing Terraform — they're fine at it. Because the environment serves private strategic data from seven- to ten-figure entrepreneurs, and probabilistic infrastructure definitions are a category error for clients at that tier. Every stack gets the same hardening pattern we documented in our multi-account AWS modernization work.
CDK in TypeScript gave us what the problem needed: real programming constructs (loops, conditionals, abstractions), type safety across every resource definition, and a deterministic synth that produces the same CloudFormation output on every run. Same input, same output. No drift. No "it worked in my agent."
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { DockerImageFunction, DockerImageCode } from 'aws-cdk-lib/aws-lambda';
import { StateMachine, Chain } from 'aws-cdk-lib/aws-stepfunctions';
import { LambdaInvoke } from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class CoachingIntelligenceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Each model is a typed Lambda construct — configuration is code.
    const models = ['extract', 'analyze', 'synthesize'].map(name =>
      new DockerImageFunction(this, `Model-${name}`, {
        code: DockerImageCode.fromImageAsset(`./models/${name}`),
        memorySize: 3008,               // right-sized per model profile
        timeout: Duration.minutes(15),  // Lambda hard max
        environment: { LOG_LEVEL: 'INFO' },
      })
    );

    // Chain tasks into the state machine. Retry policy per step.
    const definition = Chain.start(
      new LambdaInvoke(this, 'Extract', { lambdaFunction: models[0] })
    ).next(
      new LambdaInvoke(this, 'Analyze', { lambdaFunction: models[1] })
    ).next(
      new LambdaInvoke(this, 'Synthesize', { lambdaFunction: models[2] })
    );

    new StateMachine(this, 'CoachingIntelPipeline', { definition });
  }
}
That same file loops, types, and refactors. Add a fifth model? One line in the array. Change memory sizing? One property. Rename the state machine? TypeScript tells you the 14 downstream references that need updating before you merge. That's the part an LLM generating YAML can't give you — feedback at edit time.
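A contrived but representative example of that edit-time feedback, with a deliberately mistyped property (the model name here is hypothetical):

// tsc rejects this before anything is synthesized, let alone deployed:
new DockerImageFunction(this, 'Model-report', {
  code: DockerImageCode.fromImageAsset('./models/report'),
  memorySize: '3008', // Error: Type 'string' is not assignable to type 'number'
});

No YAML or HCL template surfaces that mistake before it reaches the cloud.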
The AI models produce probabilistic outputs. The infrastructure that runs them must be entirely deterministic. That tension — managing probabilistic AI workloads through typed, auditable infrastructure code — was the design principle behind every call we made.
05 / OUTCOMES
What Shipped
Proprietary fine-tuned models now run as a managed, production-grade pipeline available to the entire team.
Lambda-based execution means the firm pays only for actual model inference seconds — no 24/7 endpoint tax.
Every Lambda, state machine, table, role, and ECR repo defined in deterministic, version-controlled TypeScript.
Single-researcher bottleneck eliminated. GitHub Actions pipeline ships infrastructure and model updates through standard PR flow.
Stack
AWS Step Functions · AWS Lambda (container images) · Amazon ECR · Amazon DynamoDB · AWS CDK (TypeScript) · Docker · GitHub Actions
06 / TAKEAWAY
If Your AI Only Works in a Notebook, You Don't Have a Product
The gap between a working model and a shippable AI product is where most organizations stall. Containerization, orchestration, state management, typed IaC, CI/CD, application delivery — these aren't afterthoughts to model work. They're the product. The model gets the headline. The platform earns the revenue.
If you're sitting on proprietary AI models that only your researchers can run, the message is unambiguous: the model is not the product. The infrastructure is. That's the gap our AI & ML engagements exist to close.
Have AI models trapped in notebooks?
We partner with AI-first teams to productionize fine-tuned models into governed, observable, scalable platforms on AWS — containerized, orchestrated, typed end-to-end. No probabilistic infrastructure. No single-researcher bottlenecks.
Book a strategy call