Inheriting Complexity: How We Modernized a Multi-Account AWS Data Platform

◉ Cloud Engineering · IaC · Governance

No greenfield rebuild. No one-time cleanup. Just the disciplined work of inheriting a fragmented AWS estate, establishing governance, and shipping a Terraform/Terragrunt foundation with a 12–24 month modernization roadmap that leadership could actually plan against.

By: TSP Engineering Team · 15 min read · Terraform · Terragrunt · IAM · GitHub Actions
discovery.sh: running multi-account inventory
$ python tsp-aws-inventory.py --all-accounts --drift-detect
[INFO] Scanning 8 AWS accounts across 4 regions...
[INFO] Inventoried 1,247 resources · 73 IAM roles · 14 long-lived access keys
[WARN] 34 resources unmanaged by IaC (created via console)
[WARN] 12 IAM policies grant *:* on * - broad blast radius
[ALERT] Long-lived access keys older than 365 days: 6 found
[ALERT] Inbound data connectors with no documented owner: 4
# this is what "organically grown" actually looks like
$ tsp-modernize plan --phase=foundation

Multi-Account AWS Estate
100% IaC Foundation
12–24mo Modernization Roadmap
0 Long-Lived Creds Remaining

TL;DR

A multinational health research and advocacy organization had a multi-account AWS data platform that had grown organically across years and teams — fragmented IaC, inconsistent IAM, manual deployments, opaque data lineage. Tech Stack Playbook re-architected the foundation with Terraform/Terragrunt, GitHub Actions CI/CD, hardened IAM, and a phased modernization roadmap leadership could plan against.

Below: the multi-account topology, the explorable Terragrunt module tree we shipped, and the modernization roadmap that took the estate from drift-by-default to governed-by-default.

01 / INHERITED · What "Organically Grown" Actually Means

Greenfield AWS work gets the case studies. Inherited AWS work gets the actual hours. Most enterprise cloud estates were built across multiple phases of work, by multiple teams, against changing best practices, under shifting deadlines. The result is recognizable across every engagement: fragmented infrastructure patterns, inconsistent IAM, multiple inbound data connectors feeding silos with unclear lineage, and no programmatic source of truth for what's actually running.

That was the brief here: a multinational health research enterprise running a multi-account AWS data platform that supports research, analytics, and partner data exchange. The platform worked. It also carried compounding risk in every dimension we surveyed. Our mandate was clear: stabilize without disrupting operations, then chart a modernization path leadership could fund and execute against.

This is the same shape of problem we ran into during the 14-step security lockdown for a different client — but executed proactively rather than after a breach. The patterns we ship are identical because the failure modes are identical.

02 / TOPOLOGY · The Multi-Account AWS Foundation

The first deliverable wasn't code — it was a governed account topology. Workloads separated by environment and trust boundary. Standardized baselines. A clear audit account that catches every CloudTrail event in tamper-evident storage. A shared services account so logging, secrets, and identity stop being copy-pasted across every workload account.

◉ AWS Organization: Account Topology & Trust Boundaries

The topology is deliberately boring. Boring is the goal. An audit account that nobody touches except the security team. Shared services so logging, identity, and secrets aren't copy-pasted across workload accounts. A workloads OU where production and pre-production live with clear separation. A researcher sandbox where data-team experimentation happens without risk to anything else. SCPs at the management level enforcing what no individual account can override.
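
To make the SCP guardrail idea concrete, here is a minimal sketch of a service control policy document built in Python. The two guardrails shown (blocking an account from leaving the organization and protecting the CloudTrail trail) are illustrative assumptions, not the actual policies from this engagement, and the function name is hypothetical.

```python
import json


def build_guardrail_scp() -> dict:
    """Return an example SCP document (assumed guardrails, for illustration)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Member accounts cannot detach themselves from the organization.
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                # No account-level principal can stop or delete the audit trail.
                "Sid": "ProtectAuditTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_guardrail_scp(), indent=2))
```

Because an SCP sets the ceiling on what any principal in a member account can be allowed to do, a deny placed here holds even if an account administrator attaches a broader IAM policy locally.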

03 / IAC · Click Through the Terraform/Terragrunt Module Tree

The IaC foundation is where the engagement's value compounds most. Reusable Terraform modules for the patterns that repeat (VPCs, IAM roles, secrets) live in one library. Terragrunt orchestrates the per-environment composition with DRY config: the same module is instantiated against dev, staging, and prod, with environment-scoped values inherited from a single root config.

◉ infrastructure/ explorable repository tree

Why Terraform and Terragrunt

Terraform handles the resource definitions. Terragrunt handles the orchestration around them — environment promotion, remote state configuration, dependency wiring, and DRY config inheritance. The combination eliminates an entire class of copy-paste bugs that always show up the moment a third environment gets added to a Terraform-only setup.
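
One piece of the remote state configuration can be sketched in a few lines. Terragrunt setups commonly derive each leaf's state key from its path relative to the root config, using the built-in `path_relative_to_include()` function. The Python below only mimics that idea; the directory names are hypothetical, and the source does not say this exact layout was used.

```python
from pathlib import PurePosixPath


def state_key(root_dir: str, leaf_dir: str) -> str:
    """Derive a per-leaf state key from the leaf's path relative to the root.

    Raises ValueError if leaf_dir is not located under root_dir.
    """
    relative = PurePosixPath(leaf_dir).relative_to(PurePosixPath(root_dir))
    return f"{relative}/terraform.tfstate"


if __name__ == "__main__":
    # Hypothetical layout: infrastructure/live/<env>/<region>/<component>
    print(state_key("infrastructure/live", "infrastructure/live/prod/us-east-1/vpc"))
```

Deriving the key from the directory path means no two leaves can share a state file by accident, and a new environment gets its own state location without anyone typing a key by hand.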

The pattern: a small set of well-tested modules live in modules/. A Terragrunt config tree mirrors the account/environment topology, with each leaf composing modules with environment-specific inputs. Add a new environment? It's a directory. Update a module everywhere it's used? It's a single source change. The same hybrid IaC discipline runs through our work re-architecting The Capital and the CDK-based multi-model AI pipeline — different tools, identical philosophy: deterministic, typed, version-controlled, peer-reviewed.
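
The inheritance idea is easy to picture outside of HCL. The sketch below models a root config and per-environment overrides as Python dictionaries and merges them recursively. It illustrates the concept only and is not Terragrunt's own merge logic; every key and value is a made-up example.

```python
from copy import deepcopy


def merge_inputs(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_inputs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical root-level defaults shared by every environment.
ROOT_INPUTS = {
    "region": "us-east-1",
    "tags": {"managed_by": "terraform", "owner": "platform"},
    "vpc": {"az_count": 2},
}

# Hypothetical per-environment overrides: only the differences are written down.
ENV_INPUTS = {
    "dev": {"tags": {"environment": "dev"}},
    "prod": {"tags": {"environment": "prod"}, "vpc": {"az_count": 3}},
}


def inputs_for(env: str) -> dict:
    """Compose the effective inputs for one environment."""
    return merge_inputs(ROOT_INPUTS, ENV_INPUTS[env])


if __name__ == "__main__":
    print(inputs_for("prod"))
```

Adding a staging environment in this model is one new entry in `ENV_INPUTS`, which mirrors the "it's a directory" claim: the shared defaults are never copied, so they cannot drift between environments.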

04 / ROADMAP · The Phased Modernization Plan Leadership Can Plan Against

Discovery surfaces the truth. The roadmap turns it into something a CFO and a CTO can both agree to. We delivered a phased plan with named outcomes per phase, dependency-aware sequencing, and explicit risk callouts — so leadership knew what was at stake at each stage and what would unblock at the end of it.
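
As one picture of what discovery tooling can surface, the sketch below flags IAM policy documents that allow every action on every resource, and access keys older than a cutoff. It is an assumed approach, not the actual inventory script: it works on plain dictionaries (key records are shaped like the `AccessKeyMetadata` entries that boto3's `list_access_keys` returns) and it deliberately ignores `Condition`, `NotAction`, and `NotResource`.

```python
from datetime import datetime, timedelta, timezone


def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    return value if isinstance(value, list) else [value]


def grants_everything(policy: dict) -> bool:
    """True if any Allow statement grants all actions on all resources."""
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        # Treat "*" and "*:*" as the full-wildcard action.
        if any(a in ("*", "*:*") for a in actions) and "*" in resources:
            return True
    return False


def stale_keys(keys: list, now: datetime, max_age_days: int = 365) -> list:
    """Return the IDs of access keys created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [k["AccessKeyId"] for k in keys if k["CreateDate"] < cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    admin = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
    keys = [{"AccessKeyId": "EXAMPLEOLDKEY", "CreateDate": now - timedelta(days=400)}]
    print(grants_everything(admin), stale_keys(keys, now))
```

Running checks like these on a schedule, rather than once, is what turns a discovery finding into an ongoing control.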

◉ 12–24 Month Modernization Roadmap
Phased · Stage-Gated · Risk-Annotated
PHASE 00 · COMPLETE

Discovery & Inventory

Programmatic AWS resource inventory across every account using custom Python tooling. Drift detection. IAM policy audit. Data connector topology mapping. Evidence-based prioritization.

PHASE 01 · COMPLETE

Foundation — Multi-Account Baselines

Refined account topology with SCP-enforced guardrails. Standardized baselines per OU. Org-wide CloudTrail with Object Lock. Identity Center for federated access.

PHASE 02 · COMPLETE

IaC & CI/CD — Terraform + Terragrunt + Actions

Module library shipped. Terragrunt orchestration in place. GitHub Actions plan/apply workflows with policy gates and federated OIDC auth. No more click-ops.

PHASE 03 · COMPLETE

IAM & Secrets Hardening

Long-lived access keys eliminated. Role trust policies reviewed and tightened. Secrets migrated into Secrets Manager with automatic rotation. Federated identity is now the only path in.

PHASE 04 · IN PROGRESS

Data Platform — Connector Onboarding Standard

Landing zone patterns shipped. New inbound data sources onboard against a documented connector contract with clear ownership, lineage, and governance metadata.

PHASE 05 · NEXT

Workload Modernization — Managed Services Migration

Self-managed components migrated onto AWS managed services where it makes sense. Cost optimization. Operational handoff to the enterprise's internal team with documentation and runbooks.
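
The connector contract from Phase 04 can be pictured as a small, validated record. The sketch below is a hypothetical shape for such a contract: the field names, the classification levels, and the validation rules are all assumptions made for illustration, since the source does not publish the real contract.

```python
from dataclasses import dataclass

# Hypothetical classification levels; the real ones are not given in the source.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


@dataclass(frozen=True)
class ConnectorContract:
    name: str
    owner: str            # accountable team or person
    source_system: str    # where the data originates (lineage)
    landing_prefix: str   # landing zone location the connector writes to
    classification: str   # governance metadata


def contract_problems(contract: ConnectorContract) -> list:
    """Return human-readable problems; an empty list means the contract passes."""
    problems = []
    for field_name in ("name", "owner", "source_system", "landing_prefix"):
        if not getattr(contract, field_name).strip():
            problems.append(f"{field_name} must not be empty")
    if contract.classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {contract.classification!r}")
    return problems


if __name__ == "__main__":
    orphan = ConnectorContract("orphan-feed", "", "vendor-x", "landing/orphan/", "internal")
    print(contract_problems(orphan))
```

A check like this could run in CI so that a connector with no named owner fails review before it ships, instead of being found by a later inventory scan.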

◉ Key Insight

This was not a greenfield rebuild or a one-time cleanup. It was the disciplined work of inheriting complexity, establishing governance, and building a durable foundation for long-term platform growth. Most enterprise cloud work looks like this. Most case studies don't.

05 / OUTCOMES · What Shipped

Repeatable Infrastructure as Code

Every change is version-controlled, peer-reviewed, and automated end-to-end. No more manual console drift, no more "who changed what" mysteries.

Federated Identity & Secrets

Long-lived credentials retired across the estate. Identity Center is the only path in. Secrets in Secrets Manager with automated rotation.

Onboardable Data Platform

New connectors and sources onboard against a consistent, documented pattern. Lineage and ownership are defined, not discovered later.

12–24mo Modernization Roadmap

Phased plan with named outcomes per phase, giving leadership clarity to plan, fund, and resource the next 12 to 24 months of platform investment.
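
For the federated identity outcome above, the usual way GitHub Actions reaches AWS without stored keys is an IAM role whose trust policy accepts GitHub's OIDC tokens. The sketch below builds such a trust policy as a Python dictionary. The account ID, repository name, and branch restriction are placeholder assumptions, and the source does not show the trust policies actually deployed.

```python
import json

GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Build a role trust policy that only one repo and branch can assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_PROVIDER}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must be issued for AWS STS...
                        f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com",
                        # ...and come from this exact repository and branch.
                        f"{GITHUB_OIDC_PROVIDER}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository name.
    policy = github_oidc_trust_policy("111122223333", "example-org/infrastructure")
    print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim to a specific repository and branch is the part that matters most: without it, any workflow able to obtain a token from the same provider could attempt to assume the role.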

Stack

AWS Multi-Account · AWS Organizations + SCPs · Terraform · Terragrunt · GitHub Actions · OIDC Federation · IAM Identity Center · CloudTrail · S3 Object Lock · Secrets Manager · KMS · Python (drift tooling) · Data Landing Zones

06 / TAKEAWAY · The Reference Pattern for Inherited Cloud Estates

Most enterprise AWS accounts didn't start out fragmented. They got that way because deadlines moved, teams turned over, best practices evolved, and nobody had the budget to rewrite the foundation while shipping new product. The work to fix it isn't glamorous. It also isn't optional. The estate either gets the disciplined modernization treatment, or the next breach, audit, or critical migration becomes the forcing function — and at that point you're paying triple.

If your cloud estate has gotten away from you (fragmented IaC, IAM you can't fully audit, connectors with no documented owner), that's the engagement we exist for. The same pattern shows up in every enterprise estate our cloud engineering work touches.

Inherited a fragmented AWS estate that needs a path forward?

We partner with enterprise teams to discover, govern, and modernize multi-account AWS environments — Terraform/Terragrunt foundations, federated identity, automated CI/CD, and phased roadmaps leadership can plan against. No greenfield daydreams. Real cloud, real estate, real plan.
