We Recovered a Compromised AWS Account in Under 24 Hours: Here's the 14-Step Lockdown Protocol


⚡ Incident Response · DevSecOps


AWS estimated 4 days. We restored full operations in under 24 and hardened the environment in 48 — a protocol later presented at AWS re:Inforce. This is the playbook, with the actual IAM policies, CloudTrail queries, and S3 lockdown patterns.

By Brian H. Hough · AWS DevTools Hero · 14 min read · Security · IaC · IAM
console · cloudtrail-alert.sh — ssh admin@tsp-incident-response
$ aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=root
[2026-04-19 02:14:07 UTC] Retrieving CloudTrail events...
⚠ ALERT: Root account login detected — source: 185.220.101.47 (Tor exit node)
⚠ ALERT: 247,382 SES SendEmail API calls in past 4 hours
⚠ ALERT: 14 S3 PutObject events to s3://client-redacted/malware/
⚠ AWS Account Suspension triggered by AWS Trust & Safety
[2026-04-19 02:14:11 UTC] AWS estimated resolution: 4 business days
$ page tsp-security-oncall --severity=P0
✔ Paged. Response team mobilizing.
<24h Full Recovery · 14 Lockdown Steps · 75% Faster Than AWS Est. · re:Inforce Protocol Published
TL;DR

An executive advisory firm's AWS root account was compromised — attacker launched a quarter-million-email abuse campaign, uploaded malware to S3, and got the account suspended. Tech Stack Playbook's 48-hour incident response restored operations in under 24 hours against AWS's 4-day estimate.

The lockdown protocol that followed — 14 steps, from root-credential elimination to JWT-gated API re-architecture — is below. Every step includes the specific controls we'd ship today. Presented at AWS re:Inforce as a reference framework for developer-led incident response.

01 / ANATOMY · How the Breach Actually Unfolded

The breach followed the oldest playbook in cloud security: unprotected root credentials, no MFA, broad blast radius. What made it memorable was the speed of escalation — from first unauthorized API call to full account suspension in under four hours. We've seen plenty of security engagements; this one was instructive because every step the attacker took exploited a default that should never have been the default.

◉ Attack Escalation — T+0h to T+4h

- Stage 01 · Root Key Compromise: Unsecured credentials exposed. No MFA gate.
- Stage 02 · SES Abuse Campaign: 247K+ unauthorized emails in one evening.
- Stage 03 · S3 Malware Upload: Payloads dropped in 14 buckets. Public ACLs.
- Stage 04 · Account Suspended: AWS Trust & Safety pulls the plug.

The tell at Stage 02 was volume: legitimate SES usage for this workload averaged ~400 emails per day. A CloudTrail lookup across a 4-hour window surfaced the divergence immediately — and it's the first query we run in any modern incident response engagement:

bash
# Surface SES SendEmail volume per source IP over the last 4 hours
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=SendEmail \
  --start-time "$(date -u -d '4 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --query 'Events[].CloudTrailEvent' \
  --output text \
  | grep -o '"sourceIPAddress": *"[^"]*"' \
  | sort | uniq -c | sort -rn

02 / TIMELINE · The 48-Hour Recovery Arc

Each checkpoint in the 48-hour recovery arc was a real decision point, the kind where most incident responses either compound or contain.

◉ Recovery Timeline
T+24:00h · Full Operations Restored

Platform back online. All client-facing APIs serving traffic. Root credentials retired, initial MFA controls live, S3 bucket policies audited and locked. 75% ahead of AWS's stated recovery estimate.

03 / PROTOCOL · The 14-Step Security Lockdown

With operations restored, the real work began: hardening the environment so the same class of breach couldn't happen twice. Each step below lists the specific controls, policies, and code we'd ship today. This protocol pulls directly from lessons learned on our multi-account AWS governance work and adjacent DevSecOps engagements.

Step 01 · Root Account Lockdown

The root account is not an operator account. First 60 minutes: rotate the root password, force-retire every access key under root, enable MFA (hardware preferred), and enforce an SCP that denies root usage for anything except the handful of tasks that genuinely require it (billing signup, account close, root-only service endpoints).

json · SCP
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRootUserActions",
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "StringLike": {
        "aws:PrincipalArn": "arn:aws:iam::*:root"
      }
    }
  }]
}

Step 02 · Federated Access, Short-Lived Credentials

Long-lived IAM users are an anti-pattern in 2026. Every human operator gets federated access through IAM Identity Center (formerly AWS SSO) with short-lived credentials. Every remaining IAM user gets deleted or disabled. Every long-lived access key gets rotated to a role-based assumption pattern.

bash · audit
# Find every active access key older than 90 days
# (ISO-8601 timestamps compare correctly as strings in the CLI's JMESPath)
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ)
aws iam list-users --query 'Users[*].UserName' --output text \
  | tr '\t' '\n' \
  | xargs -I {} aws iam list-access-keys --user-name {} \
      --query "AccessKeyMetadata[?Status=='Active' && CreateDate<='${CUTOFF}']" \
      --output table

Step 03 · Least-Privilege Rewrite

The attacker had AdministratorAccess because the compromised identity had it. Rewrite every policy to grant the minimum set of actions on the minimum set of resources. Restrict IAM creation itself to a small, named admin group — protected by both MFA and an SCP boundary.

Fastest path: use IAM Access Analyzer's policy generation to diff intended vs. actual privileges, then prune.
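To make "minimum set of actions on the minimum set of resources" concrete, here is a sketch of what a pruned policy might look like for a hypothetical notification service. The account ID, SES identity, and bucket ARN are placeholders, not the client's actual resources:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "NotifierMinimumPrivileges",
    "Effect": "Allow",
    "Action": ["ses:SendEmail", "s3:GetObject"],
    "Resource": [
      "arn:aws:ses:us-east-1:111122223333:identity/example.com",
      "arn:aws:s3:::example-templates-bucket/*"
    ]
  }]
}
```

No wildcard actions, no wildcard resources: if this identity is compromised, the blast radius is one sending domain and one read-only bucket prefix.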

Step 04 · S3 Lockdown

Enable Block Public Access at the account level, not just per-bucket. Audit every bucket policy for "Principal": "*" and kill it on sight. Every object upload triggers a malware scan (GuardDuty Malware Protection or an S3 EventBridge → Lambda → ClamAV pipeline).

bash · block public
aws s3control put-public-access-block \
  --account-id $AWS_ACCOUNT_ID \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,\
BlockPublicPolicy=true,RestrictPublicBuckets=true

Step 05 · Network Isolation and JWT-Gated APIs

Databases move to private subnets with no public ingress. API Gateway endpoints enforce JWT authorization at the edge, validated against a managed auth layer (Cognito or equivalent). Client data is never served through an unauthenticated path.
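On API Gateway HTTP APIs, the edge JWT control is a single authorizer attachment. A sketch, where the API ID, authorizer name, audience, and Cognito user pool issuer are all placeholder values:

```shell
# Attach a JWT authorizer to an HTTP API (placeholder IDs and issuer).
# Requests without a valid, correctly-issued token are rejected at the
# gateway — before any backend compute runs.
aws apigatewayv2 create-authorizer \
  --api-id a1b2c3d4 \
  --name client-jwt-authorizer \
  --authorizer-type JWT \
  --identity-source '$request.header.Authorization' \
  --jwt-configuration Audience=my-app-client-id,Issuer=https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE
```

Routes then reference the authorizer ID, so an unauthenticated path has to be created deliberately rather than existing by default.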

Step 06 · SES Sending Controls

Sandbox SES. Explicit sending authorization policies. Configuration sets with per-send quotas and bounce/complaint tracking. Anomaly detection on send volume (in this case, anything above 2× rolling 7-day baseline pages on-call immediately).
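The paging rule itself is just arithmetic, and can be sketched in plain shell. The counts below are made-up stand-ins for values you would pull from CloudWatch's SES Send metric:

```shell
#!/usr/bin/env bash
# Page on-call when today's SES send count exceeds 2x the rolling
# 7-day average. Counts are hypothetical stand-ins for CloudWatch data.
baseline_counts=(380 412 395 401 388 420 405)   # last 7 days of sends
today=1650                                      # today's send count

sum=0
for c in "${baseline_counts[@]}"; do sum=$((sum + c)); done
avg=$((sum / ${#baseline_counts[@]}))
threshold=$((avg * 2))

if [ "$today" -gt "$threshold" ]; then
  echo "ALERT: $today sends exceed 2x baseline ($threshold) — page on-call"
else
  echo "OK: $today sends within baseline"
fi
```

In the incident above, the attacker's four-hour burst was three orders of magnitude over baseline; even a crude threshold like this would have paged within minutes.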

Step 07 · Organization-Wide CloudTrail

Organization-wide CloudTrail, multi-region, log file validation enabled, delivered to a dedicated log-archive account with S3 Object Lock in compliance mode. If an attacker can rewrite your audit trail, you don't have an audit trail.
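A sketch of the trail setup from the management (or delegated admin) account. The trail and bucket names are placeholders, and the bucket is assumed to already exist in the log-archive account with Object Lock enabled and the required bucket policy:

```shell
# Organization trail: every account, every region, tamper-evident logs.
# Bucket lives in a separate log-archive account (placeholder name).
aws cloudtrail create-trail \
  --name org-audit-trail \
  --s3-bucket-name example-org-log-archive \
  --is-organization-trail \
  --is-multi-region-trail \
  --enable-log-file-validation

aws cloudtrail start-logging --name org-audit-trail
```

Log file validation generates signed digest files, so deleted or rewritten log files are detectable after the fact — which is exactly the property an attacker with account access wants to deny you.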

Step 08 · GuardDuty and Security Hub

Continuous threat detection via GuardDuty. Findings aggregated in Security Hub across every account. EventBridge rules route high-severity findings to PagerDuty with a documented on-call rotation. No silent findings.
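The routing rule can be sketched with EventBridge's numeric matching on finding severity. The rule name and SNS target ARN are placeholders:

```shell
# Route high-severity GuardDuty findings (severity >= 7) to on-call.
# The SNS topic (placeholder ARN) feeds the paging integration.
aws events put-rule \
  --name guardduty-high-severity \
  --event-pattern '{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": { "severity": [{ "numeric": [">=", 7] }] }
  }'

aws events put-targets \
  --rule guardduty-high-severity \
  --targets 'Id=oncall,Arn=arn:aws:sns:us-east-1:111122223333:oncall-topic'
```

Low-severity findings still land in Security Hub for the quarterly review; only the ones that can't wait cross the paging threshold.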

Step 09 · KMS Encryption Everywhere

Customer-managed KMS keys for every tier of sensitive data. Key policies scoped to specific IAM principals. Automatic annual rotation. Client transcripts, PII, and payment data never stored in plaintext anywhere.
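Provisioning one such key tier is a short sequence; the description and alias below are placeholders:

```shell
# Create a customer-managed key for one data tier and turn on
# automatic annual rotation (alias and description are placeholders).
KEY_ID=$(aws kms create-key \
  --description "Client transcripts - tier 1" \
  --query KeyMetadata.KeyId --output text)

aws kms enable-key-rotation --key-id "$KEY_ID"

aws kms create-alias \
  --alias-name alias/transcripts-tier1 \
  --target-key-id "$KEY_ID"
```

Applications reference the alias, so key rotation and even key replacement never touch application config.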

Step 10 · Secrets Management

No plaintext secrets in env vars, config files, or repos. Every credential lives in Secrets Manager with Lambda-backed rotation (we default to 58-day cycles to stay comfortably inside a 60-day rotation policy). Applications pull secrets at runtime, not at deploy.
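Once the rotation Lambda exists, wiring the cycle is one call. The secret name and function ARN below are placeholders:

```shell
# Enable automatic rotation on a 58-day cycle
# (secret ID and Lambda ARN are placeholders).
aws secretsmanager rotate-secret \
  --secret-id prod/db/credentials \
  --rotation-lambda-arn arn:aws:lambda:us-east-1:111122223333:function:rotate-db-creds \
  --rotation-rules AutomaticallyAfterDays=58
```

The call also triggers an immediate rotation, which doubles as a smoke test that the Lambda and its permissions actually work.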

Step 11 · Infrastructure as Code

Every infrastructure change lands through Terraform/Terragrunt with peer review. Console access is read-mostly. Drift detection runs nightly. This is the same IaC-first pattern we implemented during our multi-account AWS modernization — it pays compounding dividends the minute an incident happens.
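The nightly drift check can be as simple as Terraform's detailed exit code; a minimal sketch:

```shell
# Nightly drift check: with -detailed-exitcode, `terraform plan`
# exits 0 when live state matches code and 2 when it has drifted.
terraform plan -detailed-exitcode -no-color -input=false > /dev/null
case $? in
  0) echo "no drift" ;;
  2) echo "DRIFT DETECTED: paging on-call" ;;
  *) echo "plan failed" ;;
esac
```

Drift after an incident is especially telling: anything the console shows that the code doesn't is either an emergency fix that needs to be codified, or a change nobody made on purpose.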

Step 12 · AWS WAF at the Edge

AWS WAF in front of every internet-facing API with managed rule groups (OWASP Top 10, bad bots, known-bad IPs) and custom rate-limit rules scoped per authenticated identity.

Step 13 · Incident Runbooks

The first IR engagement is always the most expensive, because there's no runbook. We left the client with documented runbooks for root compromise, data exfiltration, SES abuse, and account suspension — plus quarterly tabletop drills that test them.

Step 14 · Quarterly Security Reviews

Security is never done. Quarterly reviews walk the full attack surface, audit new IAM policies added since last cycle, re-validate that GuardDuty/Security Hub have no stale findings, and update the threat model as the platform evolves.

◉ Key Insight

Client coaching transcripts — private strategy sessions with nine- and ten-figure entrepreneurs — had been stored in publicly accessible S3 buckets and served through unauthenticated API endpoints. The breach was the forcing function. The lockdown was the product.

04 / ARCHITECTURE · Before & After — What Actually Changed

The post-lockdown architecture isn't exotic. It's what a security-first AWS account looks like when every default has been explicitly hardened instead of inherited.

BEFORE · Pre-Breach Architecture

- IAM: root keys active, no MFA
- S3 Buckets: public ACLs, no scanning
- SES: out of sandbox, no quota
- CloudTrail: single region, no validation
- API Gateway: unauthenticated endpoints

AFTER · Post-Lockdown Architecture

- IAM Identity Center: SSO, short-lived creds
- S3 + BPA: Block Public Access + GuardDuty scanning
- SES + Quotas: sandboxed, rate-limited
- Org CloudTrail: multi-region, Object Lock
- API + JWT + WAF: authenticated, rate-limited

05 / OUTCOMES · What Shipped

- <24h · Recovery Time: Full platform restored against AWS's 4-day estimate, minimizing downtime for ultra-high-net-worth clientele.
- 14/14 · Lockdown Complete: All hardening steps implemented and validated within 48 hours of initial engagement.
- 0 · Exposed Transcripts: Previously public client data locked down with KMS envelope encryption and JWT-gated delivery.
- re:Inforce · Methodology Published: Protocol later presented at AWS re:Inforce as a framework for developer-led incident response.

Stack

IAM Identity Center · CloudTrail · GuardDuty · Security Hub · S3 Block Public Access · KMS · Secrets Manager · AWS WAF · JWT / OIDC · Terraform · Terragrunt · CloudWatch

06 / TAKEAWAY · Why Developer-Led IR Beats a Ticket Queue

AWS's 4-day estimate isn't a judgment on AWS — it's a function of the generic queue every support request sits in. Developer-led incident response is faster because the people triaging the breach are the same people who can rewrite an SCP, ship a JWT middleware, and redeploy infrastructure as code before lunch. That's the difference between a ticket and a response team. It's also the argument for having security-minded engineers embedded before anything goes wrong, not after.

If you're reading this and thinking about your own root account MFA status — that's the right instinct. Go check right now. We'll wait.

Need an incident-response-ready AWS foundation?

We partner with enterprise teams to design, build, and ship production-grade cloud systems that are secure-by-design from day one — so breaches are contained, not catastrophic.

Book a strategy call  
Explore more
