Runtime Incident Reports

Real Agentic Failures.
Real Costs. Real Containment.

Documented runtime incidents from Claude Code, Cursor, Windsurf, and multi-agent systems. Each incident maps to the governance system that would have prevented it.

CRP-2024-001Claude Code

The $1,100 Overnight Token Burn

Environment Governance
Timeline

11:47 PM — 6:23 AM (6h 36m unattended)

Blast Radius

$1,147 in API tokens consumed. Zero usable output.

Root Cause

Agent entered recursive retry loop on a failing test. No financial circuit breaker. No unattended execution limits. Agent burned through context window 14 times, each time restarting from scratch.

4.2M
tokens Consumed
$1,147
cost Usd
89
retries Detected
0 lines shipped
usable Output
Governance Containment

AI Cost Containment System would have halted execution at $25 budget cap (97.8% savings). Unattended timeout would have triggered at 30 minutes.

Deploy AI Cost Containment
CRP-2024-002Cursor

The 47-File Cursor Rewrite

Environment Governance
Timeline

2:15 PM — 2:52 PM (37 minutes)

Blast Radius

47 files modified. 12 new phantom dependencies introduced. 3 config files overwritten.

Root Cause

Agent was asked to refactor a single utility function. Without scope enforcement, it followed import chains across the entire codebase, "fixing" each file it touched. Ghost dependencies imported from packages not in package.json.

47
files Modified
12
phantom Deps
3
configs Overwritten
4.5 hours
rollback Time
Governance Containment

Repository Drift Prevention would have blocked out-of-scope mutations at file 2. Import validator would have caught phantom dependencies immediately.

Deploy Repository Drift Prevention
CRP-2024-003Claude Code + MCP

The .env Credential Leak via MCP

Tool Governance
Timeline

10:30 AM — 10:31 AM (instant)

Blast Radius

AWS access keys, database credentials, and Stripe API keys exposed to third-party MCP server.

Root Cause

Agent connected to an MCP tool server that requested file system access. Server read .env file containing production credentials. No context isolation. No capability manifest validation.

5
credentials Exposed
None
server Verification
None
context Isolation
3 days
detection Time
Governance Containment

MCP Governance System would have blocked .env access via file-guard, validated server against manifest, and enforced context isolation.

Deploy MCP Governance
CRP-2024-004Multi-Agent (CrewAI)

The $890 Agreement Loop

Skill Governance
Timeline

9:00 AM — 3:15 PM (6h 15m)

Blast Radius

$890 in compute. 340 turns of agents agreeing with each other. Zero tool invocations. Zero code produced.

Root Cause

Three agents entered an agreement loop — each validating the previous agent's output without performing any actual work. No turn limit. No tool-invocation requirement. No agreement loop detection.

340
total Turns
0
tool Invocations
$890
cost Usd
0 lines
code Produced
Governance Containment

Orchestration Entropy System would have detected the agreement loop at turn 10 and halted the workflow (99% cost prevention).

Deploy Orchestration Entropy
CRP-2024-005Cursor + GitHub

The Rubber-Stamp PR Avalanche

Skill Governance
Timeline

Sprint duration (2 weeks)

Blast Radius

34 AI-generated PRs merged with <2 min review. 8 contained bugs. 3 reached production. 1 caused a customer-facing outage.

Root Cause

AI code generation volume exceeded team review capacity. Engineers began rubber-stamping PRs to clear the queue. No confidence scoring. No review timer. No burnout detection.

34
prs Submitted
1.8 min
avg Review Time
8
bugs Shipped
1
production Incidents
Governance Containment

Verification Burden Collapse Prevention would have flagged rubber-stamp reviews, throttled AI generation when queue exceeded 8 PRs, and routed low-confidence code to deep review.

Deploy Verification Burden Collapse
CRP-2024-006Claude Code

Context Rot: Agent Forgot Its Own Architecture

Skill Governance
Timeline

10:00 AM — 1:45 PM (3h 45m)

Blast Radius

23 files corrupted with contradictory implementations. Agent began patching its own patches. 6 hours remediation.

Root Cause

After 90 minutes, the agent's context window filled. Original architecture instructions were pushed out. Agent continued generating code that contradicted the initial design, then tried to "fix" the contradictions by patching files it had just modified.

225 min
session Duration
23
files Corrupted
7
patch Chain Depth
6
remediation Hours
Governance Containment

Context Rot Prevention would have triggered checkpoint rotation at 65% utilization and mandatory semantic reset at 85%. Patch chain detector would have halted at depth 3.

Deploy Context Rot Prevention

Every incident above was preventable.

Deploy runtime governance infrastructure to contain these failures before they occur.