Runtime Incident Reports

Real Agentic Failures.
Real Costs. Real Containment.

Documented runtime incidents from Claude Code, Cursor, Windsurf, and multi-agent systems. Each incident maps to the governance system that would have prevented it.

CRP-2024-001Claude Code

The $1,100 Overnight Token Burn

Environment Governance
Timeline

11:47 PM — 6:23 AM (6h 36m unattended)

Blast Radius

$1,147 in API tokens consumed. Zero usable output.

Root Cause

Agent entered recursive retry loop on a failing test. No financial circuit breaker. No unattended execution limits. Agent burned through context window 14 times, each time restarting from scratch.

4.2M
tokens Consumed
$1,147
cost Usd
89
retries Detected
0 lines shipped
usable Output
Governance Containment

AI Cost Containment System would have halted execution at $25 budget cap (97.8% savings). Unattended timeout would have triggered at 30 minutes.

Deploy AI Cost Containment
CRP-2024-002Cursor

The 47-File Cursor Rewrite

Environment Governance
Timeline

2:15 PM — 2:52 PM (37 minutes)

Blast Radius

47 files modified. 12 new phantom dependencies introduced. 3 config files overwritten.

Root Cause

Agent was asked to refactor a single utility function. Without scope enforcement, it followed import chains across the entire codebase, "fixing" each file it touched. Ghost dependencies imported from packages not in package.json.

47
files Modified
12
phantom Deps
3
configs Overwritten
4.5 hours
rollback Time
Governance Containment

Repository Drift Prevention would have blocked out-of-scope mutations at file 2. Import validator would have caught phantom dependencies immediately.

Deploy Repository Drift Prevention
CRP-2024-003Claude Code + MCP

The .env Credential Leak via MCP

Tool Governance
Timeline

10:30 AM — 10:31 AM (instant)

Blast Radius

AWS access keys, database credentials, and Stripe API keys exposed to third-party MCP server.

Root Cause

Agent connected to an MCP tool server that requested file system access. Server read .env file containing production credentials. No context isolation. No capability manifest validation.

5
credentials Exposed
None
server Verification
None
context Isolation
3 days
detection Time
Governance Containment

MCP Governance System would have blocked .env access via file-guard, validated server against manifest, and enforced context isolation.

Deploy MCP Governance
CRP-2024-004Multi-Agent (CrewAI)

The $890 Agreement Loop

Skill Governance
Timeline

9:00 AM — 3:15 PM (6h 15m)

Blast Radius

$890 in compute. 340 turns of agents agreeing with each other. Zero tool invocations. Zero code produced.

Root Cause

Three agents entered an agreement loop — each validating the previous agent's output without performing any actual work. No turn limit. No tool-invocation requirement. No agreement loop detection.

340
total Turns
0
tool Invocations
$890
cost Usd
0 lines
code Produced
Governance Containment

Orchestration Entropy System would have detected the agreement loop at turn 10 and halted the workflow (99% cost prevention).

Deploy Orchestration Entropy
CRP-2024-005Cursor + GitHub

The Rubber-Stamp PR Avalanche

Skill Governance
Timeline

Sprint duration (2 weeks)

Blast Radius

34 AI-generated PRs merged with <2 min review. 8 contained bugs. 3 reached production. 1 caused a customer-facing outage.

Root Cause

AI code generation volume exceeded team review capacity. Engineers began rubber-stamping PRs to clear the queue. No confidence scoring. No review timer. No burnout detection.

34
prs Submitted
1.8 min
avg Review Time
8
bugs Shipped
1
production Incidents
Governance Containment

Verification Burden Collapse Prevention would have flagged rubber-stamp reviews, throttled AI generation when queue exceeded 8 PRs, and routed low-confidence code to deep review.

Deploy Verification Burden Collapse
CRP-2024-006Claude Code

Context Rot: Agent Forgot Its Own Architecture

Skill Governance
Timeline

10:00 AM — 1:45 PM (3h 45m)

Blast Radius

23 files corrupted with contradictory implementations. Agent began patching its own patches. 6 hours remediation.

Root Cause

After 90 minutes, the agent's context window filled. Original architecture instructions were pushed out. Agent continued generating code that contradicted the initial design, then tried to "fix" the contradictions by patching files it had just modified.

225 min
session Duration
23
files Corrupted
7
patch Chain Depth
6
remediation Hours
Governance Containment

Context Rot Prevention would have triggered checkpoint rotation at 65% utilization and mandatory semantic reset at 85%. Patch chain detector would have halted at depth 3.

Deploy Context Rot Prevention
CRP-2024-007Cline

Identity Drift: Agent Abandoned Its Own Rules

Identity Governance
Timeline

2:00 PM — 4:30 PM (2h 30m)

Blast Radius

Agent ignored .clinerules after 45 minutes. Began using deprecated APIs, wrong naming conventions, and unauthorized packages.

Root Cause

As context pressure increased, the identity constraints defined in .clinerules were pushed out of the active context window. Agent reverted to generic behavior, violating every architectural rule.

12
rules Violated
18
files Non Compliant
5
rework Hours
23%
identity Recall
Governance Containment

Identity Governance would enforce rules at runtime, not just at session start. Instruction adherence monitoring would halt execution when recall drops below 80%.

Deploy Deterministic Agentic Engineering
CRP-2024-008Claude Code

Context Window Overflow: Lost the Plot at 200K Tokens

Skill Governance
Timeline

9:00 AM — 12:30 PM (3h 30m)

Blast Radius

Agent forgot core project structure after context hit 95% utilization. Recreated utility functions that already existed. Imported wrong versions of dependencies.

Root Cause

No context compression or checkpoint rotation. The 200K context window filled with conversation history, failed attempts, and verbose error messages. Architectural instructions from the session start were no longer retrievable.

95%
context Utilization
8
duplicate Functions
5
wrong Imports
3
session Restarts
Governance Containment

Context Window Compression would have triggered semantic pruning at 65% utilization, preserving architectural state while discarding stale interaction history.

Deploy Context Window Compression
CRP-2024-009Windsurf

Tool Permission Leak: Windsurf Deleted Config Directory

Tool Governance
Timeline

11:15 AM — 11:16 AM (instant)

Blast Radius

Agent ran rm -rf on a configuration directory while attempting to "clean up" a build issue. Lost Nginx configs, SSL certificates, and deployment scripts.

Root Cause

No file path guards. No destructive command detection. Agent had unrestricted shell access with no approval gates for destructive operations.

47
files Deleted
3
configs Lost
8 hours
recovery Time
No
backup Available
Governance Containment

Tool Permission Governance would have blocked rm -rf via destructive command detection, required human approval for any operation touching config directories.

Deploy Tool Permission Governance
CRP-2024-010Cursor

Change Management: The 94-File Unauthorized Refactor

Environment Governance
Timeline

3:00 PM — 4:15 PM (1h 15m)

Blast Radius

94 files modified in a single session. Agent was asked to fix a CSS bug but followed import chains into the entire component library, refactoring each file it touched.

Root Cause

No scope enforcement. No approval gates for multi-file changes. No diff size limits. Agent interpreted "fix the styling" as permission to refactor the entire design system.

94
files Modified
3400
lines Changed
Critical
scope Creep
6 hours
rollback Time
Governance Containment

Agentic Change Management would have halted at file 5 (threshold: max 10 files without approval), requiring human review before continuing.

Deploy Agentic Change Management
CRP-2024-011Roo Code

Autonomous Execution: The rm -rf Test Directory

Environment Governance
Timeline

8:30 PM — overnight (unattended)

Blast Radius

Agent deleted test directory, then attempted to "fix" failing tests by removing the test runner configuration. No audit trail. Discovered 14 hours later.

Root Cause

Agent ran in fully autonomous mode overnight with no human-in-the-loop checkpoints. No execution audit trail. No destructive operation detection.

14
unattended Hours
23
files Destroyed
None
audit Trail
14 hours
detection Delay
Governance Containment

Autonomous Execution Safety would have required human approval for file deletions, enforced unattended timeout at 30 minutes, and logged every shell command.

Deploy Autonomous Execution Safety
CRP-2024-012Multi-Agent (Enterprise)

Engineering Economics: AI Agents Were Net-Negative

Skill Governance
Timeline

Q4 2024 (3 months)

Blast Radius

Team of 8 engineers spent 40% of sprint time reviewing and fixing AI-generated code. Total cost of AI + remediation exceeded hiring 2 additional engineers.

Root Cause

No ROI telemetry. No cost-per-task tracking. Management assumed AI was "free productivity" without measuring remediation overhead, review burden, and quality regression costs.

40% of sprint
review Burden
$180K/quarter
remediation Cost
$45K/quarter
ai Tool Cost
-$135K/quarter
net R O I
Governance Containment

AI Engineering Economics System would have tracked cost-per-task, flagged negative ROI at week 2, and recommended governance deployment to reduce remediation overhead by 60-80%.

Deploy AI Engineering Economics
CRP-2024-013Claude Code

Governance Theater: System Prompt Bypassed in 3 Messages

Identity Governance
Timeline

10:00 AM — 10:08 AM (8 minutes)

Blast Radius

System prompt instructing "never modify package.json" was bypassed after 3 conversational turns. Agent added 4 unauthorized dependencies.

Root Cause

System prompts are natural language suggestions, not deterministic constraints. Under context pressure or creative interpretation, agents routinely bypass text-based instructions.

3
turns To Bypass
4
unauthorized Deps
0% after bypass
prompt Adherence
Manual review
detection Method
Governance Containment

Runtime Governance enforces rules through middleware interception, not natural language. package.json would be in the write-restricted file list with hard-coded blocks.

Deploy Runtime Governance
CRP-2024-014Cursor

Retry Inflation: $340 on a CSS Animation

Skill Governance
Timeline

1:30 PM — 5:45 PM (4h 15m)

Blast Radius

$340 in API tokens spent on a CSS animation that should have taken 10 minutes. Agent attempted 67 variations, each adding more context bloat.

Root Cause

No retry limit. No cost ceiling. Agent kept trying increasingly complex solutions, each consuming more tokens. By attempt 40, the context was so polluted that correct solutions were impossible.

67
attempts
1.2M
tokens Burned
$340
cost Usd
Low (CSS)
task Complexity
None
correct Attempt
Governance Containment

Retry Inflation Control would have halted at attempt 3 (cost ceiling: $25), escalated to human review, and recommended session reset.

Deploy Retry Inflation Control
CRP-2024-015Codex

Hallucination Debt: Phantom API That Didn't Exist

Skill Governance
Timeline

2:00 PM — 4:00 PM (2 hours)

Blast Radius

Agent generated 400 lines of integration code against a third-party API endpoint that did not exist. Team spent 8 hours debugging before discovering the API was hallucinated.

Root Cause

No admissibility validation. Agent generated code referencing API endpoints from training data that had been deprecated or never existed. No dependency verification pipeline.

400
lines Generated
3
hallucinated A P Is
8
debugging Hours
0 lines
code Shipped
Governance Containment

Hallucination Debt Reduction would have run dependency verification against live registries, caught the phantom API immediately, and blocked the code from entering the review pipeline.

Deploy Hallucination Debt Reduction

Every incident above was preventable.

Deploy runtime governance infrastructure to contain these failures before they occur.

Need an expert verdict?

30-minute rapid-fire evaluation. You describe the problem, I tell you which approach wins — and why.

Richard Ewing — AI Economist & Capital Auditor