Runtime Incident Reports

Real Agentic Failures.
Real Costs. Real Containment.

Documented runtime incidents from Claude Code, Cursor, Windsurf, Cline, Roo Code, Codex, and multi-agent systems. Each incident maps to the governance system that would have prevented it.

CRP-2024-001 — Claude Code

The $1,100 Overnight Token Burn

Environment Governance

Timeline

11:47 PM — 6:23 AM (6h 36m unattended)

Blast Radius

$1,147 in API tokens consumed. Zero usable output.

Root Cause

Agent entered recursive retry loop on a failing test. No financial circuit breaker. No unattended execution limits. Agent burned through context window 14 times, each time restarting from scratch.

Tokens consumed: 4.2M
Cost (USD): $1,147
Retries detected:
Usable output: 0 lines shipped

Governance Containment

AI Cost Containment System would have halted execution at $25 budget cap (97.8% savings). Unattended timeout would have triggered at 30 minutes.
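A minimal sketch of how a budget cap and an unattended timeout could gate an agent loop. The class `SpendGuard`, its method names, and the helper `savings_pct` are hypothetical illustrations, not the product's API. Only the $25 cap, the 30-minute limit, and the $1,147 total come from the report above.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend reaches the configured cap."""


class UnattendedTimeout(RuntimeError):
    """Raised when the run exceeds the unattended time limit."""


@dataclass
class SpendGuard:
    # Hypothetical guard; default thresholds mirror the figures in the report.
    budget_usd: float = 25.0
    max_unattended_s: float = 30 * 60
    spent_usd: float = 0.0

    def record(self, cost_usd: float, elapsed_s: float) -> None:
        """Add one API call's cost and halt if either limit is reached."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.budget_usd:.2f}"
            )
        if elapsed_s >= self.max_unattended_s:
            raise UnattendedTimeout(f"{elapsed_s:.0f}s without a human check-in")


def savings_pct(actual_usd: float, cap_usd: float) -> float:
    """Share of the actual spend that a hard cap would have avoided."""
    return round(100 * (actual_usd - cap_usd) / actual_usd, 1)
```

Calling `savings_pct(1147, 25)` reproduces the 97.8% figure: (1147 - 25) / 1147 ≈ 0.978. The agent loop would call `record` after every model request and treat either exception as a hard stop.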


CRP-2024-002 — Cursor

The 47-File Cursor Rewrite

Environment Governance

Timeline

2:15 PM — 2:52 PM (37 minutes)

Blast Radius

47 files modified. 12 new phantom dependencies introduced. 3 config files overwritten.

Root Cause

Agent was asked to refactor a single utility function. Without scope enforcement, it followed import chains across the entire codebase, "fixing" each file it touched. It also added imports from packages not listed in package.json (the phantom dependencies).

Files modified: 47
Phantom dependencies: 12
Configs overwritten: 3
Rollback time: 4.5 hours

Governance Containment

Repository Drift Prevention would have blocked out-of-scope mutations at file 2. Import validator would have caught phantom dependencies immediately.
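The two checks named above could look roughly like the following sketch. The function names `out_of_scope` and `phantom_dependencies` are hypothetical, and the example assumes the list of imported package names has already been extracted from the changed files by some other tool.

```python
import json
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_patterns):
    """Return changed paths that match none of the allowed patterns.

    Note: in fnmatch patterns, `*` also matches `/`, so "src/utils/*"
    covers nested files under src/utils/.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


def phantom_dependencies(imported_packages, package_json_text):
    """Return imported package names that package.json does not declare."""
    manifest = json.loads(package_json_text)
    declared = set(manifest.get("dependencies", {}))
    declared |= set(manifest.get("devDependencies", {}))
    return sorted(set(imported_packages) - declared)
```

A pre-commit hook or agent middleware could reject the change set whenever either function returns a non-empty list.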


CRP-2024-003 — Claude Code + MCP

The .env Credential Leak via MCP

Tool Governance

Timeline

10:30 AM — 10:31 AM (instant)

Blast Radius

AWS access keys, database credentials, and Stripe API keys exposed to third-party MCP server.

Root Cause

Agent connected to an MCP tool server that requested file system access. Server read .env file containing production credentials. No context isolation. No capability manifest validation.

Credentials exposed:
Server verification: None
Context isolation: None
Detection time: 3 days

Governance Containment

MCP Governance System would have blocked .env access via file-guard, validated server against manifest, and enforced context isolation.
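As an illustration only, a file guard combined with a server allowlist might be sketched like this. `SENSITIVE_PATTERNS`, `APPROVED_SERVERS`, and `may_read` are hypothetical names, and the pattern list is an assumption rather than the product's actual rule set.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical deny list of file names that must never leave the machine.
SENSITIVE_PATTERNS = (".env", ".env.*", "*.pem", "id_rsa")

# Hypothetical manifest of MCP servers that have been vetted.
APPROVED_SERVERS = {"internal-docs"}


def may_read(server_name: str, requested_path: str) -> bool:
    """Allow a read only for a vetted server and a non-sensitive file name."""
    if server_name not in APPROVED_SERVERS:
        return False
    file_name = PurePosixPath(requested_path).name
    return not any(fnmatchcase(file_name, p) for p in SENSITIVE_PATTERNS)
```

The check runs on the final path component, so a request such as `docs/../.env` is still refused. A production guard would also resolve symlinks before matching, which this sketch does not do.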


CRP-2024-004 — Multi-Agent (CrewAI)

The $890 Agreement Loop

Skill Governance

Timeline

9:00 AM — 3:15 PM (6h 15m)

Blast Radius

$890 in compute. 340 turns of agents agreeing with each other. Zero tool invocations. Zero code produced.

Root Cause

Three agents entered an agreement loop — each validating the previous agent's output without performing any actual work. No turn limit. No tool-invocation requirement. No agreement loop detection.

Total turns: 340
Tool invocations: 0
Cost (USD): $890
Code produced: 0 lines

Governance Containment

Orchestration Entropy System would have detected the agreement loop at turn 10 and halted the workflow (99% cost prevention).
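One simple way to approximate agreement-loop detection is to count consecutive turns that invoke no tool. The sketch below assumes that heuristic; the function name `first_halt_turn` and the default threshold of 10 idle turns are illustrative choices matched to the "turn 10" figure above, not a description of the actual detector.

```python
def first_halt_turn(turn_tool_calls, max_idle_turns=10):
    """Return the 1-based turn at which to halt, or None if no halt is needed.

    turn_tool_calls: iterable with the number of tool invocations per turn.
    A turn with zero tool invocations counts as idle; any tool call resets
    the idle streak.
    """
    idle = 0
    for turn, calls in enumerate(turn_tool_calls, start=1):
        idle = 0 if calls > 0 else idle + 1
        if idle >= max_idle_turns:
            return turn
    return None
```

For a run like the one in this incident (340 turns, none with a tool call), `first_halt_turn([0] * 340)` returns 10.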


CRP-2024-005 — Cursor + GitHub

The Rubber-Stamp PR Avalanche

Skill Governance

Timeline

Sprint duration (2 weeks)

Blast Radius

34 AI-generated PRs merged with <2 min review. 8 contained bugs. 3 reached production. 1 caused a customer-facing outage.

Root Cause

AI code generation volume exceeded team review capacity. Engineers began rubber-stamping PRs to clear the queue. No confidence scoring. No review timer. No burnout detection.

PRs submitted:
Average review time: 1.8 min
Bugs shipped:
Production incidents:

Governance Containment

Verification Burden Collapse Prevention would have flagged rubber-stamp reviews, throttled AI generation when queue exceeded 8 PRs, and routed low-confidence code to deep review.
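A rough sketch of the review-timer and queue-throttle ideas follows. The `Review` record, the 120-second floor (taken from the "<2 min review" figure), and the function names are assumptions made for illustration; the queue limit of 8 comes from the text above.

```python
from dataclasses import dataclass

MIN_REVIEW_SECONDS = 120  # assumed floor, based on the "<2 min" figure
MAX_OPEN_PRS = 8          # queue size above which generation is throttled


@dataclass
class Review:
    pr_id: int
    review_seconds: float
    ai_generated: bool


def rubber_stamped(reviews):
    """Return IDs of AI-generated PRs approved faster than the floor."""
    return [
        r.pr_id
        for r in reviews
        if r.ai_generated and r.review_seconds < MIN_REVIEW_SECONDS
    ]


def should_throttle(open_pr_count: int) -> bool:
    """True when the review queue has grown past the configured limit."""
    return open_pr_count > MAX_OPEN_PRS
```

At the reported 1.8-minute average (108 seconds), a typical review in this incident would fall under the floor and be flagged.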


CRP-2024-006 — Claude Code

Context Rot: Agent Forgot Its Own Architecture

Skill Governance

Timeline

10:00 AM — 1:45 PM (3h 45m)

Blast Radius

23 files corrupted with contradictory implementations. Agent began patching its own patches. 6 hours remediation.

Root Cause

After 90 minutes, the agent's context window filled. Original architecture instructions were pushed out. Agent continued generating code that contradicted the initial design, then tried to "fix" the contradictions by patching files it had just modified.

Session duration: 225 min
Files corrupted: 23
Patch chain depth:
Remediation hours: 6

Governance Containment

Context Rot Prevention would have triggered checkpoint rotation at 65% utilization and mandatory semantic reset at 85%. Patch chain detector would have halted at depth 3.
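The thresholds above can be expressed as a small decision function. This is a sketch under stated assumptions: the action names are invented, and "patch chain depth" is interpreted here as the number of times a single file has been edited in one session, which may differ from how the product measures it.

```python
from collections import Counter


def context_action(used_tokens: int, window_tokens: int) -> str:
    """Map context utilization to an action using the 65% / 85% thresholds."""
    utilization = used_tokens / window_tokens
    if utilization >= 0.85:
        return "semantic_reset"
    if utilization >= 0.65:
        return "checkpoint"
    return "continue"


def patch_chain_exceeded(edit_log, max_depth: int = 3) -> bool:
    """True if any file was edited max_depth or more times in the session.

    edit_log: sequence of file paths, one entry per edit, in order.
    """
    return any(count >= max_depth for count in Counter(edit_log).values())
```

With a 200K-token window, `context_action(140_000, 200_000)` returns `"checkpoint"` (70% utilization) and `context_action(180_000, 200_000)` returns `"semantic_reset"` (90%).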


CRP-2024-007 — Cline

Identity Drift: Agent Abandoned Its Own Rules

Identity Governance

Timeline

2:00 PM — 4:30 PM (2h 30m)

Blast Radius

Agent ignored .clinerules after 45 minutes. Began using deprecated APIs, wrong naming conventions, and unauthorized packages.

Root Cause

As context pressure increased, the identity constraints defined in .clinerules were pushed out of the active context window. Agent reverted to generic behavior, violating every architectural rule.

Rules violated:
Files non-compliant:
Rework hours:
Identity recall: 23%

Governance Containment

Identity Governance would have enforced rules at runtime, not just at session start. Instruction adherence monitoring would have halted execution once recall dropped below 80%.


CRP-2024-008 — Claude Code

Context Window Overflow: Lost the Plot at 200K Tokens

Skill Governance

Timeline

9:00 AM — 12:30 PM (3h 30m)

Blast Radius

Agent forgot core project structure after context hit 95% utilization. Recreated utility functions that already existed. Imported wrong versions of dependencies.

Root Cause

No context compression or checkpoint rotation. The 200K context window filled with conversation history, failed attempts, and verbose error messages. Architectural instructions from the session start were no longer retrievable.

Context utilization: 95%
Duplicate functions:
Wrong imports:
Session restarts:

Governance Containment

Context Window Compression would have triggered semantic pruning at 65% utilization, preserving architectural state while discarding stale interaction history.


CRP-2024-009 — Windsurf

Tool Permission Leak: Windsurf Deleted Config Directory

Tool Governance

Timeline

11:15 AM — 11:16 AM (instant)

Blast Radius

Agent ran rm -rf on a configuration directory while attempting to "clean up" a build issue. Lost Nginx configs, SSL certificates, and deployment scripts.

Root Cause

No file path guards. No destructive command detection. Agent had unrestricted shell access with no approval gates for destructive operations.

Files deleted:
Configs lost:
Recovery time: 8 hours
Backup available:

Governance Containment

Tool Permission Governance would have blocked rm -rf via destructive command detection and required human approval for any operation touching config directories.
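A deliberately simple sketch of an approval gate for shell commands is shown below. The program list, the protected directories, and the function name `needs_approval` are all assumptions for illustration. The sketch inspects a single command only; it does not parse pipelines, `&&` chains, or `sudo` prefixes, which a real guard would need to handle.

```python
import shlex

# Hypothetical list of programs treated as destructive.
DESTRUCTIVE_PROGRAMS = {"rm", "rmdir", "shred", "mkfs", "dd"}

# Hypothetical directories where any operation requires approval.
PROTECTED_DIRS = ("/etc/nginx", "/etc/ssl", "config")


def needs_approval(command: str) -> bool:
    """True if the command is destructive or touches a protected directory."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    # Compare the bare program name so "/bin/rm" is treated like "rm".
    program = tokens[0].rsplit("/", 1)[-1]
    if program in DESTRUCTIVE_PROGRAMS:
        return True
    return any(
        arg == directory or arg.startswith(directory + "/")
        for arg in tokens[1:]
        for directory in PROTECTED_DIRS
    )
```

An agent runtime could call `needs_approval` before executing each shell command and pause for a human decision whenever it returns `True`.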


CRP-2024-010 — Cursor

Change Management: The 94-File Unauthorized Refactor

Environment Governance

Timeline

3:00 PM — 4:15 PM (1h 15m)

Blast Radius

94 files modified in a single session. Agent was asked to fix a CSS bug but followed import chains into the entire component library, refactoring each file it touched.

Root Cause

No scope enforcement. No approval gates for multi-file changes. No diff size limits. Agent interpreted "fix the styling" as permission to refactor the entire design system.

Files modified: 94
Lines changed: 3,400
Scope creep: Critical
Rollback time: 6 hours

Governance Containment

Agentic Change Management would have paused the session once it reached the approval threshold (max 10 files without approval), requiring human review before continuing.


CRP-2024-011 — Roo Code

Autonomous Execution: The rm -rf Test Directory

Environment Governance

Timeline

8:30 PM — overnight (unattended)

Blast Radius

Agent deleted test directory, then attempted to "fix" failing tests by removing the test runner configuration. No audit trail. Discovered 14 hours later.

Root Cause

Agent ran in fully autonomous mode overnight with no human-in-the-loop checkpoints. No execution audit trail. No destructive operation detection.

Unattended hours:
Files destroyed:
Audit trail: None
Detection delay: 14 hours

Governance Containment

Autonomous Execution Safety would have required human approval for file deletions, enforced unattended timeout at 30 minutes, and logged every shell command.


CRP-2024-012 — Multi-Agent (Enterprise)

Engineering Economics: AI Agents Were Net-Negative

Skill Governance

Timeline

Q4 2024 (3 months)

Blast Radius

Team of 8 engineers spent 40% of sprint time reviewing and fixing AI-generated code. Total cost of AI + remediation exceeded hiring 2 additional engineers.

Root Cause

No ROI telemetry. No cost-per-task tracking. Management assumed AI was "free productivity" without measuring remediation overhead, review burden, and quality regression costs.

Review burden: 40% of sprint
Remediation cost: $180K/quarter
AI tool cost: $45K/quarter
Net ROI: -$135K/quarter

Governance Containment

AI Engineering Economics System would have tracked cost-per-task, flagged negative ROI at week 2, and recommended governance deployment to reduce remediation overhead by 60-80%.


CRP-2024-013 — Claude Code

Governance Theater: System Prompt Bypassed in 3 Messages

Identity Governance

Timeline

10:00 AM — 10:08 AM (8 minutes)

Blast Radius

System prompt instructing "never modify package.json" was bypassed after 3 conversational turns. Agent added 4 unauthorized dependencies.

Root Cause

System prompts are natural language suggestions, not deterministic constraints. Under context pressure or creative interpretation, agents routinely bypass text-based instructions.

Turns to bypass: 3
Unauthorized dependencies: 4
Prompt adherence: 0% after bypass
Detection method: Manual review

Governance Containment

Runtime Governance enforces rules through middleware interception, not natural language. package.json would be in the write-restricted file list with hard-coded blocks.
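To make the contrast with prompt-based rules concrete, here is a minimal sketch of middleware that intercepts write calls. The decorator `guarded`, the exception `WriteBlocked`, and the contents of `WRITE_RESTRICTED` are hypothetical. The point is only that the check runs in code on every call, regardless of what the model was told or decided.

```python
from functools import wraps
from pathlib import PurePosixPath

# Hypothetical write-restricted file list.
WRITE_RESTRICTED = {"package.json", "package-lock.json"}


class WriteBlocked(PermissionError):
    """Raised when a write targets a restricted file."""


def guarded(write_fn):
    """Wrap a write function so restricted files are rejected before the call."""

    @wraps(write_fn)
    def wrapper(path: str, content: str):
        if PurePosixPath(path).name in WRITE_RESTRICTED:
            raise WriteBlocked(f"write to {path} is not permitted")
        return write_fn(path, content)

    return wrapper
```

The agent's file-writing tool would be registered as `guarded(raw_write)`, so the underlying function is never reached for a blocked path.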


CRP-2024-014 — Cursor

Retry Inflation: $340 on a CSS Animation

Skill Governance

Timeline

1:30 PM — 5:45 PM (4h 15m)

Blast Radius

$340 in API tokens spent on a CSS animation that should have taken 10 minutes. Agent attempted 67 variations, each adding more context bloat.

Root Cause

No retry limit. No cost ceiling. Agent kept trying increasingly complex solutions, each consuming more tokens. By attempt 40, the context was so polluted that correct solutions were impossible.

Attempts: 67
Tokens burned: 1.2M
Cost (USD): $340
Task complexity: Low (CSS)
Correct attempt: None

Governance Containment

Retry Inflation Control would have halted at attempt 3 (cost ceiling: $25), escalated to human review, and recommended session reset.
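The retry limit and cost ceiling could be combined in a loop like the following sketch. The function `run_with_limits`, its return shape, and the `attempt` callback contract are assumptions for illustration; the limits of 3 attempts and $25 come from the text above.

```python
def run_with_limits(attempt, max_attempts=3, cost_ceiling_usd=25.0):
    """Run attempt() until it succeeds or a limit is reached.

    attempt: callable returning (succeeded: bool, cost_usd: float).
    Returns a summary dict; a status of "escalate" means a human should
    review the task before any further attempts.
    """
    spent = 0.0
    for n in range(1, max_attempts + 1):
        succeeded, cost = attempt()
        spent += cost
        if succeeded:
            return {"status": "done", "attempts": n, "spent_usd": spent}
        if spent >= cost_ceiling_usd:
            return {
                "status": "escalate",
                "reason": "cost_ceiling",
                "attempts": n,
                "spent_usd": spent,
            }
    return {
        "status": "escalate",
        "reason": "max_attempts",
        "attempts": max_attempts,
        "spent_usd": spent,
    }
```

Either limit ends the loop, so a task that keeps failing cheaply stops at the third attempt, while one expensive failure can stop it on the first.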


CRP-2024-015 — Codex

Hallucination Debt: Phantom API That Didn't Exist

Skill Governance

Timeline

2:00 PM — 4:00 PM (2 hours)

Blast Radius

Agent generated 400 lines of integration code against a third-party API endpoint that did not exist. Team spent 8 hours debugging before discovering the API was hallucinated.

Root Cause

No admissibility validation. Agent generated code referencing API endpoints from training data that had been deprecated or never existed. No dependency verification pipeline.

Lines generated: 400
Hallucinated APIs:
Debugging hours: 8
Code shipped: 0 lines

Governance Containment

Hallucination Debt Reduction would have run dependency verification against live registries, caught the phantom API immediately, and blocked the code from entering the review pipeline.


Every incident above was preventable.

Deploy runtime governance infrastructure to contain these failures before they occur.


Richard Ewing — AI Economist & Capital Auditor
