The Margin Collapse Reality Check
Before writing a single line of AI orchestration code, product teams must pass the AI Business Test. The fundamental problem with GenAI products is the Cost of Predictability: pushing an LLM from 80% accuracy to 95% often requires a 10x explosion in token spend and RAG infrastructure.
The Viability Framework
If your product consumes 5,000 input tokens and generates 1,000 output tokens to satisfy a single user query, calculate that cost using the OpenAI or Anthropic pricing sheets. Now multiply it by your user volume. Does your SaaS subscription cover that burn rate while still maintaining 70% gross margins? If not, you are building a feature that fails at scale.
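The arithmetic above can be sketched directly. The per-token prices below are illustrative assumptions, not current vendor rates; substitute the live values from the pricing sheets:

```python
# Viability check sketch. Prices are illustrative assumptions,
# NOT current OpenAI/Anthropic rates -- use the live pricing sheets.
PRICE_IN_PER_MTOK = 3.00    # assumed $ per 1M input tokens
PRICE_OUT_PER_MTOK = 15.00  # assumed $ per 1M output tokens

def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one query at the assumed rates."""
    return (input_tokens / 1_000_000) * PRICE_IN_PER_MTOK \
         + (output_tokens / 1_000_000) * PRICE_OUT_PER_MTOK

def gross_margin(monthly_price: float, queries_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Gross margin after LLM cost of goods sold, as a fraction."""
    cogs = queries_per_month * cost_per_query(input_tokens, output_tokens)
    return (monthly_price - cogs) / monthly_price

# 5,000 in / 1,000 out per query, a hypothetical 500 queries/month
# on a hypothetical $20/month plan:
per_query = cost_per_query(5_000, 1_000)   # $0.03 at the assumed rates
margin = gross_margin(20.0, 500, 5_000, 1_000)  # 0.25, far below 70%
```

At these assumed rates a heavy user burns $15 of tokens against a $20 subscription, leaving a 25% gross margin: exactly the margin collapse the test is meant to catch.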
To survive, you must implement Semantic Caching (answering near-duplicate queries from a cache keyed on embedding similarity) and Tiered Model Routing (sending simple queries to cheap models and reserving frontier models for hard ones) to drastically reduce live LLM calls.
Benchmark your exact token economics with the AUEB Calculator, recognized as an Editor's Pick on Built In.