The Product P&L Test: Why Your AI Feature is Bleeding Cash

Before you let your team spend six months building a Generative AI feature, force yourself to pass the Product P&L Test.

By Richard Ewing

The Product P&L Test: Stopping the AI Cash Bleed

In the current macroeconomic environment, capital is exceedingly expensive. As Product Leaders, Chief Technology Officers, and Founders, we must immediately stop being starry-eyed about technical possibility and become ruthless, uncompromising guardians of business viability. Before you allow your engineering team to spend six months building and deploying a Generative AI feature into your core product, you must force yourself to pass the Product P&L Test.

The Danger of "AI for AI's Sake"

The tech industry is currently infected with FOMO (Fear Of Missing Out). Boards are pressuring CEOs to "have an AI story," which cascades down to product teams shipping rushed wrappers around the OpenAI API. These features are often launched with massive fanfare, but quickly become ghost towns within the application, utilized only by a tiny fraction of power users who simultaneously drive up your cloud compute costs.

If you cannot mathematically prove how a feature improves your unit economics, you are not building a product. You are conducting an expensive, subsidized science experiment funded by your CFO.

The Three Pillars of the Product P&L Test

To pass the Product P&L test, an AI feature proposal must answer three critical questions with hard, verifiable numbers, not narrative storytelling:

  1. What is the Exact Cost of Inference?
    You must know exactly how many fractions of a cent it costs to run a single query through the model. If a user utilizes the feature 100 times a day, what is the impact on your COGS? Have you factored in the token costs for input context windows, output generation, and the vector database lookups for Retrieval-Augmented Generation (RAG)? If engineering cannot provide an estimated cost per 1,000 interactions, the feature is rejected.
  2. What is the Margin Threshold and Monetization Strategy?
    At what exact volume of user engagement does the feature flip from being profitable to unprofitable? Never bundle unlimited generative AI compute into a standard, flat-rate SaaS subscription. It is financial suicide. You must implement strict usage-based pricing, token-based credits, or hardcoded fair-use caps to protect your gross margin floor. If the feature is highly valuable, users will pay for the credits. If they refuse to pay, the feature was never valuable to begin with.
  3. What is the Defensible Differentiation?
    If the feature is just a thin, programmatic wrapper around the OpenAI or Anthropic API, what exactly prevents your closest competitor from shipping the exact same feature tomorrow afternoon? True defensibility in AI comes from proprietary data. If your AI model is reasoning over unique, siloed enterprise data that only your platform possesses, you have a moat. If it is just answering generic questions using the foundation model's pre-trained knowledge, you have zero defensibility.
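
The first two questions reduce to arithmetic, sketched below in Python. Every number here is a hypothetical placeholder (token counts, per-token prices, the vector lookup cost, the $50 plan, the 70% margin floor), and the names `InferenceCost` and `break_even_interactions` are illustrative rather than part of any real pricing API. Substitute your provider's actual rates and your own COGS before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceCost:
    """Per-interaction cost inputs. All prices are assumed placeholders."""
    input_tokens: int             # prompt plus retrieved context per interaction
    output_tokens: int            # generated tokens per interaction
    usd_per_1k_input: float       # assumed price per 1,000 input tokens
    usd_per_1k_output: float      # assumed price per 1,000 output tokens
    usd_per_vector_lookup: float  # assumed cost of one RAG retrieval

    def per_interaction(self) -> float:
        token_cost = (
            self.input_tokens / 1000 * self.usd_per_1k_input
            + self.output_tokens / 1000 * self.usd_per_1k_output
        )
        return token_cost + self.usd_per_vector_lookup

    def per_1000_interactions(self) -> float:
        return 1000 * self.per_interaction()


def break_even_interactions(
    monthly_price_usd: float,
    other_cogs_usd: float,
    gross_margin_floor: float,
    cost_per_interaction_usd: float,
) -> float:
    """Monthly interactions per user at which gross margin hits the floor.

    Returns 0.0 when the plan cannot afford any inference at that floor.
    """
    if cost_per_interaction_usd <= 0:
        raise ValueError("cost_per_interaction_usd must be positive")
    # Dollars per user per month that may go to inference without
    # pushing gross margin below the floor.
    inference_budget = monthly_price_usd * (1 - gross_margin_floor) - other_cogs_usd
    if inference_budget <= 0:
        return 0.0
    return inference_budget / cost_per_interaction_usd


# Hypothetical feature: 3,000 input tokens, 500 output tokens, one lookup.
cost = InferenceCost(
    input_tokens=3000,
    output_tokens=500,
    usd_per_1k_input=0.003,
    usd_per_1k_output=0.015,
    usd_per_vector_lookup=0.0005,
)
cap = break_even_interactions(
    monthly_price_usd=50.0,
    other_cogs_usd=5.0,
    gross_margin_floor=0.70,
    cost_per_interaction_usd=cost.per_interaction(),
)
# A user running the feature 100 times a day for a 30-day month.
heavy_user_monthly_cost = 100 * 30 * cost.per_interaction()

print(f"Cost per 1,000 interactions: ${cost.per_1000_interactions():.2f}")
print(f"Fair-use cap at the margin floor: {cap:.0f} interactions/month")
print(f"Heavy user inference cost: ${heavy_user_monthly_cost:.2f}/month")
```

Under these assumed inputs, a user who runs the feature 100 times a day costs slightly more in inference than the flat subscription brings in, which is the failure mode a usage cap or credit system is meant to prevent.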

The Value Verdict

Finally, apply the Painkiller vs. Vitamin assessment. Does the AI entirely remove human labor from a workflow, or does it merely generate a mediocre draft that the user must spend ten minutes editing and correcting? If heavy human intervention is still required, you haven't eliminated the friction; you have just shifted it from creation to verification. Build AI that acts autonomously and decisively, bounded by deterministic controls, and watch your margins expand.

Canonical Frameworks

Technical Insolvency Date

The Technical Insolvency Date (TID) is the specific future quarter when an organization's technical debt maintenance will consume 100% of engineering capacity, leaving zero time for new feature development.

Every software organization accumulates technical debt over time: shortcuts taken under deadline pressure, aging infrastructure, deprecated dependencies, and code that nobody understands anymore. This debt isn't free. It requires ongoing maintenance hours: bug fixes, security patches, dependency updates, and workarounds for architectural limitations.

The critical insight is that maintenance burden grows faster than most leaders realize. If your team currently spends 40% of its time on maintenance and that percentage is growing 3% per quarter, you can calculate the exact quarter when maintenance reaches 100%. That quarter is your Technical Insolvency Date.

At the TID, your engineering team is fully consumed by keeping existing systems alive. Feature velocity drops to zero. No new capabilities. No competitive response. No innovation. Your R&D investment becomes pure maintenance spend: you're paying innovation-era salaries for maintenance-era output.

The concept draws from financial insolvency: the point where a company's liabilities exceed its assets and it cannot meet its obligations. Technical insolvency is the same idea applied to engineering capacity: the point where your maintenance obligations exceed your available engineering hours.

Most organizations don't realize they're approaching the TID because they track technical debt qualitatively rather than quantitatively. Telling a board "we have technical debt" gets deprioritized. Telling a board "we are 8 quarters from technical insolvency, the point where we can no longer ship any new features" gets immediate action and budget allocation.
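
The TID calculation can be sketched in a few lines of Python. The definition does not say whether "3% per quarter" means three percentage points added each quarter or 3% compounding growth of the maintenance share, so this sketch shows both readings. The function names are hypothetical, and the model assumes the growth rate stays constant, which real teams should not take for granted.

```python
import math


def quarters_until_insolvency_linear(current_pct: float, points_per_quarter: float) -> int:
    """Quarters until maintenance reaches 100% if the share grows by a
    fixed number of percentage points each quarter (assumed reading #1)."""
    if points_per_quarter <= 0:
        raise ValueError("points_per_quarter must be positive")
    return max(0, math.ceil((100 - current_pct) / points_per_quarter))


def quarters_until_insolvency_compound(current_pct: float, growth_rate_pct: float) -> int:
    """Quarters until maintenance reaches 100% if the share itself grows
    by a fixed relative rate each quarter (assumed reading #2)."""
    if current_pct <= 0 or growth_rate_pct <= 0:
        raise ValueError("current_pct and growth_rate_pct must be positive")
    share = current_pct
    quarters = 0
    while share < 100:
        share *= 1 + growth_rate_pct / 100
        quarters += 1
    return quarters


# The figures from the definition: 40% maintenance today, growing 3% per quarter.
print(quarters_until_insolvency_linear(40, 3))    # 20
print(quarters_until_insolvency_compound(40, 3))  # 31
```

The two readings give 20 and 31 quarters respectively, so a board-level TID claim should state which growth model was used.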

Audit Interview

The Audit Interview is a hiring protocol that tests verification skills instead of code generation skills. In the AI age, the scarce human skill is not writing code; it's catching what AI gets wrong.

Traditional coding interviews ask candidates to write algorithms on a whiteboard or in a shared editor. This was a reasonable proxy for engineering skill when humans wrote all the code. But in 2026, AI tools like GitHub Copilot, Cursor, and Claude generate code faster and often more correctly than human candidates under interview pressure. When Anthropic discovered that candidates were using Claude to pass their own coding interviews, it proved that traditional interviews are testing the wrong thing. They're testing a skill that AI performs better than humans under artificial conditions.

The Audit Interview flips the model. Instead of asking candidates to generate code, it presents them with AI-generated code that contains hidden flaws: security vulnerabilities, logic errors, performance anti-patterns, edge case failures, and architectural problems. The candidate's job is to find the bugs, rank them by severity, and make a ship/no-ship recommendation.

The protocol works like this: candidates receive a realistic code review scenario (500-1000 lines of AI-generated code with 3-5 hidden flaws). They have 10 minutes to review the code, identify issues, and present their findings. The evaluation scores 4 dimensions of engineering judgment:

  1. Verification: How many bugs did they find? Did they catch the security vulnerability?
  2. Prioritization: Did they correctly rank issues by severity?
  3. Communication: Can they explain the risk to a non-technical stakeholder?
  4. Judgment: Would they ship this code? Under what conditions? With what caveats?

The free Audit Interview tool at richardewing.io/tools/audit-interview generates realistic AI-written code with calibrated flaws for interviewers to use immediately.

Richard Ewing

The AI Economist — Quantifying engineering economics for technology leaders, PE firms, and boards.