The Generative AI Margin Squeeze

Leadership is demanding AI features, product teams are shipping them, and no one is calculating the unit economics until the cloud bill arrives.

By Richard Ewing

The Generative AI Margin Squeeze: Why Power Users Destroy SaaS Economics

Across the enterprise software landscape, executive leadership is frantically demanding AI features, product teams are dutifully shipping them, and absolutely no one is calculating the underlying unit economics until the cloud infrastructure bill arrives. Venture capitalists and public markets are currently valuing generative AI startups exactly like traditional software-as-a-service (SaaS) businesses. In almost every case, this is a massive category error.

The Illusion of Infinite SaaS Margins

Traditional SaaS companies enjoy incredible financial leverage, typically boasting gross margins between 80% and 90%. The economic model is beautiful: you build the software once, and the marginal cost of adding a new user to the platform is effectively zero. Generative AI violently shatters this economic model.

When a user prompts a Large Language Model (LLM) inside your application to summarize a document or write an email, that specific query requires significant, highly expensive GPU compute. The marginal cost of usage is decidedly non-zero. The more your customers use the product, the more it costs you to run it. We call this structural paradox the Generative Margin Squeeze.
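To make the non-zero marginal cost concrete, here is a minimal sketch that prices a single query from its token counts. It assumes a provider that bills per token with separate input and output rates; the `ModelPricing` class, the `cost_per_query` function, and every number in it are illustrative assumptions, not real price quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-token prices; substitute your provider's real rates."""
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def cost_per_query(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Marginal inference cost in USD for one request."""
    input_cost = input_tokens * pricing.usd_per_million_input_tokens
    output_cost = output_tokens * pricing.usd_per_million_output_tokens
    return (input_cost + output_cost) / 1_000_000


# Assumed numbers for illustration only: a long document in, a short summary out.
pricing = ModelPricing(usd_per_million_input_tokens=3.0,
                       usd_per_million_output_tokens=15.0)
summary_cost = cost_per_query(pricing, input_tokens=8_000, output_tokens=500)

print(f"One document summary: ${summary_cost:.4f}")
print(f"1,000 summaries a month: ${summary_cost * 1_000:.2f}")
```

Under these assumed rates a single summary costs roughly three cents. That looks harmless per query, which is exactly why the cost tends to go unnoticed until usage scales.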

Synthetic COGS and the Power User Paradox

This introduces a terrifying dynamic into the SaaS playbook. In traditional software, a highly engaged power user is your greatest asset. They are your evangelists, they drive down your churn rate, and they easily justify your Customer Acquisition Cost (CAC). In a generative AI application, an unmanaged power user is a direct threat to your EBITDA.

If you charge a flat $20/month subscription for your SaaS product, but a single power user generates $30 worth of API calls to OpenAI, Anthropic, or your internal infrastructure, that account has negative unit economics: you lose $10 on it every month before counting any other cost. You are subsidizing your customer's AI usage. For that user, you are no longer a high-margin software company; you are a low-margin compute reseller.
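The arithmetic is worth writing down. The sketch below computes the margin on the $20 versus $30 account described above, then shows how a small cohort of power users drags down the blended margin. The function names and the 90/10 user mix with $2 of inference per light user are hypothetical, chosen only to illustrate the effect.

```python
from typing import List, Tuple


def account_margin(subscription_usd: float, inference_cost_usd: float) -> Tuple[float, float]:
    """Return (margin in USD, margin as a fraction of revenue) for one account."""
    margin_usd = subscription_usd - inference_cost_usd
    return margin_usd, margin_usd / subscription_usd


def blended_margin(subscription_usd: float, cohorts: List[Tuple[int, float]]) -> float:
    """Gross margin fraction across cohorts of (user_count, monthly_inference_cost_usd)."""
    users = sum(count for count, _ in cohorts)
    revenue = users * subscription_usd
    cost = sum(count * cost_usd for count, cost_usd in cohorts)
    return (revenue - cost) / revenue


# The power user from the text: $20 of revenue against $30 of API calls.
power_user = account_margin(20.0, 30.0)

# Hypothetical mix: 90 light users at $2 of inference each, 10 power users at $30.
mix = blended_margin(20.0, [(90, 2.0), (10, 30.0)])

print(f"Power user margin: ${power_user[0]:.2f} ({power_user[1]:.0%})")
print(f"Blended gross margin: {mix:.0%}")
```

In this assumed mix, light users alone would yield a 90% margin. Ten percent of the user base consuming $30 each pulls the blended figure to 76%, below the 80% to 90% range expected of traditional SaaS.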

The Evergreen Ratio: Defending Your Margins

To survive, AI product leaders must introduce aggressive Synthetic COGS modeling into their roadmaps, treating the compute consumed by each AI interaction as a cost of goods sold. You cannot just measure daily active users (DAU) or user engagement; you must measure the exact compute cost of that engagement.

The solution is implementing the Evergreen Ratio.

The Evergreen Ratio is the share of all AI interactions that are served from a cached, pre-computed store rather than requiring a live, expensive generation from the frontier model. If an overwhelming majority of your users are asking the AI to generate variations of the exact same output (e.g., summarizing standard quarterly earnings reports), you should not pay an LLM to reason through the problem from scratch every single time.
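As a rough sketch, the ratio and its effect on cost can be computed from two counters. The function names, the interaction counts, and the $0.03 live cost are assumptions for illustration; cached responses are treated as free by default, which ignores storage and lookup cost.

```python
def evergreen_ratio(cached_responses: int, live_generations: int) -> float:
    """Share of all AI interactions served from cache (0.0 to 1.0)."""
    total = cached_responses + live_generations
    return cached_responses / total if total else 0.0


def blended_cost_per_interaction(ratio: float, live_cost_usd: float,
                                 cached_cost_usd: float = 0.0) -> float:
    """Average cost per interaction at a given Evergreen Ratio."""
    return ratio * cached_cost_usd + (1.0 - ratio) * live_cost_usd


# Assumed monthly counts and an assumed $0.03 cost per live generation.
ratio = evergreen_ratio(cached_responses=7_000, live_generations=3_000)
cost = blended_cost_per_interaction(ratio, live_cost_usd=0.03)

print(f"Evergreen Ratio: {ratio:.0%}")
print(f"Blended cost per interaction: ${cost:.4f}")
```

With 70% of interactions served from cache, the average cost per interaction in this example falls from three cents to under one cent.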

Leading organizations build interception layers (Deterministic Control Planes) that recognize routine queries and serve static, pre-approved assets instead of calling the model. If your Evergreen Ratio is 0%, every interaction triggers a live generation, so your inference cost scales one-for-one with usage and you are exposed to maximum financial volatility. The sweet spot for a highly profitable AI feature sits between 60% and 80% cached responses.
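One way to picture an interception layer is a lookup that sits in front of the model. The sketch below is a deliberately simple version: it matches queries after normalizing case and whitespace, and falls through to a live call otherwise. The `InterceptionLayer` class and the `generate_live` callable are hypothetical names; a production system would likely need fuzzier matching and an approval workflow for the stored assets.

```python
from typing import Callable, Dict


class InterceptionLayer:
    """Serve pre-approved answers for routine queries; call the model for the rest."""

    def __init__(self, approved_assets: Dict[str, str],
                 generate_live: Callable[[str], str]) -> None:
        # Store assets under normalized keys so trivial variations still match.
        self._assets = {self._normalize(q): a for q, a in approved_assets.items()}
        self._generate_live = generate_live  # hypothetical wrapper around your model API
        self.cached_hits = 0
        self.live_calls = 0

    @staticmethod
    def _normalize(query: str) -> str:
        """Lowercase and collapse whitespace."""
        return " ".join(query.lower().split())

    def answer(self, query: str) -> str:
        key = self._normalize(query)
        if key in self._assets:
            self.cached_hits += 1
            return self._assets[key]
        self.live_calls += 1
        return self._generate_live(query)

    @property
    def evergreen_ratio(self) -> float:
        total = self.cached_hits + self.live_calls
        return self.cached_hits / total if total else 0.0


# Usage with a stand-in for the expensive model call.
layer = InterceptionLayer(
    {"Summarize the Q3 earnings report": "Pre-approved Q3 summary."},
    generate_live=lambda q: f"[live generation for: {q}]",
)
layer.answer("summarize the  Q3 earnings report")  # served from the approved store
layer.answer("Draft a reply to this customer")     # falls through to the model
print(f"Evergreen Ratio so far: {layer.evergreen_ratio:.0%}")
```

Keeping the hit and miss counters inside the layer means the Evergreen Ratio is measured where the routing decision is made, rather than reconstructed later from billing data.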

The Product P&L Test for AI

Before your team spends another six months building a Generative AI feature, force them to pass a rigid financial test:

  1. The Cost of Inference: Do you know exactly how many fractions of a cent it costs to run a single query through your chosen model architecture?
  2. The Margin Threshold: At what exact volume of user engagement does the feature flip from profitable to unprofitable? Have you instituted hardcoded fair-use caps or transition plans to consumption-based billing?
  3. The Value Prop: Does the AI fully automate the task, or does it just generate a sloppy draft the user must spend ten minutes editing? If human intervention is still required, you haven't eliminated labor costs—you've just shifted them.
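The first two questions reduce to a short calculation. The sketch below estimates how many queries a flat-priced account can run each month before it drops below a target margin, which is one way to derive a fair-use cap. The function name, the $0.03 cost per query, and the assumption that cached responses cost nothing are all illustrative.

```python
import math


def max_queries_at_margin(price_usd: float, cost_per_query_usd: float,
                          target_margin: float = 0.0,
                          evergreen_ratio: float = 0.0) -> int:
    """Largest monthly query count that keeps one account at or above target_margin.

    Assumes cached responses cost nothing, so only the live share of
    queries consumes the inference budget.
    """
    if cost_per_query_usd <= 0:
        raise ValueError("cost_per_query_usd must be positive")
    if not 0.0 <= evergreen_ratio < 1.0:
        raise ValueError("evergreen_ratio must be in [0, 1)")
    effective_cost = cost_per_query_usd * (1.0 - evergreen_ratio)
    inference_budget = price_usd * (1.0 - target_margin)
    return math.floor(inference_budget / effective_cost)


# $20/month plan, assumed $0.03 per live query.
break_even = max_queries_at_margin(20.0, 0.03)
cap_for_80 = max_queries_at_margin(20.0, 0.03, target_margin=0.80)
cap_with_cache = max_queries_at_margin(20.0, 0.03, target_margin=0.80,
                                       evergreen_ratio=0.70)

print(f"Break-even volume: {break_even} queries/month")
print(f"Cap to hold an 80% margin: {cap_for_80} queries/month")
print(f"Same cap at a 70% Evergreen Ratio: {cap_with_cache} queries/month")
```

Under these assumptions the account breaks even at 666 queries a month, but holding an 80% gross margin allows only 133. Raising the Evergreen Ratio to 70% lifts that cap to 444 without changing the price.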

If you cannot monetize your AI strategy through massive new revenue generation or specific, measurable cost mitigation, you are not building a product. You are conducting an incredibly expensive science experiment funded by your CFO.


Canonical Frameworks

Innovation Tax

The Innovation Tax is the hidden cost of maintenance work that gets reported as innovation investment. It is OpEx masquerading as R&D investment, causing organizations to dramatically overestimate their effective engineering velocity and R&D productivity.

Here's how it works: A VP of Engineering reports to the CEO that "65% of engineering time is spent on new features." The actual breakdown, when forensically audited, reveals that only 23% of engineering time produces genuine new capabilities. The remaining 42% is maintenance work embedded within feature sprints: bug fixes bundled into feature stories, infrastructure upgrades coded as dependencies, and refactoring disguised as feature prerequisites. This 42-point gap between reported and actual innovation investment is the Innovation Tax.

It's not fraud; it's systematic self-deception enabled by the way agile teams organize work. When a sprint contains 10 stories and 4 of them are technical debt cleanup dressed as "tech stories" within a feature epic, the team genuinely believes they're spending 100% on features.

The Innovation Tax is insidious because it compounds. As the maintenance burden grows quarter-over-quarter, the tax increases. But because teams don't measure it, CFOs and boards continue to believe R&D spending is generating proportional innovation output. By the time the gap becomes visible (missed deadlines, slow feature delivery, competitive lag), the organization is often approaching the Technical Insolvency Date.

Benchmarks from Richard Ewing's audits show that most engineering organizations have an Innovation Tax between 30-50%. Organizations with Innovation Tax above 40% are in dangerous territory. Above 70% is terminal: the organization is approaching technical insolvency within 4-6 quarters.
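The definition above is a subtraction followed by a threshold check. A minimal sketch, using the 65% reported and 23% audited figures from the example and the 40% and 70% thresholds quoted in the benchmarks; the function names and band labels are hypothetical:

```python
def innovation_tax(reported_feature_pct: int, audited_feature_pct: int) -> int:
    """Gap, in percentage points, between reported and audited feature time."""
    return reported_feature_pct - audited_feature_pct


def risk_band(tax_pct: int) -> str:
    """Classify a tax level using the thresholds quoted in the text."""
    if tax_pct > 70:
        return "terminal"
    if tax_pct > 40:
        return "dangerous"
    return "below the danger threshold"


tax = innovation_tax(reported_feature_pct=65, audited_feature_pct=23)
print(f"Innovation Tax: {tax} points ({risk_band(tax)})")
```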


Kill Switch Protocol

The Kill Switch Protocol is a structured framework for identifying and deprecating "Zombie Features": code that requires ongoing maintenance but generates zero incremental business value.

Most software organizations have a dangerous bias: they add features but never remove them. Product teams celebrate launches. Nobody celebrates deletions. Over time, this creates what Richard Ewing calls "feature gravity": a constantly growing codebase where 40-60% of the code serves no active users and generates no measurable revenue, yet still consumes engineering maintenance hours.

Zombie features come in several varieties:

  - Ghost Features: features that were built, launched, and never adopted. They sit in the codebase, requiring maintenance, but have near-zero usage.
  - Legacy Bridges: compatibility layers, deprecated API versions, and backward-compatible code paths that serve a tiny percentage of users but add complexity to every future change.
  - Vanity Features: features built because a senior stakeholder wanted them, not because users needed them. Often protected by organizational politics rather than business merit.
  - Abandoned Experiments: A/B test variants that were never cleaned up, prototypes that became permanent, and "temporary" solutions that became load-bearing.

The Kill Switch Protocol provides a systematic approach to identification, evaluation, and deprecation:

  1. Identify: Flag features with less than 5% of peak usage, zero revenue attribution, or maintenance cost exceeding 10% of the feature's value contribution.
  2. Quantify: Calculate the total cost of keeping each zombie alive (maintenance hours × fully-loaded engineer cost × opportunity cost multiplier).
  3. Assess Risk: Evaluate deprecation risk: what breaks if this feature is removed? What customers are affected?
  4. Sunset Timeline: Create a communication plan and graduated deprecation (warning → deprecation notice → feature flag → removal).
  5. Execute: Remove the code with rollback capability. Monitor for unexpected breakage.

The typical Kill Switch audit reveals that 30-50% of maintenance burden comes from zombie features. Removing them frees up 15-25% of engineering capacity for actual innovation.
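The Identify and Quantify steps can be sketched as two small functions. This is an illustration under stated assumptions: the `Feature` fields, the $150 hourly rate, and the 1.5 opportunity multiplier are hypothetical, and "maintenance cost" is interpreted here as maintenance hours times the fully-loaded hourly rate.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    current_usage: float           # e.g. monthly active users today
    peak_usage: float              # highest monthly active users ever recorded
    attributed_revenue_usd: float  # revenue attributed to the feature per year
    maintenance_hours: float       # engineering hours spent per year
    value_contribution_usd: float  # estimated yearly value of the feature


def is_zombie_candidate(f: Feature, hourly_cost_usd: float) -> bool:
    """Identify step: flag a feature if any one of the three criteria holds."""
    low_usage = f.peak_usage > 0 and f.current_usage < 0.05 * f.peak_usage
    no_revenue = f.attributed_revenue_usd == 0
    maintenance_cost = f.maintenance_hours * hourly_cost_usd
    too_costly = maintenance_cost > 0.10 * f.value_contribution_usd
    return low_usage or no_revenue or too_costly


def zombie_cost(f: Feature, hourly_cost_usd: float, opportunity_multiplier: float) -> float:
    """Quantify step: maintenance hours x fully-loaded cost x opportunity multiplier."""
    return f.maintenance_hours * hourly_cost_usd * opportunity_multiplier


# Hypothetical features and rates for illustration.
legacy = Feature("legacy_export_v1", current_usage=40, peak_usage=2_000,
                 attributed_revenue_usd=0, maintenance_hours=120,
                 value_contribution_usd=0)
active = Feature("search", current_usage=1_800, peak_usage=2_000,
                 attributed_revenue_usd=50_000, maintenance_hours=20,
                 value_contribution_usd=50_000)

for feature in (legacy, active):
    flagged = is_zombie_candidate(feature, hourly_cost_usd=150.0)
    cost = zombie_cost(feature, hourly_cost_usd=150.0, opportunity_multiplier=1.5)
    print(f"{feature.name}: flagged={flagged}, yearly carrying cost=${cost:,.0f}")
```

Because the Identify criteria are joined by "or", a feature is flagged as soon as any single test fails. The later risk assessment step is what decides whether it is actually removed.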


Richard Ewing

The AI Economist — Quantifying engineering economics for technology leaders, PE firms, and boards.