Track 2 — AI Product Economics

Module 2.2: Model Selection & Optimization

The model tier spectrum, routing architectures, fine-tuning ROI, and distillation. Match model capability to task complexity to cut costs by 60-80%.

3 Lessons · ~45 min · Intermediate-Advanced

🎯 What You'll Learn

  • The four model tiers and when to use each (frontier, mid-tier, open-source, specialized)
  • How to architect model routing for 60-80% cost reduction
  • How to calculate fine-tuning ROI and distillation break-even
Lesson 1: The Model Tier Spectrum

Not every AI task needs a frontier model. Understanding model tiers — and matching task complexity to model capability — is the single largest cost optimization lever.

Frontier Models

GPT-4o, Claude 3 Opus, Gemini Ultra. Best reasoning, highest cost. Use for: complex analysis, multi-step reasoning, creative generation.

Cost: $5-30/1M tokens. Use for < 10% of queries.

Mid-Tier Models

GPT-4o-mini, Claude Haiku, Gemini Flash. 85-90% of frontier quality at 10-20x lower cost. Use for: most production features.

Cost: $0.15-1/1M tokens. Should handle 70-80% of queries.

Open-Source Models

Llama 3, Mistral, Phi-3. Self-hosted, zero per-token cost (but infrastructure cost). Use for: high-volume, latency-sensitive, or privacy-critical tasks.

Cost: $0.01-0.10/1M tokens (infrastructure). 10-20% of queries.

Specialized Models

Fine-tuned models for specific domains. Better quality AND lower cost for narrow tasks. A fine-tuned Llama can outperform GPT-4 on your specific use case.

ROI: fine-tuning cost $5K-$50K. Break-even if saving > $2K/month in inference.
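The tier shares and prices above imply a blended cost. A quick sketch of the arithmetic (prices are midpoints of the ranges quoted in this module; the 10/75/15 traffic split is an illustrative mix, not a prescription):

```python
# Blended cost per 1M tokens for a three-tier mix.
# Prices are midpoints of the ranges quoted in this lesson;
# the 10/75/15 traffic split is an illustrative assumption.
TIERS = {
    "frontier": {"price_per_m": 15.00, "share": 0.10},
    "mid":      {"price_per_m": 0.50,  "share": 0.75},
    "open":     {"price_per_m": 0.05,  "share": 0.15},
}

def blended_price(tiers):
    """Traffic-weighted average price per 1M tokens."""
    return sum(t["price_per_m"] * t["share"] for t in tiers.values())

blended = blended_price(TIERS)  # ~$1.88 per 1M tokens
savings = 1 - blended / TIERS["frontier"]["price_per_m"]
print(f"blended: ${blended:.2f}/1M tokens; savings vs. frontier-only: {savings:.0%}")
```

Even this conservative mix lands well inside the savings range the module quotes, because most traffic never touches frontier pricing.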
📝 Exercise

Categorize your AI queries into complexity tiers (simple/medium/complex). What percentage could be handled by a mid-tier model instead of a frontier model?

Lesson 2: Model Routing Architecture

Model routing directs each query to the cheapest model capable of handling it. A router that sends 70% of queries to a model 10-20x cheaper cuts inference costs by roughly 65% versus sending everything to the frontier tier.

Complexity Classification

Use a lightweight classifier (or simple heuristics like query length, keyword detection) to estimate query complexity. Route simple queries to cheap models, complex to expensive.

A $0.001 routing decision can save $0.05-0.10 per query
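A minimal heuristic router along these lines might look as follows (the keyword list and word-count thresholds are illustrative assumptions, not tuned values):

```python
# Heuristic complexity router — a sketch, not a trained classifier.
# Keyword list and word-count thresholds are illustrative assumptions.
COMPLEX_KEYWORDS = ("analyze", "compare", "explain why", "step by step")

def route(query: str) -> str:
    """Return the cheapest tier likely to handle the query."""
    q = query.lower()
    words = len(q.split())
    if words > 60 or any(k in q for k in COMPLEX_KEYWORDS):
        return "frontier"  # long or reasoning-heavy -> best model
    if words > 15:
        return "mid"       # moderate length -> mid-tier
    return "cheap"         # short lookup-style query -> cheapest tier
```

In production you would replace the heuristics with a small trained classifier, but even rules this crude capture much of the savings.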
Cascading Strategy

Try the cheapest model first. If confidence is below threshold, escalate to the next tier. Most queries resolve at the cheapest tier.

Cascading reduces average cost 60-80% vs. always using the best model
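The cascade can be sketched as below, assuming each model is a callable returning an answer plus a self-reported confidence score; how that score is derived (log-probs, a verifier model, etc.) depends on your stack:

```python
# Cheapest-first cascade. Assumes each model is a callable returning
# (answer, confidence); confidence derivation is stack-specific.
def cascade(query, models, threshold=0.7):
    """Try tiers cheapest-first; escalate while confidence < threshold."""
    for call in models[:-1]:
        answer, confidence = call(query)
        if confidence >= threshold:
            return answer  # resolved at a cheaper tier
    answer, _ = models[-1](query)
    return answer          # most capable tier is the final fallback
```

With `cascade(q, [cheap_model, mid_model, frontier_model])`, you only pay for the frontier call when the cheaper tiers are unsure.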
Quality Monitoring

Track accuracy/satisfaction by model tier. If the cheap model produces unacceptable results for certain query types, adjust routing rules.

Target: < 5% escalation rate for well-classified query types
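A simple per-query-type escalation tracker, as a sketch (`RoutingMonitor` is a hypothetical helper, not a library class; the 5% default matches the target above):

```python
from collections import defaultdict

# Per-query-type escalation tracking. `RoutingMonitor` is a hypothetical
# helper, not a library class; the 5% default matches the target above.
class RoutingMonitor:
    def __init__(self, target=0.05):
        self.target = target
        self.stats = defaultdict(lambda: [0, 0])  # type -> [escalated, total]

    def record(self, query_type: str, escalated: bool) -> None:
        self.stats[query_type][0] += int(escalated)
        self.stats[query_type][1] += 1

    def over_target(self):
        """Query types whose escalation rate exceeds the target."""
        return [t for t, (esc, total) in self.stats.items()
                if total and esc / total > self.target]
```

Query types flagged by `over_target()` are candidates for tighter routing rules, or for being sent straight to a higher tier.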
📝 Exercise

Design a model routing strategy for your AI feature. Define 3 tiers, set routing rules, and estimate the cost savings vs. your current approach.

Lesson 3: Fine-Tuning Economics

Fine-tuning creates a specialized model that outperforms general models on your specific task, often at lower inference cost. But the ROI depends on volume.

Training Cost

Fine-tuning cost = training data preparation ($5K-$20K) + compute ($1K-$10K per training run) + iteration (3-5 runs typically). Total: $10K-$50K.

Break-even: if fine-tuned model saves > $2K/month in inference, ROI in 6-12 months
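The break-even arithmetic as a one-liner (the $25K and $2.5K figures are illustrative points inside the ranges above):

```python
# Payback-period arithmetic for a one-time fine-tuning investment.
# The $25K / $2.5K figures are illustrative points inside the ranges above.
def payback_months(fine_tune_cost: float, monthly_savings: float) -> float:
    """Months to recoup the investment; inf if there are no savings."""
    if monthly_savings <= 0:
        return float("inf")
    return fine_tune_cost / monthly_savings

print(payback_months(25_000, 2_500))  # 10.0 -> inside the 6-12 month window
```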
Quality Improvement

Fine-tuned models can be 20-40% more accurate on domain-specific tasks. This reduces retry rates (fewer wasted tokens) and improves user satisfaction.

Measure: accuracy lift × retry rate reduction × cost savings
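One way to put a number on the retry-rate effect alone (the volume, retry rates, and per-query cost here are illustrative assumptions, not benchmarks):

```python
# Effect of a lower retry rate on monthly spend. Volume, rates, and
# per-query cost are illustrative assumptions, not benchmarks.
def monthly_inference_cost(queries: int, retry_rate: float,
                           cost_per_query: float) -> float:
    """Total cost when each retry re-incurs the per-query cost."""
    return queries * (1 + retry_rate) * cost_per_query

before = monthly_inference_cost(1_000_000, 0.20, 0.002)  # 20% retry rate
after = monthly_inference_cost(1_000_000, 0.05, 0.002)   # 5% retry rate
print(round(before - after, 2))  # 300.0/month saved from fewer retries alone
```

Add the per-token price difference between the fine-tuned and general model on top of this to get the full monthly-savings term for the ROI calculation.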
Distillation

Train a small model on the outputs of a large model. The small model learns to mimic the large model at 10-100x lower inference cost — near-GPT-4 quality on the target domain from a GPT-3.5-sized model.

Distillation works best when: narrow domain, consistent output format, high volume

📝 Exercise

Calculate fine-tuning ROI: (current monthly inference cost) - (projected fine-tuned model cost) = monthly savings. (Fine-tuning investment) / (monthly savings) = payback months.