Framework Definition

Small Language Models (SLM)

Coined by Richard Ewing, AI Economist

Definition

Small Language Models (SLMs) are highly distilled AI models, typically containing under 8 billion parameters. They are optimized for specific, deterministic tasks rather than emergent general reasoning. Frontier models (such as GPT-4) cost fractions of a cent per token, a charge that compounds quickly at volume, and they carry high latency. SLMs, by contrast, can run locally on edge devices (laptops, phones) or on highly optimized serverless endpoints. They drastically reduce inference costs and, when run locally, eliminate the need to send data off-site.

Why It Matters

In the pursuit of positive Return on AI Investment (ROAI), using a 1-trillion-parameter model to route support tickets is economically devastating. SLMs right-size the intelligence to the task, which preserves margin.

How to Calculate

  1. Identify repetitive classification tasks in the AI orchestration chain
  2. Calculate the cost delta between frontier API calls and local SLM inference
  3. Implement routing architecture to leverage SLMs as the frontline tier
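
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`monthly_cost`, `cost_delta`, `route`), the per-1,000-token rates, the escalation rate, and the confidence threshold are all hypothetical placeholders, and the two toy classifiers merely stand in for a local SLM and a frontier API call. Substitute your own measured volumes and prices.

```python
from typing import Callable, Tuple


def monthly_cost(requests: float, tokens_per_request: float,
                 usd_per_1k_tokens: float) -> float:
    """Monthly spend for one tier, given volume and a per-1k-token rate."""
    return requests * tokens_per_request / 1000 * usd_per_1k_tokens


def cost_delta(requests: float, tokens_per_request: float,
               frontier_rate: float, slm_rate: float,
               escalation_rate: float) -> float:
    """Savings from an SLM frontline tier versus sending everything to the
    frontier model. Assumes every request pays the SLM rate, and the
    escalated fraction additionally pays the frontier rate."""
    baseline = monthly_cost(requests, tokens_per_request, frontier_rate)
    blended_rate = slm_rate + escalation_rate * frontier_rate
    tiered = monthly_cost(requests, tokens_per_request, blended_rate)
    return baseline - tiered


def route(text: str,
          slm_classify: Callable[[str], Tuple[str, float]],
          frontier_classify: Callable[[str], str],
          min_confidence: float = 0.8) -> Tuple[str, str]:
    """Try the SLM first; escalate only when its confidence is too low.
    Returns (label, tier_that_answered)."""
    label, confidence = slm_classify(text)
    if confidence >= min_confidence:
        return label, "slm"
    return frontier_classify(text), "frontier"


def toy_slm_classify(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a local SLM: returns (label, confidence)."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing", 0.95
    if "password" in lowered:
        return "account", 0.90
    return "other", 0.30


def toy_frontier_classify(text: str) -> str:
    """Hypothetical stand-in for a frontier API call."""
    return "needs_review"


if __name__ == "__main__":
    # Placeholder numbers only; these are not real vendor prices.
    savings = cost_delta(requests=100_000, tokens_per_request=500,
                         frontier_rate=0.01, slm_rate=0.001,
                         escalation_rate=0.10)
    print(f"Estimated monthly savings: ${savings:,.2f}")
    for ticket in ("I want a refund", "My shipment arrived damaged"):
        print(ticket, "->", route(ticket, toy_slm_classify,
                                  toy_frontier_classify))
```

The escalation rate matters as much as the price gap: if most requests fall through to the frontier tier, the SLM adds cost instead of removing it, so it is worth measuring that rate on real traffic before committing to the architecture.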

Citation

To cite this definition:

Ewing, R. (2026). "Small Language Models (SLM)." richardewing.io.
https://www.richardewing.io/articles/frameworks/slm