Framework Definition

Small Language Models (SLM)

Coined by Richard Ewing, Product Economist

Definition

Small Language Models (SLMs) are highly distilled AI models, typically containing under 8 billion parameters, optimized for specific, deterministic tasks rather than emergent general reasoning. While frontier models (e.g., GPT-4) cost more per token and add latency, SLMs can run locally on edge devices (laptops, phones) or on highly optimized serverless endpoints. They drastically reduce inference costs and eliminate the need to send data off-site.

Why It Matters

In the pursuit of positive Return on AI Investment (ROAI), using a 1-trillion parameter model to route support tickets is economically devastating. SLMs right-size the intelligence to the task, achieving margin preservation.

How to Calculate

  1. Identify repetitive classification tasks in the AI orchestration chain
  2. Calculate the cost delta between frontier API calls and local SLM inference
  3. Implement routing architecture to leverage SLMs as the frontline tier
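Steps 2 and 3 can be sketched in a few lines. All prices, task names, and the routing rule below are illustrative assumptions for the sketch, not published rates or a prescribed architecture:

```python
# Hypothetical per-1K-token costs (USD); real rates vary by provider,
# model, and how local SLM hardware costs are amortized.
FRONTIER_COST_PER_1K_TOKENS = 0.01
SLM_COST_PER_1K_TOKENS = 0.0002

def monthly_cost_delta(tokens_per_month: int) -> float:
    """Savings from moving a workload off the frontier API to a local SLM."""
    frontier = tokens_per_month / 1000 * FRONTIER_COST_PER_1K_TOKENS
    slm = tokens_per_month / 1000 * SLM_COST_PER_1K_TOKENS
    return frontier - slm

def route(task_type: str) -> str:
    """Frontline tier: repetitive classification stays on the SLM;
    anything else escalates to the frontier model."""
    slm_tasks = {"ticket_routing", "sentiment", "intent_classification"}
    return "slm" if task_type in slm_tasks else "frontier"

# Example: 50M tokens/month of support-ticket routing
print(monthly_cost_delta(50_000_000))  # 490.0 (USD saved per month)
print(route("ticket_routing"))         # slm
print(route("contract_drafting"))      # frontier
```

At these assumed rates the frontier call is 50x the SLM cost, which is the margin-preservation argument in concrete terms: the delta scales linearly with token volume, so high-volume repetitive tasks dominate the savings.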

Citation

To cite this definition:

Ewing, R. (2026). "Small Language Models (SLM)." richardewing.io.
https://www.richardewing.io/articles/frameworks/slm