Small Language Models (SLM)
Coined by Richard Ewing, Product Economist
Definition
Small Language Models (SLMs) are highly distilled AI models, typically containing under 8 billion parameters, optimized for specific, deterministic tasks rather than emergent general reasoning. While frontier models (e.g., GPT-4) incur per-token API costs that compound at scale and carry network latency, SLMs can run locally on edge devices (laptops, phones) or on highly optimized serverless endpoints. They drastically reduce inference costs and eliminate the need to send data off-site.
Why It Matters
In the pursuit of positive Return on AI Investment (ROAI), using a 1-trillion parameter model to route support tickets is economically devastating. SLMs right-size the intelligence to the task, achieving margin preservation.
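The scale of that cost gap can be sketched with illustrative numbers. Everything below is an assumption for demonstration: the per-token prices, ticket volume, and token counts are hypothetical, not quotes from any provider.

```python
# Illustrative only: hypothetical per-ticket cost comparison between a
# frontier API and a locally hosted SLM. All prices are assumed figures.
FRONTIER_PRICE_PER_1K_TOKENS = 0.03   # assumed frontier API price (USD)
SLM_PRICE_PER_1K_TOKENS = 0.0002      # assumed amortized local SLM cost (USD)

TOKENS_PER_TICKET = 500
TICKETS_PER_MONTH = 1_000_000

def monthly_cost(price_per_1k_tokens: float) -> float:
    """Total monthly spend at the given per-1K-token price."""
    return price_per_1k_tokens * TOKENS_PER_TICKET / 1000 * TICKETS_PER_MONTH

frontier = monthly_cost(FRONTIER_PRICE_PER_1K_TOKENS)  # $15,000/mo
slm = monthly_cost(SLM_PRICE_PER_1K_TOKENS)            # $100/mo
print(f"frontier: ${frontier:,.0f}/mo, SLM: ${slm:,.0f}/mo, "
      f"delta: ${frontier - slm:,.0f}/mo")
```

Under these assumed prices, routing the same ticket volume through an SLM cuts the monthly bill by roughly two orders of magnitude, which is the margin-preservation effect described above.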
How to Calculate
1. Identify repetitive classification tasks in the AI orchestration chain.
2. Calculate the cost delta between frontier API calls and local SLM inference.
3. Implement a routing architecture to leverage SLMs as the frontline tier.
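Step 3 can be sketched as a confidence-gated router: the SLM handles every request first, and only low-confidence cases escalate to the frontier model. The functions `classify_with_slm` and `call_frontier_model` are hypothetical stand-ins, and the threshold is an assumed value to be tuned per task.

```python
# A minimal sketch of the frontline-tier routing pattern. The two model
# calls are hypothetical stand-ins, not real library APIs.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; tune per task


@dataclass
class RoutingResult:
    label: str
    tier: str  # "slm" (cheap frontline) or "frontier" (expensive fallback)


def classify_with_slm(text: str) -> tuple[str, float]:
    """Hypothetical local SLM classifier returning (label, confidence)."""
    # Stand-in logic: a real system would run a local model here.
    if "refund" in text.lower():
        return "billing", 0.95
    return "general", 0.40


def call_frontier_model(text: str) -> str:
    """Hypothetical frontier API call, used only on escalation."""
    return "technical"


def route_ticket(text: str) -> RoutingResult:
    label, confidence = classify_with_slm(text)
    if confidence >= CONFIDENCE_THRESHOLD:
        return RoutingResult(label, "slm")  # frontline tier handles it
    # Low confidence: escalate to the frontier model.
    return RoutingResult(call_frontier_model(text), "frontier")


print(route_ticket("I want a refund"))   # handled by the SLM tier
print(route_ticket("My app crashes"))    # escalated to the frontier tier
```

The design point is that the expensive model is invoked only for the residual of cases the SLM cannot resolve, so the blended per-request cost trends toward the SLM's rate.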
Related Articles
- "ROAI is the New ROI: Why CFOs Are Killing Your AI Pilots in 2026" — The Canon, Apr 2026
Calculate Yours
Use the interactive tool to calculate your own Small Language Model (SLM) economics.
Use the AI Unit Economics Benchmark (AUEB) →
Citation
To cite this definition:
Ewing, R. (2026). "Small Language Models (SLM)." richardewing.io.
https://www.richardewing.io/articles/frameworks/slm