What is Model Hallucination Rate?
Model hallucination rate is the percentage of AI outputs that contain factual errors, fabricated information, or ungrounded claims. It is the primary quality metric for any AI system that generates text, code, or structured data.
Hallucination rates vary significantly by model, task, and domain. Frontier models (GPT-4, Claude) hallucinate on 3-10% of factual queries. Smaller models can hallucinate on 15-30% of queries. Domain-specific queries without RAG can see hallucination rates of 20-40%.
Measuring hallucination rate requires ground truth data — verified correct answers against which model outputs can be evaluated. This is expensive to create but essential for production AI systems.
Richard Ewing frames hallucination as an economic risk rather than an accuracy problem. Each hallucination has a cost: the cost of the incorrect output itself, the cost of detecting the error, the cost of correcting downstream decisions based on the error, and the potential liability cost if the error causes harm.
Why It Matters
Hallucination rate determines the total cost of ownership for AI features. A system with a 10% hallucination rate requires human review of all outputs, which often costs more than the AI saves. Use the AUEB at richardewing.io/tools/aueb to model the economics.
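The economic framing above can be sketched as a simple expected-cost calculation. This is a minimal illustration, not the AUEB tool itself; the cost figures and parameter names are hypothetical placeholders.

```python
def expected_hallucination_cost(rate, volume, cost_output, cost_detect,
                                cost_correct, cost_liability=0.0):
    """Expected cost of hallucinations for a batch of AI outputs.

    rate           -- hallucination rate as a fraction (e.g. 0.10 for 10%)
    volume         -- number of outputs in the period
    cost_output    -- cost of producing the incorrect output itself
    cost_detect    -- cost of detecting the error (e.g. human review)
    cost_correct   -- cost of correcting downstream decisions
    cost_liability -- expected liability cost if the error causes harm
    """
    cost_per_error = cost_output + cost_detect + cost_correct + cost_liability
    return rate * volume * cost_per_error

# Hypothetical numbers: 10% rate, 10,000 outputs/month,
# $0.50 output + $5 detection + $20 correction per error.
monthly_cost = expected_hallucination_cost(0.10, 10_000, 0.50, 5.0, 20.0)
```

With these illustrative inputs, 1,000 errors at $25.50 each yields $25,500 per month, which is the kind of figure to weigh against the labor the AI replaces.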
How to Measure
1. **Create Ground Truth**: Build a test set of questions with verified correct answers.
2. **Run Evaluations**: Generate model responses and compare against ground truth.
3. **Categorize Errors**: Factual errors, fabricated citations, logical contradictions, incomplete answers.
4. **Calculate Rate**: Hallucinated responses ÷ total responses × 100.
5. **Track Over Time**: Monitor hallucination rate as you update prompts, models, or retrieval systems.
Frequently Asked Questions
What is a normal hallucination rate for AI?
Frontier models (GPT-4, Claude) hallucinate on 3-10% of factual queries. With RAG, rates can drop to 1-3%. Without RAG on domain-specific questions, rates can reach 20-40%.
How do you reduce AI hallucination rate?
Use RAG to ground responses in documents, add verification layers, implement confidence scoring, fine-tune on domain data, and use structured outputs to constrain the response space.
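Two of those techniques, structured outputs and confidence scoring, can be combined into a cheap output gate. This is a minimal sketch using only the standard library; the schema fields and the 0.7 threshold are assumptions, not a standard.

```python
import json

# Hypothetical schema: every response must carry an answer,
# a confidence score, and at least one cited source.
ALLOWED_FIELDS = {"answer", "confidence", "sources"}

def gate_output(raw):
    """Reject model outputs that stray from the expected structure.

    Returns the parsed dict if it passes, or None if the output should
    be discarded or routed to human review.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON: constraint violated
    if not isinstance(data, dict) or set(data) - ALLOWED_FIELDS:
        return None  # unexpected fields: response space not constrained
    if not data.get("sources"):
        return None  # no citations: treat as ungrounded
    if data.get("confidence", 0.0) < 0.7:
        return None  # below threshold (assumed): escalate to a human
    return data
```

A gate like this does not detect factual errors directly, but it filters the ungrounded and low-confidence outputs that account for many hallucinations.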
Need Expert Help?
Richard Ewing is a Product Economist and AI Capital Auditor. He helps companies translate technical complexity into financial clarity.
Book Advisory Call →