What is Site Reliability Engineering (SRE)?
Site Reliability Engineering is a discipline that applies software engineering practices to infrastructure and operations problems. Developed at Google, SRE treats operations as a software problem: it automates manual tasks, builds self-healing systems, and manages reliability through error budgets.
Key SRE concepts: SLIs (Service Level Indicators — metrics that measure service quality), SLOs (Service Level Objectives — target values for SLIs), SLAs (Service Level Agreements — contractual commitments to customers), and Error Budgets (the acceptable amount of unreliability, calculated as 1 - SLO).
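As a rough illustration of how these terms relate, the sketch below computes an SLI from request counts and compares it with an SLO. It assumes a request-based SLI (good requests divided by total requests), which is only one common way to measure service quality. The names `SLOReport` and `evaluate_slo` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    sli: float              # measured fraction of good events
    slo: float              # target fraction of good events
    error_budget: float     # acceptable unreliability: 1 - SLO
    budget_consumed: float  # share of the error budget already used (1.0 = all of it)
    slo_met: bool           # True when the measured SLI meets the target


def evaluate_slo(good_events: int, total_events: int, slo: float) -> SLOReport:
    """Compare a request-based SLI against an SLO (hypothetical helper)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")

    sli = good_events / total_events
    error_budget = 1 - slo
    # Observed unreliability expressed as a share of the allowed unreliability.
    budget_consumed = (1 - sli) / error_budget
    return SLOReport(sli, slo, error_budget, budget_consumed, sli >= slo)


# Example: 999,500 good requests out of 1,000,000 against a 99.9% SLO.
report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo=0.999)
print(f"SLI={report.sli:.4%}, budget consumed={report.budget_consumed:.0%}")
```

In this example the service failed 0.05% of requests against an allowance of 0.1%, so roughly half of the error budget has been used and the SLO is still met.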
The error budget concept is transformative: if your SLO is 99.9% uptime, your error budget is 0.1%, which works out to about 8.76 hours of acceptable downtime per year (0.001 × 8,760 hours). When you have error budget remaining, you can deploy risky changes quickly. When your error budget is exhausted, you focus on reliability over features.
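The arithmetic above can be sketched in a few lines. This is a minimal example that assumes a time-based availability SLO measured over a fixed window. The function names `allowed_downtime` and `can_ship_risky_change` are hypothetical, and the simple "budget remaining" check stands in for whatever error budget policy a team actually adopts.

```python
from datetime import timedelta


def allowed_downtime(slo: float, window: timedelta) -> timedelta:
    """Downtime permitted by the error budget (1 - SLO) over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window * (1 - slo)


def can_ship_risky_change(downtime_so_far: timedelta, slo: float,
                          window: timedelta) -> bool:
    """Assumed policy: risky changes are fine while error budget remains."""
    return downtime_so_far < allowed_downtime(slo, window)


year = timedelta(days=365)
budget = allowed_downtime(0.999, year)
print(f"Yearly budget at 99.9%: {budget.total_seconds() / 3600:.2f} hours")

# Three hours of downtime so far leaves budget; nine hours exhausts it.
print(can_ship_risky_change(timedelta(hours=3), 0.999, year))
print(can_ship_risky_change(timedelta(hours=9), 0.999, year))
```

Many teams measure over shorter rolling windows instead. Passing `timedelta(days=30)` as the window, for instance, gives a budget of about 43 minutes at the same 99.9% target.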
SRE team sizes vary: small companies might have 1-2 SREs, while Google has thousands. A commonly cited rule of thumb is roughly 1 SRE per 5-10 application engineers.
Why It Matters
SRE provides a framework for balancing reliability with feature velocity. Without SRE practices, organizations either over-invest in reliability (slow feature delivery) or under-invest (frequent outages). Error budgets formalize this tradeoff.
Frequently Asked Questions
What is SRE?
Site Reliability Engineering applies software engineering to operations: automating manual tasks, building self-healing systems, and managing reliability through error budgets and SLOs.
How is SRE different from DevOps?
DevOps is a culture and set of practices. SRE is a specific implementation with defined roles, error budgets, SLOs, and quantitative approaches. Google describes SRE as "a specific implementation of DevOps."
Need Expert Help?
Richard Ewing is a Product Economist and AI Capital Auditor. He helps companies translate technical complexity into financial clarity.