
What is Rate Limiting?

TL;DR

Rate limiting is a technique for controlling the number of requests a client can make to an API or service within a given time window.

By capping request rates, a service protects itself from abuse, ensures fair resource allocation among clients, and prevents cascade failures when one overloaded component drags down its dependents.

Common algorithms: Token Bucket (allows burst traffic up to a limit), Sliding Window (smooth rate enforcement over time), Fixed Window (simple counter reset per interval), and Leaky Bucket (enforces constant output rate).
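The token bucket is the easiest of these to see in code. Below is a minimal sketch (the class name and parameters are illustrative, not from any particular library): tokens refill at a steady rate, each request spends one token, and bursts are allowed up to the bucket's capacity.

```python
import time

class TokenBucket:
    """Token bucket rate limiter: tokens refill continuously at
    `refill_rate` per second; each request consumes one token, so
    short bursts up to `capacity` requests are allowed."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 5-request burst allowed, 1 request/second sustained rate
bucket = TokenBucket(capacity=5, refill_rate=1)
results = [bucket.allow() for _ in range(7)]
# The first 5 calls pass immediately; later calls are rejected
# until the bucket refills.
```

Because the refill is computed lazily from elapsed time, there is no background timer to run, which is one reason this algorithm is popular in practice.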

Rate limiting is implemented at multiple layers: API gateway (global rate limits), service level (per-endpoint limits), and infrastructure (connection limits, DDoS protection). HTTP 429 (Too Many Requests) is the standard response code.

Why It Matters

Rate limiting prevents a single misbehaving client from taking down an entire service. It's a fundamental building block of API security, fair resource allocation, and system stability.

Frequently Asked Questions

What is rate limiting?

Controlling how many requests a client can make within a time window. It protects services from overload and abuse, and ensures fair access. Services typically return HTTP 429 when the limit is exceeded.

Token bucket vs sliding window?

Token bucket allows burst traffic (good for APIs with bursty usage patterns). Sliding window provides smoother rate enforcement (good for APIs that need consistent throughput limits).

