API Rate Limiting Guide | Algorithms, Redis Implementation, and Best Practices

TL;DR

Rate limiting protects your API from abuse, ensures fair use, and prevents cascading failures. This guide covers the three main algorithms, Redis-backed implementation, and production patterns for Node.js APIs.

Why Rate Limiting Matters

Without rate limiting:

  • A single script can exhaust your database connections
  • One angry user can DoS your API
  • Credential stuffing attacks run unhindered
  • You have no way to enforce pricing tiers

Rate Limiting Algorithms

1. Fixed Window

Count requests in a fixed time window (e.g., 100 requests per minute).

Minute 0:00-1:00 → 100 requests allowed
Minute 1:00-2:00 → counter resets, 100 more allowed

Problem: A user can make 100 requests at 0:59 and 100 more at 1:01 — 200 requests in 2 seconds.
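The boundary burst is easy to see with a toy in-memory counter (illustration only, no Redis; the names here are made up for the demo):

```javascript
// Illustration only: an in-memory fixed-window counter showing how the
// window boundary lets 2x the limit through in a short burst.
function makeFixedWindow(limit, windowSeconds) {
  const counts = new Map()  // window index -> request count
  return function allow(nowSeconds) {
    const window = Math.floor(nowSeconds / windowSeconds)
    const count = (counts.get(window) ?? 0) + 1
    counts.set(window, count)
    return count <= limit
  }
}

// 100 requests at t=59s (window 0) and 100 more at t=61s (window 1):
const allow = makeFixedWindow(100, 60)
let allowed = 0
for (let i = 0; i < 100; i++) if (allow(59)) allowed++
for (let i = 0; i < 100; i++) if (allow(61)) allowed++
// allowed === 200 — double the intended per-minute rate
```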

// Redis implementation
async function fixedWindowLimit(key, limit, windowSeconds) {
  const now = Math.floor(Date.now() / 1000)
  const windowKey = `ratelimit:${key}:${Math.floor(now / windowSeconds)}`

  const count = await redis.incr(windowKey)
  if (count === 1) await redis.expire(windowKey, windowSeconds)

  return { allowed: count <= limit, count, limit }
}

2. Sliding Window Log

Track exact timestamps of each request. Count requests in the last N seconds.

async function slidingWindowLog(key, limit, windowSeconds) {
  const now = Date.now()
  const windowStart = now - windowSeconds * 1000
  const logKey = `ratelimit:log:${key}`

  const pipeline = redis.pipeline()
  pipeline.zremrangebyscore(logKey, 0, windowStart)  // remove old entries
  pipeline.zadd(logKey, now, `${now}-${Math.random()}`)  // add current
  pipeline.zcard(logKey)                             // count in window
  pipeline.expire(logKey, windowSeconds)
  const results = await pipeline.exec()

  const count = results[2][1]  // [err, result] pair from the ZCARD step
  return { allowed: count <= limit, count, limit }
}

Accurate but memory-intensive — stores one entry per request.
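The counting logic is the same without Redis; this self-contained sketch (illustration only) also makes the memory cost concrete — one entry per request in the window, exactly like one ZSET member per request above:

```javascript
// Illustration only: an in-memory sliding window log. Note a rejected
// request still lands in the log (as in the Redis version above), so
// hammering a full window keeps it full.
function makeSlidingWindowLog(limit, windowSeconds) {
  const log = []  // request timestamps in ms, oldest first
  return function allow(nowMs) {
    const windowStart = nowMs - windowSeconds * 1000
    while (log.length && log[0] <= windowStart) log.shift()  // drop expired entries
    log.push(nowMs)
    return log.length <= limit
  }
}

// 3 requests per 60s: the first three at t=0 pass, the fourth is rejected
const allow = makeSlidingWindowLog(3, 60)
allow(0); allow(0); allow(0)
```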

3. Token Bucket

A bucket fills with tokens at a constant rate. Each request consumes one token. Allows bursts up to bucket capacity.

// Token bucket with Redis
async function tokenBucket(key, capacity, refillRate, refillSeconds) {
  const now = Date.now() / 1000  // seconds
  const bucketKey = `ratelimit:bucket:${key}`

  // Lua script for atomicity
  const script = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local refill_seconds = tonumber(ARGV[3])
    local now = tonumber(ARGV[4])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Refill tokens based on elapsed time
    local elapsed = now - last_refill
    local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate / refill_seconds))

    if new_tokens < 1 then
      redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
      redis.call('EXPIRE', key, refill_seconds * 2)
      return {0, math.ceil((1 - new_tokens) * refill_seconds / refill_rate)}
    end

    redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
    redis.call('EXPIRE', key, refill_seconds * 2)
    return {1, 0}
  `

  const [allowed, retryAfter] = await redis.eval(
    script, 1, bucketKey, capacity, refillRate, refillSeconds, now
  )

  return { allowed: allowed === 1, retryAfter }
}

// Usage: 100 requests per minute, burst up to 20
await tokenBucket('user:123', 20, 100, 60)
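The refill math in the Lua script can be unit-tested locally with a plain-JS version (illustration only — it is not safe across processes; the Lua script exists precisely to make check-and-decrement atomic in Redis):

```javascript
// Illustration only: the token bucket refill math from the Lua script,
// in plain JS for local testing. Single-process, not production-safe.
function makeTokenBucket(capacity, refillRate, refillSeconds) {
  let tokens = capacity
  let lastRefill = null
  return function take(nowSeconds) {
    if (lastRefill === null) lastRefill = nowSeconds
    // Refill tokens based on elapsed time, capped at capacity
    const elapsed = nowSeconds - lastRefill
    tokens = Math.min(capacity, tokens + (elapsed * refillRate) / refillSeconds)
    lastRefill = nowSeconds
    if (tokens < 1) {
      return { allowed: false, retryAfter: Math.ceil(((1 - tokens) * refillSeconds) / refillRate) }
    }
    tokens -= 1
    return { allowed: true, retryAfter: 0 }
  }
}

// Same parameters as the usage above: burst of 20, 100 tokens/minute
const take = makeTokenBucket(20, 100, 60)
```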

Express Middleware

Using express-rate-limit (simple)

npm install express-rate-limit rate-limit-redis

import rateLimit from 'express-rate-limit'
import { RedisStore } from 'rate-limit-redis'
import { redisClient } from './redis'

// Global rate limit
const globalLimit = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 1000,
  standardHeaders: true,     // draft-standard RateLimit-* headers
  legacyHeaders: false,      // disable legacy X-RateLimit-* headers
  store: new RedisStore({ sendCommand: (...args) => redisClient.sendCommand(args) }),
  message: { error: 'Too many requests', retryAfter: 900 },
})

// Strict limit for auth endpoints
const authLimit = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10,
  skipSuccessfulRequests: true,  // only count failed attempts
  message: { error: 'Too many login attempts' },
})

app.use(globalLimit)
app.post('/auth/login', authLimit, loginHandler)
app.post('/auth/register', authLimit, registerHandler)

Custom middleware with per-user limits

// Different limits per subscription tier
const TIER_LIMITS = {
  free: { requests: 100, window: 3600 },     // 100/hour
  pro: { requests: 1000, window: 3600 },     // 1000/hour
  enterprise: { requests: 10000, window: 3600 }, // 10000/hour
}

async function rateLimitMiddleware(req, res, next) {
  const user = req.user  // set by auth middleware
  const tier = user?.subscriptionTier ?? 'free'
  const { requests, window } = TIER_LIMITS[tier]

  const key = user ? `user:${user.id}` : `ip:${req.ip}`
  // slidingWindowLog (defined above) doesn't return retryAfter; the ?? fallbacks below cover that
  const { allowed, count, retryAfter } = await slidingWindowLog(key, requests, window)

  // Set standard headers
  res.setHeader('X-RateLimit-Limit', requests)
  res.setHeader('X-RateLimit-Remaining', Math.max(0, requests - count))
  res.setHeader('X-RateLimit-Reset', Math.floor(Date.now() / 1000) + window)

  if (!allowed) {
    res.setHeader('Retry-After', retryAfter ?? window)
    return res.status(429).json({
      error: 'Rate limit exceeded',
      limit: requests,
      window: `${window}s`,
      retryAfter: retryAfter ?? window,
    })
  }

  next()
}

Per-Endpoint Rate Limiting

// Different limits for different operations
const limits = {
  // Expensive operations
  '/api/generate-report': { requests: 5, window: 3600 },    // 5/hour
  '/api/export': { requests: 10, window: 3600 },             // 10/hour

  // API key endpoints
  '/api/v1/': { requests: 10000, window: 3600 },            // 10k/hour

  // Public endpoints
  '/api/posts': { requests: 500, window: 3600 },             // 500/hour
}

function getLimit(path) {
  // First prefix match wins — order patterns from most to least specific
  for (const [pattern, limit] of Object.entries(limits)) {
    if (path.startsWith(pattern)) return limit
  }
  return { requests: 1000, window: 3600 }  // default
}

Handling Rate Limit Responses (Client Side)

// JavaScript fetch client with automatic retry
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options)

    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') ?? '60')
      console.warn(`Rate limited. Retrying in ${retryAfter}s`)
      await new Promise(r => setTimeout(r, retryAfter * 1000))
      continue
    }

    return response
  }
  throw new Error('Max retries exceeded')
}

// Axios interceptor
axios.interceptors.response.use(null, async (error) => {
  if (error.response?.status === 429) {
    const retryAfter = Number(error.response.headers['retry-after'] ?? 60)
    await new Promise(r => setTimeout(r, retryAfter * 1000))
    return axios.request(error.config)
  }
  return Promise.reject(error)
})
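When the server omits Retry-After entirely, a common fallback (not shown in the snippets above) is exponential backoff with jitter. A sketch of the delay calculation, with illustrative base and cap values:

```javascript
// Fallback delay when no Retry-After header is present: exponential
// backoff with full jitter, capped. baseMs/capMs are illustrative.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt)  // 1s, 2s, 4s, ... capped at 30s
  return Math.floor(Math.random() * exp)              // full jitter in [0, exp)
}
```

Plugging this into fetchWithRetry when the header is missing spreads retries out instead of having every client retry on the same schedule.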

Production Patterns

IP-based vs user-based

function getRateLimitKey(req) {
  // Authenticated users: rate limit by user ID (fair per account)
  if (req.user?.id) return `user:${req.user.id}`

  // API keys: rate limit by key
  if (req.headers['x-api-key']) return `apikey:${req.headers['x-api-key']}`

  // Unauthenticated: rate limit by IP
  // Note: use X-Forwarded-For carefully behind proxies
  const ip = req.headers['x-forwarded-for']?.split(',')[0].trim() ?? req.ip
  return `ip:${ip}`
}

Bypass for trusted clients

const TRUSTED_IPS = new Set(['10.0.0.1', '10.0.0.2'])

function rateLimitMiddleware(req, res, next) {
  if (TRUSTED_IPS.has(req.ip)) return next()
  if (req.user?.isAdmin) return next()
  // ... apply rate limiting
}

Cloudflare / Nginx rate limiting (infrastructure layer)

# nginx.conf — coarse rate limiting before app
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;

location /api/ {
  limit_req zone=api burst=20 nodelay;
  limit_req_status 429;
  proxy_pass http://app;
}

Response Headers Reference

X-RateLimit-Limit: 1000          # requests allowed in window
X-RateLimit-Remaining: 847       # requests remaining
X-RateLimit-Reset: 1713312000    # Unix timestamp when limit resets
Retry-After: 3600                # seconds until client can retry (on 429)
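On the client side, these headers can be read with a small helper (hypothetical, not part of any library; works with anything exposing a `.get(name)` method, like fetch's `Headers`):

```javascript
// Hypothetical helper: parse rate-limit headers from a Headers-like object.
function parseRateLimitHeaders(headers) {
  const get = (name) => headers.get(name)
  return {
    limit: Number(get('X-RateLimit-Limit') ?? 0),
    remaining: Number(get('X-RateLimit-Remaining') ?? 0),
    resetAt: new Date(Number(get('X-RateLimit-Reset') ?? 0) * 1000),  // Unix seconds -> Date
    retryAfterSeconds: get('Retry-After') ? Number(get('Retry-After')) : null,
  }
}
```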

Key Takeaways

Algorithm           | Best for
--------------------|---------------------------------------------
Fixed window        | Simple, high-traffic counters
Sliding window log  | Accurate limits, lower traffic
Token bucket        | APIs that allow bursts (recommended default)

Pattern             | Recommendation
--------------------|---------------------------------------------
State store         | Redis (shared across servers)
Rate by             | User ID > API key > IP
Status code         | 429 with Retry-After header
Infrastructure      | Nginx/Cloudflare for DDoS, app for per-user
Different limits    | Per endpoint, per tier, per operation type

Implement rate limiting early — retrofitting it onto a production API is painful. Start with express-rate-limit + Redis for simplicity, then implement custom token bucket logic if you need tier-based limits or burst control.