
┌─< lukebayliss.com >─┐


Distributed API Gateway

High-performance API gateway built with Node.js, handling rate limiting, auth, and request routing across microservices.

Started: June 2023
Updated: Nov 2024
Role: Lead Engineer
Status: Maintenance
  • Node.js
  • Redis
  • Docker
  • Kubernetes
  • PostgreSQL

An API gateway serving as the single entry point for a microservices architecture, handling approximately 50,000 requests per minute at peak load.

Key Features

Rate Limiting and Throttling

  • Distributed rate limiting using Redis with sliding window counters
  • Per-client quotas based on subscription tier
  • Automatic backoff and retry logic with exponential delays
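
A minimal sketch of a sliding window counter, assuming the common approximation that weights the previous fixed window by how much of it still overlaps the sliding window. A `Map` stands in for Redis so the sketch runs on its own; in a deployment like the one described, the two counters would be Redis keys with an expiry. The names `SlidingWindowLimiter`, `allow`, and `TIER_LIMITS`, and the quota values, are hypothetical.

```javascript
// Sliding window counter: estimates the request count over the last
// `windowMs` by blending the current and previous fixed-window counters.
// Assumption: a Map stands in for Redis; old keys are never evicted here,
// whereas Redis keys would carry a TTL.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> request count
  }

  // Returns true if the request is allowed, and records it if so.
  // `nowMs` is passed in so that every caller can share one clock source.
  allow(clientId, nowMs) {
    const windowIndex = Math.floor(nowMs / this.windowMs);
    const currentKey = `rl:${clientId}:${windowIndex}`;
    const previousKey = `rl:${clientId}:${windowIndex - 1}`;

    const current = this.counters.get(currentKey) ?? 0;
    const previous = this.counters.get(previousKey) ?? 0;

    // Fraction of the current fixed window that has already elapsed.
    const elapsed = (nowMs % this.windowMs) / this.windowMs;
    // The previous window counts only for the part the sliding window still covers.
    const estimated = previous * (1 - elapsed) + current;

    if (estimated >= this.limit) return false;
    this.counters.set(currentKey, current + 1);
    return true;
  }
}

// Hypothetical per-tier quotas, in requests per minute.
const TIER_LIMITS = { free: 60, pro: 600 };
const freeTierLimiter = new SlidingWindowLimiter(TIER_LIMITS.free, 60_000);
```

Keeping only two counters per client makes the memory cost constant, at the price of an estimate rather than an exact count of the last window.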

Authentication and Authorization

  • JWT-based authentication with RS256 signing (RSA signatures over SHA-256)
  • Role-based access control (RBAC) with fine-grained permissions
  • API key management for third-party integrations
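
To illustrate the token check, here is a sketch of RS256 signing and verification using only Node's built-in `crypto` module, followed by a simple role lookup. It is not the gateway's actual code: the helper names (`signJwt`, `verifyJwt`, `hasPermission`), the `roles` claim, and the permission strings are all assumptions, and a production service would more likely rely on a maintained JWT library.

```javascript
const crypto = require("crypto");

// Throwaway key pair for the demo; real keys would come from a secret store.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

const b64url = (value) => Buffer.from(JSON.stringify(value)).toString("base64url");

// Sign a payload as a compact JWT with RS256 (RSASSA-PKCS1-v1_5 + SHA-256).
function signJwt(payload, key) {
  const signingInput = `${b64url({ alg: "RS256", typ: "JWT" })}.${b64url(payload)}`;
  const signature = crypto.sign("sha256", Buffer.from(signingInput), key);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Check algorithm, signature, and expiry; returns the payload or throws.
function verifyJwt(token, key, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [headerB64, payloadB64, signatureB64] = parts;

  const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
  // Pin the algorithm instead of trusting whatever the header claims.
  if (header.alg !== "RS256") throw new Error("unexpected algorithm");

  const valid = crypto.verify(
    "sha256",
    Buffer.from(`${headerB64}.${payloadB64}`),
    key,
    Buffer.from(signatureB64, "base64url"),
  );
  if (!valid) throw new Error("bad signature");

  const payload = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
  if (typeof payload.exp === "number" && payload.exp <= nowSeconds) {
    throw new Error("token expired");
  }
  return payload;
}

// Hypothetical role-to-permission table for the RBAC check.
const ROLE_PERMISSIONS = {
  admin: ["orders:read", "orders:write"],
  viewer: ["orders:read"],
};

// True if any of the token's roles grants the requested permission.
const hasPermission = (payload, permission) =>
  (payload.roles ?? []).some((role) => (ROLE_PERMISSIONS[role] ?? []).includes(permission));
```

Because verification needs only the public key, gateway instances can validate tokens without ever holding the signing key.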

Request Routing

  • Dynamic service discovery via Kubernetes DNS
  • Circuit breaker pattern to prevent cascade failures
  • Health checking and automatic failover
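
As a sketch of how routing can lean on Kubernetes DNS, the snippet below maps a request path to an in-cluster Service hostname. The route table, service names, ports, and the `default` namespace are hypothetical, and `cluster.local` is only the default cluster domain, so a given cluster may use a different one.

```javascript
// Hypothetical route table: path prefix -> Kubernetes Service name and port.
const ROUTES = [
  { prefix: "/orders", service: "orders", port: 8080 },
  { prefix: "/orders/reports", service: "reporting", port: 8080 },
  { prefix: "/users", service: "users", port: 8080 },
];

// Build the DNS name Kubernetes assigns to a Service inside the cluster.
function serviceHost(service, namespace = "default") {
  return `${service}.${namespace}.svc.cluster.local`;
}

// Resolve a request path to an upstream URL, or null if nothing matches.
// The longest matching prefix wins, so more specific routes take priority.
function resolveUpstream(path, namespace = "default") {
  const match = ROUTES
    .filter((r) => path === r.prefix || path.startsWith(`${r.prefix}/`))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  if (!match) return null;
  return `http://${serviceHost(match.service, namespace)}:${match.port}${path}`;
}
```

The gateway never needs a list of pod IPs in this arrangement: the Service name stays stable while Kubernetes updates the endpoints behind it.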

Architecture Decisions

The gateway is stateless, allowing horizontal scaling based on CPU metrics. All rate limit counters and session data live in Redis clusters for sub-millisecond lookup times.

Circuit breakers prevent downstream service failures from cascading. If a service's error rate reaches 50% over a 10-second window, the circuit opens and requests to that service fail fast rather than waiting for a timeout.
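
The breaker logic can be sketched as a small state machine over a rolling window of outcomes. The 50% threshold and 10-second window come from the description above; the minimum request volume (`minRequests`), the cool-down before a trial request (`cooldownMs`), and the class itself are assumptions made for illustration.

```javascript
// Circuit breaker driven by the error rate over a rolling time window.
// Assumed parameters: minRequests and cooldownMs are not from the source.
class CircuitBreaker {
  constructor({ errorThreshold = 0.5, windowMs = 10_000, minRequests = 20, cooldownMs = 5_000 } = {}) {
    Object.assign(this, { errorThreshold, windowMs, minRequests, cooldownMs });
    this.samples = []; // recent outcomes: { at, ok }
    this.state = "closed";
    this.openedAt = 0;
  }

  // Whether a request may be sent right now.
  canRequest(now = Date.now()) {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cool-down over: let trial traffic through
    }
    return this.state !== "open";
  }

  // Record the outcome of a request and update the state.
  record(ok, now = Date.now()) {
    if (this.state === "open") return; // ignore late results while open

    if (this.state === "half-open") {
      // The trial result decides: recover fully or re-open immediately.
      if (ok) {
        this.state = "closed";
        this.samples = [];
      } else {
        this.state = "open";
        this.openedAt = now;
      }
      return;
    }

    this.samples.push({ at: now, ok });
    this.samples = this.samples.filter((s) => now - s.at < this.windowMs);
    const failures = this.samples.filter((s) => !s.ok).length;
    const enoughTraffic = this.samples.length >= this.minRequests;
    if (enoughTraffic && failures / this.samples.length >= this.errorThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

A caller would check `canRequest()` before proxying and return an error straight away when it is false. This sketch lets every request through while half-open; limiting that to a single probe is a common refinement.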

Request logging happens asynchronously to avoid blocking the main path. Logs are buffered and flushed to Elasticsearch in batches.
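
A rough sketch of the buffering idea: the request path only appends to an in-memory array, and a separate flush hands batches to a sink. The `LogBuffer` class, its defaults, and the `sink` callback are hypothetical; `sink` stands for whatever sends a batch to Elasticsearch, which is not shown here.

```javascript
// Buffers log entries and hands them to `sink` in batches.
// `sink` is a hypothetical async function that ships one batch downstream.
class LogBuffer {
  constructor(sink, { maxBatch = 500, flushIntervalMs = 1000 } = {}) {
    this.sink = sink;
    this.maxBatch = maxBatch;
    this.entries = [];
    this.timer = setInterval(() => void this.flush(), flushIntervalMs);
    this.timer.unref(); // do not keep the process alive just for logging
  }

  // Called on the request path: a constant-time push, never awaits I/O.
  log(entry) {
    this.entries.push(entry);
    if (this.entries.length >= this.maxBatch) void this.flush();
  }

  // Swap the buffer out first so new entries never join an in-flight batch.
  async flush() {
    if (this.entries.length === 0) return;
    const batch = this.entries;
    this.entries = [];
    try {
      await this.sink(batch);
    } catch (err) {
      // Dropping on failure keeps memory bounded; a retry queue is an alternative.
      console.error(`dropped ${batch.length} log entries: ${err.message}`);
    }
  }

  // Stop the timer and flush whatever is left, e.g. on shutdown.
  async close() {
    clearInterval(this.timer);
    await this.flush();
  }
}
```

The trade-off is that entries still in the buffer are lost if the process dies before a flush, which is usually acceptable for request logs but not for audit records.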

Operational Highlights

Deployed on Kubernetes with auto-scaling policies. During traffic spikes, the gateway scales from 3 to 15 pods within 60 seconds.
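
For a sense of how the pod count follows load, the Kubernetes Horizontal Pod Autoscaler computes the desired replicas as ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that formula with the 3 and 15 pod bounds mentioned above; the 60% CPU target in the usage comment is an assumption, and the autoscaler's tolerance band and stabilization windows are left out.

```javascript
// Desired replica count per the HPA scaling formula, clamped to the pod bounds.
// min/max match the 3-to-15 range in the text; the CPU target is supplied by the caller.
function desiredReplicas(currentReplicas, currentCpuPercent, targetCpuPercent, min = 3, max = 15) {
  const raw = Math.ceil(currentReplicas * (currentCpuPercent / targetCpuPercent));
  return Math.min(max, Math.max(min, raw));
}

// Example: 3 pods averaging 90% CPU against an assumed 60% target.
const nextReplicas = desiredReplicas(3, 90, 60);
```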

Monitoring includes Prometheus metrics for request rates, latency percentiles, and error rates. Grafana dashboards provide real-time visibility into system health.

The gateway reduced average response time by 40% compared to direct microservice calls, mainly by pooling upstream connections and reusing them with HTTP keep-alive instead of opening a new connection per request.
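
In Node.js, connection reuse of this kind is typically configured through an `http.Agent`. The following is a sketch rather than the gateway's configuration: the socket limits are illustrative, and `proxyGet` is a hypothetical helper.

```javascript
const http = require("http");

// One shared agent so sockets to upstreams are reused instead of paying
// a TCP handshake on every proxied request. Limits are assumed values.
const upstreamAgent = new http.Agent({
  keepAlive: true,     // keep idle sockets open for later requests
  maxSockets: 50,      // cap on concurrent sockets per upstream host
  maxFreeSockets: 10,  // cap on idle sockets kept in the pool per host
});

// Forward a GET to an upstream and resolve with its status and body.
function proxyGet(host, port, path) {
  return new Promise((resolve, reject) => {
    const req = http.request({ host, port, path, method: "GET", agent: upstreamAgent }, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () =>
        resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString("utf8") }));
    });
    req.on("error", reject);
    req.end();
  });
}
```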

Lessons Learned

  • Circuit breakers are essential. Without them, one slow service can bring down the entire stack.
  • Rate limiting must account for clock skew in distributed systems. We learned this the hard way when clients could bypass limits by hitting different gateway instances.
  • Observability is not optional. Without detailed metrics and tracing, debugging production issues is nearly impossible.
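
One way the clock skew problem can show up, sketched with made-up numbers: if each gateway instance derives the rate limit window from its own clock, two instances can write to different counters at the same real instant. Taking the timestamp from a single source, for example the Redis server's own clock via its `TIME` command, removes that discrepancy. The key format, skew value, and client ID below are hypothetical.

```javascript
const WINDOW_MS = 60_000;

// Counter key for the fixed window that contains `nowMs`.
const windowKey = (clientId, nowMs) => `rl:${clientId}:${Math.floor(nowMs / WINDOW_MS)}`;

// Same real instant, one second before a window boundary.
const realNowMs = 119_000;
const skewMs = 5_000; // instance B's clock runs 5 seconds ahead (assumed)

// Local clocks: the two instances disagree on which window is current,
// so the client's requests are split across two counters.
const keyOnA = windowKey("client-42", realNowMs);
const keyOnB = windowKey("client-42", realNowMs + skewMs);

// Shared clock: both instances use the timestamp from one source,
// so they increment the same counter.
const sharedNowMs = realNowMs;
const sharedKeyOnA = windowKey("client-42", sharedNowMs);
const sharedKeyOnB = windowKey("client-42", sharedNowMs);
```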