Distributed API Gateway
High-performance API gateway built with Node.js, handling rate limiting, auth, and request routing across microservices.
- Started: June 2023
- Updated: Nov 2024
- Role: Lead Engineer
- Status: Maintenance
- Node.js
- Redis
- Docker
- Kubernetes
- PostgreSQL
An API gateway serving as the single entry point for a microservices architecture, handling approximately 50,000 requests per minute at peak load.
Key Features
Rate Limiting and Throttling
- Distributed rate limiting using Redis with sliding window counters
- Per-client quotas based on subscription tier
- Automatic backoff and retry logic with exponential delays
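A sliding-window counter can be sketched as follows. In production the timestamp log would live in Redis (e.g. a sorted set trimmed with `ZREMRANGEBYSCORE`); an in-memory `Map` stands in here purely for illustration, and the class and parameter names are hypothetical.

```javascript
// Sliding-window rate limiter sketch. A Map stands in for the Redis
// sorted set the gateway would use in production.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;        // max requests allowed per window
    this.windowMs = windowMs;  // window length in milliseconds
    this.hits = new Map();     // clientId -> array of request timestamps
  }

  // Returns true if the request is allowed, false if the client is over quota.
  allow(clientId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(clientId) || []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(clientId, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}
```

Unlike fixed-window counters, the sliding window avoids the burst at window boundaries where a client could send two full quotas back to back.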
Authentication and Authorization
- JWT-based authentication with RS256 (RSA with SHA-256) signing
- Role-based access control (RBAC) with fine-grained permissions
- API key management for third-party integrations
Request Routing
- Dynamic service discovery via Kubernetes DNS
- Circuit breaker pattern to prevent cascade failures
- Health checking and automatic failover
Architecture Decisions
The gateway is stateless, allowing horizontal scaling based on CPU metrics. All rate limit counters and session data live in Redis clusters for sub-millisecond lookup times.
Circuit breakers prevent downstream service failures from cascading. If a service's error rate exceeds 50% over a 10-second window, the circuit opens and requests fail fast rather than timing out.
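A breaker matching those thresholds can be sketched like this. The class and its defaults are illustrative, not the gateway's actual implementation; a production breaker would also add a half-open state to probe for recovery.

```javascript
// Circuit breaker sketch: opens when the error rate over a rolling
// 10-second window reaches 50%.
class CircuitBreaker {
  constructor({ windowMs = 10000, errorThreshold = 0.5, minSamples = 5 } = {}) {
    this.windowMs = windowMs;
    this.errorThreshold = errorThreshold;
    this.minSamples = minSamples;  // avoid tripping on a single failure
    this.samples = [];             // { at, ok } outcomes inside the window
  }

  // Record the outcome of one proxied request.
  record(ok, now = Date.now()) {
    this.samples.push({ at: now, ok });
  }

  // True when requests should fail fast instead of hitting the service.
  isOpen(now = Date.now()) {
    this.samples = this.samples.filter((s) => s.at > now - this.windowMs);
    if (this.samples.length < this.minSamples) return false;
    const errors = this.samples.filter((s) => !s.ok).length;
    return errors / this.samples.length >= this.errorThreshold;
  }
}
```

Because outcomes age out of the window, the circuit closes again on its own once the downstream service recovers.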
Request logging happens asynchronously to avoid blocking the main path. Logs are buffered and flushed to Elasticsearch in batches.
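The buffer-and-flush pattern can be sketched as below. The Elasticsearch bulk call is replaced by a hypothetical `sink` callback, and the batch size and interval are illustrative defaults.

```javascript
// Async batched logger sketch: the request path only pushes onto an
// in-memory buffer; flushing to the sink happens off the hot path.
class BatchLogger {
  constructor(sink, { maxBatch = 100, flushMs = 1000 } = {}) {
    this.sink = sink;          // e.g. a function posting to Elasticsearch _bulk
    this.maxBatch = maxBatch;
    this.buffer = [];
    this.timer = setInterval(() => this.flush(), flushMs);
    this.timer.unref?.();      // don't keep the process alive just to flush logs
  }

  log(entry) {
    this.buffer.push(entry);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);  // fire-and-forget; real code would retry on failure
  }
}
```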
Operational Highlights
Deployed on Kubernetes with auto-scaling policies. During traffic spikes, the gateway scales from 3 to 15 pods within 60 seconds.
Monitoring includes Prometheus metrics for request rates, latency percentiles, and error rates. Grafana dashboards provide real-time visibility into system health.
The gateway reduced average response time by 40% compared to direct microservice calls, largely through connection pooling and HTTP keep-alive.
Lessons Learned
- Circuit breakers are essential. Without them, one slow service can bring down the entire stack.
- Rate limiting must account for clock skew in distributed systems. We learned this the hard way when clients could bypass limits by hitting different gateway instances.
- Observability is not optional. Without detailed metrics and tracing, debugging production issues is nearly impossible.