API Gateway vs Layer 7 Load Balancer: Key Differences for Developers
Every production backend eventually forces you to make this decision: deploy an API Gateway, a Layer 7 load balancer, or both — and in what order. The wrong choice costs you months of retrofitting. The right choice determines whether your platform scales cleanly or collapses under operational complexity.
The short answer: A Layer 7 load balancer distributes HTTP traffic intelligently across backend instances. An API Gateway manages the contract between API consumers and your backend services. They solve different problems, often coexist in the same architecture, and choosing one over the other without understanding the distinction leads to either over-engineering or critical security gaps.
This guide is for engineers making real architectural decisions — not for readers looking for a glossary definition.
What Actually Happens at Layer 7
Before comparing the two, you need to understand what "Layer 7" means in practice. The OSI model places HTTP, gRPC, WebSocket, and MQTT at Layer 7 (Application Layer). A Layer 7-aware system can:
- Read HTTP headers, methods, and URL paths
- Inspect request bodies (with performance cost)
- Terminate TLS and read the decrypted content
- Make routing decisions based on any of the above
- Modify requests and responses in transit
Contrast this with Layer 4 (TCP/UDP), where a load balancer sees IP addresses and ports but knows nothing about the HTTP content inside the connection. Both API Gateways and L7 load balancers operate at Layer 7, which is why the confusion exists. The difference is in what they do with that visibility.
Layer 7 Load Balancer: What It Actually Does
A Layer 7 load balancer's primary job is traffic distribution with HTTP-awareness. It answers one question: which backend instance should handle this request?
Core Capabilities
- Health-check-aware routing: Removes unhealthy backends automatically based on HTTP health checks, not just TCP connectivity
- Sticky sessions: Routes a user's requests to the same backend using cookies or headers (useful for stateful apps)
- Content-based routing: Routes /api/ to one upstream and /static/ to another, based on URL patterns
- TLS termination: Handles SSL/TLS so backends receive unencrypted HTTP
- Connection pooling: Maintains persistent connections to backends, reducing TCP handshake overhead
- Retries and timeouts: Retries failed requests to a different backend automatically
- Compression: gzip/Brotli compression at the edge
- HTTP/2 and HTTP/3: Protocol translation between clients and backends
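To make "HTTP-awareness" concrete, here is a minimal Python sketch of two decisions from the list above that need Layer 7 visibility: parsing the request line and headers, then choosing a pool by URL prefix while skipping instances marked unhealthy. The pool prefixes and backend names are hypothetical, and a real proxy would also handle request bodies, pipelining, and timeouts.

```python
import itertools
from typing import Dict, List, Tuple


def parse_request_head(raw: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Parse an HTTP/1.1 request line and headers: data an L4 balancer never sees."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


class ContentRouter:
    """Pick a pool by longest matching path prefix, then round-robin over healthy backends."""

    def __init__(self, pools: Dict[str, List[str]], default_prefix: str) -> None:
        self.pools = pools
        self.default_prefix = default_prefix
        self.healthy = {b for backends in pools.values() for b in backends}
        self._rotations = {p: itertools.cycle(b) for p, b in pools.items()}

    def mark_down(self, backend: str) -> None:
        """Called when an HTTP health check for this backend fails."""
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        self.healthy.add(backend)

    def pick(self, path: str) -> str:
        matches = [p for p in self.pools if path.startswith(p)]
        prefix = max(matches, key=len) if matches else self.default_prefix
        # Try each backend in the pool at most once per request.
        for _ in range(len(self.pools[prefix])):
            backend = next(self._rotations[prefix])
            if backend in self.healthy:
                return backend
        raise RuntimeError(f"no healthy backend for {prefix}")
```

Longest-prefix matching is one reasonable policy; real proxies have their own matching rules (NGINX's location matching, for example), so check your tool's documentation before relying on a particular order.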
What It Does NOT Do (Usually)
- Authenticate API consumers by API key or JWT
- Enforce per-tenant rate limits
- Transform request/response payloads
- Manage API versioning lifecycle
- Emit per-route business metrics
- Enforce subscription-tier access control
Primary L7 Load Balancers in Production
| Tool | Strength | Typical Use Case |
|---|---|---|
| HAProxy | Extreme performance, precise control | High-throughput TCP/HTTP routing, financial systems |
| NGINX | Versatile, huge ecosystem | General-purpose reverse proxy, static serving + routing |
| Traefik | Dynamic config, Kubernetes-native | Container environments, automatic cert management |
| AWS ALB | Fully managed, AWS-native | AWS workloads, ECS/EKS clusters |
| GCP Cloud Load Balancing | Global anycast, autoscaling | GCP-native global traffic distribution |
| Envoy | Programmable, observability-rich | Service mesh data plane, advanced L7 routing |
API Gateway: What It Actually Does
An API Gateway's primary job is API management — governing who can access your APIs, how often, in what form, and with what guarantees. It answers a broader question: should this request be allowed, and in what form should it reach the backend?
Core Capabilities
- Authentication and authorization: API key validation, JWT verification, OAuth 2.0 flows, mTLS
- Rate limiting: Per-consumer, per-tier, per-endpoint quotas (distributed, not just IP-based)
- Request/response transformation: Rewrite URLs, inject headers, transform JSON payloads
- API versioning: Route /v1/ and /v2/ to different service versions with lifecycle management
- Developer portal: API discovery, documentation, subscription management
- Analytics: Per-consumer API usage, latency histograms, error rate tracking
- Caching: Response caching keyed by route + consumer
- Circuit breaking: Stop forwarding to degraded backends
- Schema validation: Validate requests against OpenAPI specs before forwarding
- Plugin/middleware ecosystem: Extend behavior without modifying backend code
What It Does NOT Replace
An API Gateway is not a substitute for:
- Service mesh (Istio, Linkerd): East-west (service-to-service) traffic management inside your cluster
- WAF (Web Application Firewall): Deep packet inspection, OWASP rule sets, bot mitigation
- CDN: Global caching, edge compute, DDoS absorption at scale
Primary API Gateways in Production
| Tool | Strength | Typical Use Case |
|---|---|---|
| Kong | Plugin ecosystem, open source core | Enterprise API management, multi-cloud |
| AWS API Gateway | Serverless-native, Lambda integration | AWS-first architectures, serverless APIs |
| Envoy + control plane | Programmable, high performance | Platform teams building custom gateways |
| Apigee | Enterprise analytics, monetization | Telcos, large enterprises with API products |
| Tyk | Open source, Go-native | Self-hosted enterprise gateway |
| Azure API Management | Azure-native, hybrid connectivity | Azure workloads, hybrid cloud |
Side-by-Side Comparison Table
| Capability | Layer 7 Load Balancer | API Gateway |
|---|---|---|
| Traffic distribution | ✅ Core feature | ✅ Included (but not primary) |
| TLS termination | ✅ Core feature | ✅ Included |
| Health-check routing | ✅ Core feature | ✅ Basic support |
| Sticky sessions | ✅ Cookie/IP hash | ⚠️ Limited / plugin |
| API key authentication | ❌ Not native | ✅ Core feature |
| JWT / OAuth 2.0 | ❌ Not native | ✅ Core feature |
| Per-consumer rate limiting | ❌ IP-level only | ✅ Core feature |
| Request transformation | ⚠️ Header rewrite only | ✅ Full payload transform |
| Response caching | ⚠️ Basic proxy cache | ✅ Route + consumer keyed |
| API versioning | ⚠️ URL routing only | ✅ Lifecycle management |
| Circuit breaking | ⚠️ Basic retry/failover | ✅ Configurable per upstream |
| Schema validation | ❌ | ✅ OpenAPI-based |
| Developer portal | ❌ | ✅ Core feature |
| Per-route analytics | ⚠️ Log-based only | ✅ Native |
| Plugin/middleware system | ⚠️ Module-based (NGINX) | ✅ Rich ecosystem |
| Throughput (req/sec) | 🚀 Very high (HAProxy: 1M+ rps) | ⚠️ Lower (policy overhead) |
| Latency overhead | <1ms | 1–10ms |
| Operational complexity | Lower | Higher |
| Cost | Lower | Higher (licensing/cloud) |
Architecture Diagrams
Pattern 1: L7 Load Balancer Only (Simple Services)
┌─────────────────────────────────────────────────────┐
│ Internet │
└──────────────────────┬──────────────────────────────┘
│ HTTPS
▼
┌────────────────────────┐
│ L7 Load Balancer │
│ (NGINX / HAProxy / │
│ AWS ALB) │
│ - TLS termination │
│ - Health checks │
│ - Round-robin routing │
└────────┬───────────────┘
┌─────────┼──────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ App │ │ App │ │ App │
│ Server 1│ │ Server 2│ │ Server 3│
└─────────┘ └─────────┘ └─────────┘
Best for: Internal services, simple public APIs without per-consumer auth, high-throughput backend pools.
Pattern 2: API Gateway Only (Serverless / Managed)
┌─────────────────────────────────────────────────────┐
│ Internet │
└──────────────────────┬──────────────────────────────┘
│ HTTPS
▼
┌────────────────────────────┐
│ API Gateway │
│ (AWS API GW / Kong) │
│ - JWT/OAuth auth │
│ - Rate limiting │
│ - Request transform │
│ - Analytics │
└──────┬──────────┬──────────┘
│ │
┌───────▼──┐ ┌────▼───────┐
│ Lambda │ │ Microservice│
│ Function │ │ (HTTP) │
└──────────┘ └────────────┘
Best for: Public APIs with developer ecosystems, serverless architectures, monetized APIs.
Pattern 3: Both Together (Production Enterprise Architecture)
┌─────────────────────────────────────────────────────────────┐
│ Internet │
└────────────────────────────┬────────────────────────────────┘
│ HTTPS
▼
┌─────────────────────────┐
│ CDN / WAF │
│ (CloudFront / Fastly) │
└──────────────┬──────────┘
│
▼
┌─────────────────────────┐
│ L7 Load Balancer │
│ (AWS ALB / NGINX) │
│ - TLS termination │
│ - Health routing │
│ - DDoS absorption │
└──────┬──────────┬───────┘
│ │
┌────────▼──┐ ┌────▼──────────┐
│ API GW │ │ Static Origin │
│ (Kong/ │ │ (S3/CDN) │
│ Envoy) │ └───────────────┘
│ - Auth │
│ - Quotas │
│ - Routing │
└─────┬─────┘
┌────────┼────────┐
▼ ▼ ▼
┌──────────┐ ┌──────┐ ┌──────────┐
│ Service A│ │Svc B │ │ Service C│
└──────────┘ └──────┘ └──────────┘
Best for: Production multi-service platforms, public APIs with partner integrations, regulated industries.
Deep Dive: Authentication
This is where the architectural difference becomes undeniable.
L7 Load Balancer Authentication
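Open-source NGINX and HAProxy can verify client certificates (mTLS) and forward identity headers. Neither validates JWTs against a JWKS endpoint out of the box (NGINX's auth_jwt module ships only with the commercial NGINX Plus, and HAProxy's jwt_verify converter, added in 2.5, checks signatures against a static key file), and neither maintains an API key database.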
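# NGINX: Forward client cert DN to backend
server {
    listen 443 ssl;
    # Server certificate presented to clients (example paths)
    ssl_certificate     /etc/nginx/server.crt;
    ssl_certificate_key /etc/nginx/server.key;
    # Require a client certificate signed by this CA (mTLS)
    ssl_verify_client on;
    ssl_client_certificate /etc/nginx/ca.crt;
    location / {
        # Pass the verified identity to the backend
        proxy_set_header X-Client-DN $ssl_client_s_dn;
        proxy_set_header X-Client-Verified $ssl_client_verify;
        proxy_pass http://backend_pool;
    }
}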
This works for mTLS service-to-service auth. It does NOT work for JWT-based consumer auth.
API Gateway Authentication
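Kong's open-source JWT plugin verifies each token's signature against credentials registered on a Kong consumer, checks the configured registered claims, and injects consumer identity headers before forwarding. (Validating against a JWKS endpoint requires a different plugin, such as Kong Enterprise's OpenID Connect plugin.)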
# Kong: JWT plugin configuration (declarative)
plugins:
  - name: jwt
    config:
      key_claim_name: kid
      claims_to_verify:
        - exp
        - nbf
      header_names:
        - Authorization
      uri_param_names: []
# Kong validates against stored consumer credentials
# and injects X-Consumer-ID, X-Consumer-Username
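AWS API Gateway handles this with Cognito user pool authorizers or Lambda authorizers (HTTP APIs also offer a built-in JWT authorizer), so there is no proxy infrastructure to operate. With a Lambda authorizer, you still own the authorizer code.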
Production decision: If you need JWT verification across multiple services, put it in the API Gateway — not in each service. Centralizing auth in the gateway reduces attack surface and eliminates drift between service implementations.
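For intuition about what a gateway's JWT check involves, the sketch below verifies an HS256 token roughly the way the Kong configuration above is set up: it looks up the consumer secret via the kid claim, checks the signature, and enforces exp and nbf. It is a standard-library Python illustration, not Kong's implementation; the in-memory key store, the make_token helper, and the choice to reject tokens without exp are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_token(claims: dict, secret: bytes) -> str:
    """Sign claims with HS256 (helper for the example)."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [b64url_encode(json.dumps(part, separators=(",", ":")).encode())
                for part in (header, claims)]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url_encode(signature)])


def verify_hs256(token: str, key_store: Dict[str, bytes], now: Optional[float] = None) -> dict:
    """Return the token's claims if valid, else raise ValueError."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported alg")  # never let the token pick "none"
    # Find the consumer's secret via the key claim ("kid"): payload first, then header.
    secret = key_store.get(claims.get("kid", header.get("kid")))
    if secret is None:
        raise ValueError("unknown key")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ValueError("token expired or missing exp")
    nbf = claims.get("nbf")
    if nbf is not None and now < nbf:
        raise ValueError("token not yet valid")
    return claims
```

In production, rely on the gateway or a maintained JWT library: hand-rolled verification is easy to get subtly wrong (algorithm confusion, clock skew).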
Deep Dive: Rate Limiting
HAProxy Rate Limiting (IP-based)
# haproxy.cfg — stick table rate limiting
frontend http_frontend
bind *:80
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
default_backend app_servers
This limits 100 requests/10 seconds per IP address. Simple, fast, but has no concept of API consumers or subscription tiers.
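The same per-IP policy can be modeled as a sliding-window log: keep the timestamps of recent requests per client IP, drop those older than the window, and reject once the count reaches the limit. This Python sketch is a conceptual stand-in only; HAProxy's stick-table rate counters are implemented differently and far more compactly.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """At most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 100, window: float = 10.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.hits[key]
        # Forget requests that have slid out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # the caller would answer 429, like deny_status 429
        recent.append(now)
        return True
```

Note that the key is only the source IP: there is nowhere to plug in a consumer identity or tier, which is exactly the limitation described above. Storing one timestamp per accepted request is also why production proxies prefer approximate counters.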
Kong Rate Limiting (Consumer-aware)
# Kong: Rate limit by consumer with Redis backend
plugins:
  - name: rate-limiting
    config:
      minute: 1000
      hour: 10000
      policy: redis
      redis_host: redis.internal
      redis_port: 6379
      limit_by: consumer
      fault_tolerant: true   # Fail open if Redis is down
      hide_client_headers: false
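These limits are counted per consumer, but every consumer gets the same 1,000 req/min. To give a free-tier consumer 100 req/min and a paid consumer 10,000 req/min, you scope separate rate-limiting plugin instances to individual consumers or tiers. Reproducing this in HAProxy or NGINX means identifying the consumer yourself, which in practice requires external state and custom Lua scripting.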
Production note: Always set fault_tolerant: true in distributed rate limiters. If Redis goes down, failing closed (blocking all traffic) is almost always worse than temporarily allowing excess traffic.
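A minimal Python sketch of the behavior described above: a fixed one-minute counter per consumer, a limit chosen by the consumer's tier, and a fault_tolerant switch that decides what happens when the shared counter store is unreachable. The tier names, the InMemoryStore stand-in for Redis, and the allow() function are hypothetical; this illustrates the policy, not Kong's code.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical tier table matching the free/paid example above.
TIER_LIMITS_PER_MINUTE: Dict[str, int] = {"free": 100, "paid": 10_000}


class StoreUnavailable(Exception):
    """Raised when the shared counter store (Redis, say) cannot be reached."""


class InMemoryStore:
    """Stand-in for a shared store; `available` lets us simulate an outage."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, int], int] = {}
        self.available = True

    def incr(self, key: Tuple[str, int]) -> int:
        if not self.available:
            raise StoreUnavailable("counter store unreachable")
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow(consumer_id: str, tier: str, store: InMemoryStore,
          now: Optional[float] = None, fault_tolerant: bool = True) -> bool:
    now = time.time() if now is None else now
    window = int(now // 60)  # fixed one-minute window; counts keyed by (consumer, window)
    try:
        count = store.incr((consumer_id, window))
    except StoreUnavailable:
        # fault_tolerant=True fails open (allow); False fails closed (block).
        return fault_tolerant
    return count <= TIER_LIMITS_PER_MINUTE[tier]
```

The only difference between failing open and failing closed is the value returned from the except branch, which is why the setting deserves an explicit review rather than a default.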
Deep Dive: Routing
NGINX Routing (URL-based)
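# map must live in the http context, alongside the upstream blocks
map $http_x_canary $api_upstream {
    default v1_backend;
    "true"  v2_backend;     # Header-based routing for canary
}
upstream v1_backend {
    least_conn;
    server api-v1-1:8080 weight=3;
    server api-v1-2:8080 weight=3;
    keepalive 32;           # idle keepalive connections cached per worker
}
upstream v2_backend {
    server api-v2-1:8080;
    server api-v2-2:8080;
}
server {
    listen 443 ssl http2;
    ssl_certificate     /etc/nginx/server.crt;   # example paths
    ssl_certificate_key /etc/nginx/server.key;
    location /v1/ {
        # Trailing slash strips the /v1/ prefix before proxying
        proxy_pass http://v1_backend/;
        # Upstream keepalive needs HTTP/1.1 and an empty Connection header
        proxy_set_header Connection "";
        proxy_http_version 1.1;
    }
    location /v2/ {
        proxy_pass http://v2_backend/;
    }
    location /api/ {
        # $api_upstream holds an upstream group name chosen by the map
        proxy_pass http://$api_upstream;
    }
}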
NGINX routing is fast and capable, but in open-source NGINX the routing rules are static configuration: every change means editing files and triggering a (graceful) reload, and there is no API to update routes at runtime.
Envoy Dynamic Routing
# Envoy: Route config with header-based routing and weighted clusters
route_config:
  name: api_routes
  virtual_hosts:
    - name: api_service
      domains: ["api.example.com"]
      routes:
        # Requests carrying x-canary are split 10/90 between two clusters
        - match:
            prefix: "/v2/"
            headers:
              - name: "x-canary"
                present_match: true
          route:
            weighted_clusters:
              clusters:
                - name: api_v2_canary
                  weight: 10
                - name: api_v2_stable
                  weight: 90
        # Routes match in order: /v2/ requests without the header land here
        - match:
            prefix: "/v2/"
          route:
            cluster: api_v2_stable
        - match:
            prefix: "/v1/"
          route:
            cluster: api_v1
            timeout: 30s
            retry_policy:
              retry_on: "5xx,gateway-error,connect-failure"
              num_retries: 3
              per_try_timeout: 10s
Envoy's routing config can be pushed dynamically via xDS APIs without restart — critical for zero-downtime deployments and progressive traffic shifting.
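A Python sketch of the matching semantics in that route table: routes are tried in order, the first match wins, and a weighted-clusters route picks a cluster in proportion to its weight. The route list includes a plain /v2/ fallback after the canary rule as an assumption; without a rule like that, a /v2/ request lacking the x-canary header matches nothing. This models the semantics only; Envoy's matcher supports far more (regex paths, header values, runtime fractions).

```python
import random
from typing import Dict, List, Optional, Tuple

# (path prefix, required header or None, [(cluster, weight), ...])
Route = Tuple[str, Optional[str], List[Tuple[str, int]]]

ROUTES: List[Route] = [
    ("/v2/", "x-canary", [("api_v2_canary", 10), ("api_v2_stable", 90)]),
    ("/v2/", None, [("api_v2_stable", 100)]),  # assumed fallback for non-canary traffic
    ("/v1/", None, [("api_v1", 100)]),
]


def pick_weighted(clusters: List[Tuple[str, int]], roll: int) -> str:
    """Map roll in [0, total_weight) onto clusters in declaration order."""
    for name, weight in clusters:
        if roll < weight:
            return name
        roll -= weight
    raise ValueError("roll out of range")


def route(path: str, headers: Dict[str, str], rng: random.Random) -> Optional[str]:
    present = {name.lower() for name in headers}
    for prefix, header, clusters in ROUTES:
        if not path.startswith(prefix):
            continue
        if header is not None and header not in present:
            continue
        total = sum(weight for _, weight in clusters)
        return pick_weighted(clusters, rng.randrange(total))
    return None  # no route matched; Envoy would respond 404
```

Shifting traffic progressively means changing only the weights; with xDS, that change is pushed to running proxies instead of being redeployed.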
Deep Dive: Circuit Breaking
Circuit breaking protects your backends from cascade failures. This is where API Gateways and advanced L7 proxies diverge significantly from simple load balancers.
# Envoy: Circuit breaker configuration
clusters:
  - name: backend_service
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 1000
          max_pending_requests: 1000
          max_requests: 1000
          max_retries: 3
          track_remaining: true
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      success_rate_minimum_hosts: 5
      success_rate_request_volume: 100
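Envoy's outlier detection ejects individual backend instances based on errors observed in live traffic (here, five consecutive 5xx responses), independently of active health checks. HAProxy offers a narrower form of passive checking (observe layer7 with on-error), but a typical L7 LB setup removes a backend only when its active health check fails.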
Practical rule: If you're running microservices and need circuit breaking, you need either a service mesh (Istio/Linkerd with Envoy) or an API Gateway with circuit breaker support. A basic L7 LB won't cut it.
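To show what consecutive_5xx, base_ejection_time, and max_ejection_percent do together, here is a simplified Python model: a host is ejected after N consecutive 5xx responses, each repeat ejection lasts longer (base time multiplied by the number of ejections so far), and no ejection happens if it would push the ejected share of the cluster past the cap. It is loosely modeled on Envoy's documented behavior, not its implementation; success-rate detection and the interval timer are omitted, and the class and host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0


class OutlierDetector:
    def __init__(self, hosts: List[str], consecutive_5xx: int = 5,
                 base_ejection_time: float = 30.0, max_ejection_percent: int = 50) -> None:
        self.threshold = consecutive_5xx
        self.base_ejection_time = base_ejection_time
        self.max_ejection_percent = max_ejection_percent
        self.hosts: Dict[str, HostState] = {h: HostState() for h in hosts}

    def is_ejected(self, host: str, now: float) -> bool:
        return self.hosts[host].ejected_until > now

    def healthy_hosts(self, now: float) -> List[str]:
        return [h for h in self.hosts if not self.is_ejected(h, now)]

    def record(self, host: str, status: int, now: float) -> None:
        state = self.hosts[host]
        if status < 500:
            state.consecutive_5xx = 0  # any success resets the streak
            return
        state.consecutive_5xx += 1
        if state.consecutive_5xx < self.threshold:
            return
        ejected = sum(self.is_ejected(h, now) for h in self.hosts)
        if (ejected + 1) * 100 > self.max_ejection_percent * len(self.hosts):
            return  # cap reached: keep the host in rotation rather than eject too many
        state.times_ejected += 1
        # Repeat offenders stay out longer: base time x number of ejections.
        state.ejected_until = now + self.base_ejection_time * state.times_ejected
        state.consecutive_5xx = 0
```

The ejection cap is the important safety valve: without it, a shared dependency failing behind every instance could eject the whole cluster and turn partial degradation into a full outage.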
SSL/TLS Termination: Practical Differences
Both L7 load balancers and API Gateways terminate TLS. The architectural question is where in the stack.
Recommended TLS Architecture (Production)
Client → TLS → ALB (terminate) → HTTP → Kong → HTTP → Backend
or
Client → TLS → ALB (terminate) → HTTP → Kong → TLS → Backend (re-encrypt)
or
Client → TLS → NLB (TCP passthrough) → TLS → Kong (terminate) → HTTP → Backend
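Re-encryption (TLS terminated at the edge, then re-established before traffic reaches the backend) is a common enterprise pattern. It allows the terminating proxy to inspect and log traffic while keeping backend communication encrypted, which is important for compliance. Note that an ALB always terminates TLS; passthrough requires an NLB.
mTLS to backend is required when backends need to verify the caller's identity. Kong can present a client certificate to an upstream service through the service's client_certificate setting, and Envoy configures upstream mTLS natively through the cluster's TLS transport socket.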
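# Kong: Configure upstream mTLS (the service must use the https protocol)
# 1. Upload the client certificate and key Kong will present to the backend
curl -X POST http://localhost:8001/certificates \
  -F "cert=@/path/to/client.crt" \
  -F "key=@/path/to/client.key"
# 2. Attach it to the service, using the "id" returned by step 1
curl -X PATCH http://localhost:8001/services/my-service \
  -d "client_certificate.id=<certificate-id>"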
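What "presenting a client certificate upstream" means on the proxy side can be sketched with Python's ssl module: a client context that verifies the backend's certificate and hostname and, when given a certificate and key, presents them so the backend can identify the caller. The file paths, host, and port are hypothetical inputs; gateways configure the same ingredients declaratively.

```python
import socket
import ssl
from typing import Optional


def upstream_mtls_context(ca_file: Optional[str] = None,
                          cert_file: Optional[str] = None,
                          key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context for the proxy-to-backend hop."""
    # SERVER_AUTH: verify the backend's certificate (against ca_file, or the
    # system trust store when ca_file is None) and check its hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # Presenting a client certificate is what makes this mutual TLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def negotiated_version(host: str, port: int, ctx: ssl.SSLContext) -> Optional[str]:
    """Open a TLS connection to the backend and report the protocol version."""
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

On the backend, the matching server context would set verify_mode to ssl.CERT_REQUIRED and load the CA that signed the gateway's client certificate.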
Observability: Where They Diverge Most
This is an underappreciated difference that only becomes painful after you've been operating in production for six months.
L7 Load Balancer Observability
HAProxy exports metrics via the stats socket and Prometheus endpoint:
# HAProxy metrics via Prometheus exporter
haproxy_process_current_connections
haproxy_backend_http_responses_total{backend="api_pool", code="5xx"}
haproxy_backend_response_time_average_seconds
You get connection counts, response codes, and latency per backend. You do NOT get per-API-consumer metrics, per-endpoint error rates, or business-level SLA tracking.
API Gateway Observability
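Kong's Prometheus plugin emits metrics like these (exact metric and label names vary between Kong versions, and the consumer label appears only when the plugin's per_consumer option is enabled):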
kong_http_requests_total{service="user-api", route="get-user",
consumer="partner-acme", status="200"}
kong_latency_bucket{service="user-api", type="kong", le="10"}
kong_latency_bucket{service="user-api", type="upstream", le="100"}
Now you can answer: "How many requests did ACME Corp make to /v2/users in the last hour, and what was their p99 latency?"
This is the difference between infrastructure monitoring and API observability. Both matter. They are not the same thing.
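The p99 in that question is computed from histogram buckets such as kong_latency_bucket. The Python sketch below shows the linear interpolation that Prometheus's histogram_quantile() applies to cumulative buckets, simplified: it takes raw cumulative counts rather than the per-second rates you would normally pass in PromQL, and the bucket values in the example are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)              # +Inf sorts last
    total = buckets[-1][1]                 # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Prometheus returns the highest finite bound in this case.
                return lower_bound
            in_bucket = cumulative - count_below
            fraction = (rank - count_below) / in_bucket if in_bucket else 0.0
            # Assume observations are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    return lower_bound


# 100 requests: 50 took <= 10 ms, 90 took <= 100 ms, all took <= +Inf
latency_ms = [(10, 50), (100, 90), (math.inf, 100)]
p70 = histogram_quantile(0.7, latency_ms)  # 55.0: halfway through the 10-100 ms bucket
```

The estimate can only be as precise as the bucket boundaries: any p99 that lands in the +Inf bucket is reported as the largest finite bound, so choose boundaries that bracket your SLA thresholds.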
Caching: Different Semantics
NGINX proxy caching:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
max_size=1g inactive=60m use_temp_path=off;
location /api/products {
proxy_cache api_cache;
proxy_cache_key "$scheme$request_method$host$request_uri";
proxy_cache_valid 200 5m;
proxy_cache_bypass $http_pragma;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://backend;
}
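This caches by scheme, method, host, and URI. If /api/products returns the same response for all users, this works perfectly. It does NOT work for per-consumer responses (JWT-scoped data): with this key, the first user's cached response is served to everyone requesting the same URL, unless you add the credential (for example $http_authorization) to proxy_cache_key.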
Kong's proxy-cache-advanced plugin (a Kong Enterprise plugin; the open-source proxy-cache plugin accepts the options shown here):
plugins:
  - name: proxy-cache-advanced
    config:
      response_code: [200, 301]
      request_method: [GET, HEAD]
      content_type: ["application/json"]
      cache_ttl: 300
      strategy: memory
      # Cache key can include consumer ID
      vary_headers: ["Authorization"]
By including the Authorization header in the cache key, you get per-consumer caching — critical for APIs returning user-scoped data.
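The difference between the two cache configurations comes down to what goes into the cache key. This Python sketch (a hypothetical helper, not either product's code) builds a key from scheme, method, host, and URI, optionally adding a hash of selected request headers the way a vary-by-Authorization setting would.

```python
import hashlib
from typing import Dict, Iterable


def cache_key(scheme: str, method: str, host: str, uri: str,
              headers: Dict[str, str], vary_headers: Iterable[str] = ()) -> str:
    """Build a cache key; each name in vary_headers adds that header's value."""
    parts = [scheme, method.upper(), host.lower(), uri]
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in vary_headers:
        value = lowered.get(name.lower(), "")
        # Hash the value so bearer tokens never appear verbatim in keys or logs.
        parts.append(name.lower() + "=" + hashlib.sha256(value.encode()).hexdigest())
    return "|".join(parts)


alice = {"Authorization": "Bearer alice-token"}
bob = {"Authorization": "Bearer bob-token"}
request = ("https", "GET", "api.example.com", "/api/products")
shared = cache_key(*request, alice) == cache_key(*request, bob)  # True: same entry
```

Varying on the raw Authorization header keys the cache per credential rather than per consumer: a consumer who rotates tokens gets fresh entries, and the hit rate drops as token count grows.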
API Gateways in AI Infrastructure and MCP Architectures
This is where architectural decisions made in traditional API infrastructure directly apply to emerging AI-native systems.
The AI Agent Request Path
AI agents (Claude, GPT-based systems, LangChain agents, AutoGPT derivatives) interact with backend systems via:
- REST APIs — standard JSON over HTTP
- MCP servers — Model Context Protocol, enabling structured tool use via JSON-RPC
- Streaming APIs — SSE or WebSocket for long-running agent tasks
All three go through your API Gateway if architected correctly.
Where MCP Servers Sit in the Stack
┌──────────────────────────────────────────────────────────────┐
│ AI Agent (Claude / GPT) │
└──────────────────────────────┬───────────────────────────────┘
│ JSON-RPC over HTTPS
▼
┌─────────────────────────────┐
│ API Gateway │
│ (Kong / AWS API GW) │
│ - Bearer token validation │
│ - Agent rate limiting │
│ - Tool call audit logging │
│ - Request transformation │
└──────────────┬──────────────┘
│ HTTP (internal)
┌─────────┼─────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ MCP │ │ MCP │ │ MCP │
│ Server A │ │ Server B │ │ Server C │
│ (tools) │ │ (memory) │ │ (search) │
└──────────┘ └──────────┘ └──────────┘
MCP servers expose tools (functions the AI can call), resources (data the AI can read), and prompts (structured prompt templates). Remote MCP deployments reach them over HTTP, using the Streamable HTTP transport or the older HTTP+SSE transport it replaced. They must sit behind an API Gateway in production for several non-negotiable reasons:
1. Authentication: AI agents present Bearer tokens. The gateway validates these before any tool call reaches the MCP server. Without this, any caller can invoke your tools.
2. Rate limiting: AI agents can generate thousands of tool calls per minute. Without rate limiting at the gateway, a runaway agent can exhaust your MCP server's resources or trigger expensive downstream API calls.
3. Audit logging: Every tool call an AI agent makes should be logged with the agent identity, tool name, parameters, and response. This is critical for debugging agent behavior and regulatory compliance. The gateway is the right place for this — not each individual MCP server.
4. Tool discovery governance: The gateway can enforce which tools a specific agent identity is allowed to access, creating an authorization layer above the MCP protocol's own capability negotiation.
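Points 3 and 4 can be sketched as a check the gateway runs on each JSON-RPC body before forwarding it. The Python below inspects tools/call requests (the MCP method for invoking a tool, with the tool name in params.name), records an audit entry, and rejects tools outside a per-agent allowlist. The allowlist, agent IDs, and the -32001 error code are hypothetical; agent_id is assumed to come from the already-validated token (for example the consumer ID).

```python
import json
import time
from typing import Dict, List, Optional, Set

# Hypothetical per-agent allowlist; a real gateway would load this from config.
TOOL_ALLOWLIST: Dict[str, Set[str]] = {
    "agent-support-bot": {"search_docs", "get_ticket"},
    "agent-billing": {"get_invoice"},
}
TOOL_DENIED = -32001  # hypothetical code in JSON-RPC's server-defined error range


def rpc_error(msg_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def authorize_tool_call(agent_id: str, body: bytes, audit_log: List[dict]) -> Optional[dict]:
    """Return a JSON-RPC error to send back to the agent, or None to forward."""
    try:
        msg = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return rpc_error(None, -32700, "Parse error")
    if not isinstance(msg, dict):
        return rpc_error(None, -32600, "Invalid Request")  # batches not handled here
    if msg.get("method") != "tools/call":
        return None  # initialize, tools/list, notifications, etc. pass through
    params = msg.get("params") or {}
    tool = params.get("name")
    allowed = isinstance(tool, str) and tool in TOOL_ALLOWLIST.get(agent_id, set())
    # Audit every tool call attempt, allowed or not.
    audit_log.append({
        "ts": time.time(),
        "agent": agent_id,
        "tool": tool,
        "arguments": params.get("arguments"),
        "decision": "allow" if allowed else "deny",
    })
    if allowed:
        return None
    return rpc_error(msg.get("id"), TOOL_DENIED, f"tool {tool!r} not permitted for {agent_id}")
```

Responses are not logged in this sketch; capturing them as well means buffering or teeing the upstream response, which interacts with streaming transports.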
For production deployment guidance specific to MCP servers, see Running MCP in Production, which covers transport selection, health checking, and operational patterns.
Practical MCP Gateway Configuration (Kong)
# kong.yaml: MCP server routing with auth and rate limiting
_format_version: "3.0"     # declarative config format for Kong 3.x
services:
  - name: mcp-tools-service
    url: http://mcp-server.internal:3000
    connect_timeout: 5000
    read_timeout: 60000    # MCP tools can be slow: set generous timeouts
    write_timeout: 60000
routes:
  - name: mcp-jsonrpc
    service: mcp-tools-service
    paths:
      - /mcp
    methods:
      - POST
      - GET                # For SSE transport
    # No Content-Type match: GET requests that open the SSE stream carry no
    # JSON body, so a Content-Type: application/json condition would reject them
plugins:
  - name: jwt
    service: mcp-tools-service
    config:
      claims_to_verify: [exp]
  - name: rate-limiting
    service: mcp-tools-service
    config:
      minute: 500
      policy: redis
      redis_host: redis.internal   # required when policy is redis
      redis_port: 6379
      limit_by: consumer
      fault_tolerant: true
  # The jwt plugin already injects X-Consumer-ID (the agent identity) upstream;
  # correlation-id adds a generated per-request ID for tracing tool calls
  - name: correlation-id
    service: mcp-tools-service
    config:
      header_name: X-Request-ID
      generator: uuid
      echo_downstream: true
  - name: file-log
    service: mcp-tools-service
    config:
      path: /var/log/kong/mcp-audit.log
Before deploying MCP servers publicly, validate your server's security posture using MCPForge's verification tool, which checks for common misconfigurations including missing authentication, overly permissive CORS, and unvalidated tool parameters.
SSE Transport Considerations
MCP's SSE (Server-Sent Events) transport requires special handling at the gateway layer:
# NGINX: SSE proxy configuration for MCP
location /mcp/sse {
    proxy_pass http://mcp_backend;
    proxy_set_header Connection '';
    proxy_http_version 1.1;
    proxy_buffering off;          # Critical for SSE: forward events as they arrive
    proxy_cache off;
    proxy_read_timeout 3600s;     # Long timeout for streaming
    chunked_transfer_encoding on;
    # Response headers sent to the client
    add_header Cache-Control no-cache;
    # Only affects another NGINX tier in front of this one;
    # proxy_buffering off is what disables buffering here
    add_header X-Accel-Buffering no;
}
L7 load balancers need explicit SSE configuration (disable buffering, extend timeouts). API Gateways like Kong handle this via route-level configuration but often require the same underlying proxy settings.
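Why buffering matters: an SSE client can only hand an event to the application once the blank line that terminates it has arrived. The parser below, a simplified Python sketch of the text/event-stream format (it ignores the id and retry fields, and the class name is illustrative), makes that visible: feeding it a partial event yields nothing. A proxy that buffers the upstream response holds back exactly those bytes, so every agent-facing event is delayed until the buffer flushes.

```python
from typing import Iterator, List, Tuple


class SSEParser:
    """Incremental text/event-stream parser (simplified)."""

    def __init__(self) -> None:
        self._buffer = ""
        self._event = "message"
        self._data: List[str] = []

    def feed(self, chunk: str) -> Iterator[Tuple[str, str]]:
        """Consume a chunk and yield (event, data) for each completed event."""
        self._buffer += chunk
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            line = line.rstrip("\r")
            if line == "":
                # A blank line terminates the event: only now can it be dispatched.
                if self._data:
                    yield self._event, "\n".join(self._data)
                self._event, self._data = "message", []
            elif line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            else:
                field, _, value = line.partition(":")
                if value.startswith(" "):
                    value = value[1:]
                if field == "event":
                    self._event = value
                elif field == "data":
                    self._data.append(value)
```

Comment lines are worth noticing too: servers often send them periodically as keep-alives, which is what keeps an otherwise idle stream inside proxy_read_timeout.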
Common Misconceptions
"API Gateways handle all my L7 load balancing needs"
At low traffic volumes, this is true. At scale, it breaks down. Kong running 20 plugins per request adds meaningful CPU overhead. HAProxy at 1M+ requests/second on a single instance is something Kong cannot match at equivalent cost. Use the right tool for each layer.
"I don't need an API Gateway if I have a service mesh"
Service meshes (Istio, Linkerd) manage east-west traffic between services. API Gateways manage north-south traffic from external consumers. They solve different problems. A service mesh gives you mTLS, retries, and circuit breaking between internal services. It does not give you API key management, developer portals, or per-consumer rate limiting.
"NGINX can do everything an API Gateway can"
NGINX can be extended with Lua (via OpenResty) to implement API Gateway features. Teams that go down this path usually end up maintaining a significant custom codebase. This is engineering debt that grows every time you add a feature. Unless you have a specific performance or control requirement that justifies it, use a purpose-built API Gateway.
"Cloud load balancers are dumb L4 devices"
AWS ALB operates at Layer 7. It understands HTTP headers, paths, query strings, and host headers. It supports content-based routing, health checks, WebSocket, and HTTP/2. It is not a dumb TCP load balancer. AWS NLB is the L4 option.
"I can do auth in my backend services instead of the gateway"
You can. Most teams regret this at scale. When auth logic lives in 12 different microservices, each service needs to be updated when you rotate signing keys, change token formats, or add a new OAuth provider. Centralizing auth at the gateway means one place to update, audit, and test. The backend services trust the gateway's X-Consumer-ID header and focus on business logic, which is only safe if backends accept traffic exclusively from the gateway; otherwise any client can forge that header.
Production Architecture Decision Guide
Use this table to determine your architecture pattern:
| Situation | Recommended Approach |
|---|---|
| Single public API, < 1000 req/min | API Gateway only (AWS API GW or Kong) |
| High-throughput internal service pool | L7 Load Balancer only (NGINX/HAProxy) |
| Public API with developer ecosystem | API Gateway + L7 LB at edge |
| Kubernetes microservices, no public API | Ingress (Traefik/NGINX) + service mesh |
| Kubernetes + public API | Ingress + API Gateway (Kong Ingress Controller) |
| Serverless functions (AWS) | AWS API Gateway + ALB (different layers) |
| AI agent infrastructure / MCP servers | API Gateway (auth + rate limit) + L7 LB at edge |
| Regulated industry (PCI, HIPAA) | L7 LB + API Gateway + WAF + mTLS |
| Multi-region global API | CDN + L7 LB (anycast) + API Gateway |
Performance Implications
Understanding the performance trade-off is essential before making architectural decisions.
Latency Budget
Typical request path latency additions:
┌────────────────────────────────────────┐
│ Component Added p50 │
├────────────────────────────────────────┤
│ HAProxy (L7 routing) < 0.5ms │
│ NGINX (reverse proxy) < 1ms │
│ Traefik (with middleware) 1–3ms │
│ Envoy (with filter chain) 1–2ms │
│ Kong (basic, no plugins) 2–4ms │
│ Kong (JWT + rate limit) 4–8ms │
│ AWS API Gateway 5–15ms │
│ Kong (many plugins) 8–20ms │
└────────────────────────────────────────┘
For a public API where round-trip to your users is already 50–200ms, adding 10ms at the gateway is acceptable. For internal microservice calls where latency is 2–5ms, adding 8ms at an API Gateway costs 1.6x to 4x the call's own latency; use a service mesh sidecar instead.
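The arithmetic behind those two cases, as a tiny Python helper (the name relative_overhead is illustrative):

```python
def relative_overhead(added_ms: float, baseline_ms: float) -> float:
    """Gateway latency as a multiple of the request's existing latency."""
    return added_ms / baseline_ms


# Public API: 10 ms of gateway on a 50-200 ms user round trip
public = [relative_overhead(10, rtt) for rtt in (50, 200)]   # 0.2x and 0.05x
# Internal call: 8 ms of gateway on a 2-5 ms service-to-service call
internal = [relative_overhead(8, rtt) for rtt in (2, 5)]     # 4.0x and 1.6x
```

The same added milliseconds are noise on one path and the dominant cost on the other, which is why the decision depends on where the gateway sits in the request path.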
Kong Latency Optimization
# Kong latency tuning (kong.conf, or the matching KONG_* environment variables)
# Disable unused plugins per route: each plugin adds processing time
# Use declarative config (DB-less mode): all configuration is held in memory,
# so request processing never waits on a database lookup
database = off
declarative_config = /etc/kong/kong.yml    # example path
# Enable keepalives to upstreams (node-level settings, not Admin API fields)
upstream_keepalive_pool_size = 60
upstream_keepalive_idle_timeout = 60
Security Checklist for Production
For comprehensive MCP server security analysis, check the MCPForge security reports to understand common vulnerabilities in gateway-adjacent AI infrastructure.
API Gateway Security
- JWT verification uses JWKS endpoint (not static secrets)
- Token expiry (exp claim) is enforced
- Rate limiting is in place per consumer and per IP
- All admin API endpoints are network-restricted (not public)
- Plugin configurations are reviewed for fault_tolerant implications
- Request size limits are set (prevent large payload attacks)
- CORS is configured to specific allowed origins
- Response headers leak no internal service details
- Audit logging captures consumer ID, route, status code, and latency
L7 Load Balancer Security
- TLS 1.2 minimum enforced (TLS 1.3 preferred)
- Weak cipher suites disabled
- HSTS header added at LB level
- Backend health checks use dedicated /health endpoints
- Slow loris protection enabled (client body/header timeouts)
- Internal admin interfaces bound to management network only
- Access logs include X-Forwarded-For and real client IP
# NGINX: Security hardening essentials
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/server.crt;   # example paths
    ssl_certificate_key /etc/nginx/server.key;
    # TLS hardening
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;
    # HSTS (a location that declares its own add_header does not inherit this)
    add_header Strict-Transport-Security "max-age=63072000" always;
    # Slow loris protection
    client_body_timeout 10s;
    client_header_timeout 10s;
    # Hide NGINX version
    server_tokens off;
    # Size limits
    client_max_body_size 10m;
}
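The same TLS policy can be expressed outside NGINX, which helps when a gateway plugin or internal service terminates TLS itself. This Python sketch maps the main directives above onto ssl.SSLContext settings; the mapping is approximate (session cache sizing, for instance, has no direct equivalent), and the certificate paths are hypothetical arguments.

```python
import ssl
from typing import Optional

# Same TLS 1.2 cipher list as the NGINX example; TLS 1.3 suites are configured separately.
TLS12_CIPHERS = ("ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:"
                 "ECDHE-ECDSA-AES256-GCM-SHA384")


def hardened_server_context(cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2      # ssl_protocols TLSv1.2 TLSv1.3
    ctx.set_ciphers(TLS12_CIPHERS)                     # ssl_ciphers ...
    ctx.options |= ssl.OP_NO_TICKET                    # ssl_session_tickets off
    ctx.options &= ~ssl.OP_CIPHER_SERVER_PREFERENCE    # ssl_prefer_server_ciphers off
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)       # ssl_certificate / _key
    return ctx
```

Letting the client choose among an already restricted cipher list (server preference off) is reasonable when every allowed suite is strong; clients without AES hardware acceleration can then pick what performs best for them.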
Kubernetes-Specific Patterns
In Kubernetes, the gateway landscape maps to specific components:
┌─────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Ingress Controller │ │
│ │ (NGINX Ingress / Traefik / Kong Ingress) │ │
│ │ Function: L7 LB + basic routing + TLS │ │
│ └─────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌─────────────────▼───────────────────────────────┐ │
│ │ API Gateway (optional) │ │
│ │ (Kong DP / Envoy Gateway / Gloo Edge) │ │
│ │ Function: Auth + rate limit + transform │ │
│ └──────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────────────────────┐ │
│ │ Service Mesh (optional) │ │
│ │ (Istio / Linkerd) │ │
│ │ Function: mTLS + retries + circuit break │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
The Kong Ingress Controller collapses the Ingress Controller and API Gateway into one component — useful for reducing operational overhead when you don't need a separate L7 LB tier.
# Kong Ingress Controller: KongPlugin for JWT auth
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: jwt-auth
  namespace: api
config:
  key_claim_name: kid
  claims_to_verify:
    - exp
plugin: jwt
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: api
  annotations:
    konghq.com/plugins: jwt-auth
spec:
  ingressClassName: kong
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-service-v1
                port:
                  number: 80
Making the Final Decision: A Practical Framework
When evaluating your architecture, answer these five questions:
1. Do external consumers need to authenticate with API keys or JWTs? Yes → You need an API Gateway.
2. Do you need per-consumer rate limiting or quota management? Yes → You need an API Gateway.
3. Do you have high-throughput, low-latency routing requirements (>50k req/sec)? Yes → You need a purpose-built L7 load balancer at the edge.
4. Are you building a developer ecosystem with a public API? Yes → You need an API Gateway with a developer portal.
5. Are you running AI agents or MCP servers that make tool calls? Yes → You need an API Gateway for auth, rate limiting, and audit logging, with the MCP server behind it.
For most production systems beyond simple hobby projects, the answer to multiple questions above is "yes" — which means you need both, layered appropriately.
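The five questions can be encoded as a small function, which makes explicit that the answers combine rather than exclude each other. This is one possible reading of the framework in Python, with hypothetical field names; the fallback when every answer is "no" reflects the advice below to start with a single reverse proxy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    consumer_auth: bool        # Q1: API keys or JWTs for external consumers
    per_consumer_limits: bool  # Q2: per-consumer rate limits or quotas
    high_throughput: bool      # Q3: > 50k req/sec, latency-sensitive routing
    developer_ecosystem: bool  # Q4: public API with a developer community
    ai_agents_or_mcp: bool     # Q5: AI agents or MCP servers making tool calls


def recommend(req: Requirements) -> List[str]:
    components = []
    if req.high_throughput:
        components.append("L7 load balancer at the edge")
    if (req.consumer_auth or req.per_consumer_limits
            or req.developer_ecosystem or req.ai_agents_or_mcp):
        components.append("API gateway")
    if req.developer_ecosystem:
        components.append("developer portal")
    if req.ai_agents_or_mcp:
        components.append("audit logging for tool calls")
    return components or ["single L7 load balancer / reverse proxy"]
```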
Key Takeaways
- Layer 7 load balancers excel at traffic distribution, health routing, and TLS termination with minimal overhead. They are not API management tools.
- API Gateways manage the contract between API consumers and your backends: auth, quotas, versioning, analytics. They add latency and operational complexity that must be justified.
- Most production architectures use both: an L7 LB at the network edge, an API Gateway behind it for API management, and optionally a service mesh for east-west traffic.
- MCP servers must sit behind an API Gateway in any production AI infrastructure deployment. Auth, rate limiting, and audit logging cannot be optional.
- Don't over-engineer early — a single NGINX instance handles surprising scale. Add an API Gateway when you have API consumers whose access you need to manage, not before.
- Envoy and Kong are the most versatile options for teams that may need to evolve from L7 LB to full API Gateway without replacing their infrastructure.
- Performance overhead matters at scale — measure gateway latency in your specific workload before committing to a tool.