API Gateway (Kong): Deep Dive
This demo places Kong between clients and backend services. Clients send requests to Kong. Kong applies rate limiting, authenticates requests, and forwards them to the correct backend. The backends never see raw client traffic.
This document explains why API gateways exist, how Kong implements the pattern on Kubernetes, the algorithms behind rate limiting, authentication methods, and how Kong compares to alternatives like NGINX Ingress Controller and Envoy-based gateways.
Why Not Direct Client-to-Service Communication
In the previous demo, the Ingress routes traffic directly to backend services. That works for simple setups. But as the number of services grows, problems appear.
Without a gateway, clients need to know about every service. If you have an orders service, a users service, and a products service, the client needs three different URLs. Add a service, and every client must update.
Cross-cutting concerns get duplicated. Rate limiting, authentication, logging, CORS handling: each service implements them independently. Bugs in one service’s auth layer do not exist in another’s, because they are different implementations.
Protocol translation has no natural home. If a backend speaks gRPC but the client speaks REST, something needs to translate. Without a gateway, that translation logic ends up duplicated in each service.
An API gateway solves all three problems. It is a single entry point that handles routing, security, and protocol concerns. Backends focus on business logic.
Kong Architecture
Kong is a gateway built on top of NGINX and OpenResty (LuaJIT). It has two logical planes.
The data plane handles request traffic. It receives client requests, applies plugins (rate limiting, auth, transforms), and forwards requests to upstream services. In this demo, the data plane is the Kong pod that listens on port 80.
The control plane manages configuration. In Kubernetes-native mode, the Kong Ingress Controller reads Kubernetes resources (Ingress, Service, KongPlugin, KongConsumer) and configures the data plane automatically. There is no separate admin database. Kubernetes is the source of truth.
This demo installs Kong via Helm:
```bash
helm install kong kong/ingress -n api-gateway-demo \
  --set gateway.proxy.type=NodePort \
  --set gateway.resources.requests.memory=256Mi \
  --set gateway.resources.requests.cpu=100m \
  --set gateway.resources.limits.memory=512Mi \
  --set gateway.resources.limits.cpu=500m
```

The `gateway.proxy.type=NodePort` setting exposes Kong's proxy port on each node. In production on OpenShift, you would use LoadBalancer or a Route object instead.
Routing Through Kong
Kong uses standard Kubernetes Ingress resources for routing. The Ingress in this demo defines two paths:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-routes
  namespace: api-gateway-demo
  annotations:
    konghq.com/strip-path: "true"
spec:
  ingressClassName: kong
  rules:
    - http:
        paths:
          - path: /echo-1
            pathType: Prefix
            backend:
              service:
                name: echo-1
                port:
                  number: 80
          - path: /echo-2
            pathType: Prefix
            backend:
              service:
                name: echo-2
                port:
                  number: 80
```

Two details matter here.
First, ingressClassName: kong tells Kubernetes that Kong should handle
this Ingress, not the default NGINX Ingress Controller.
Second, konghq.com/strip-path: "true" removes the matched path prefix
before forwarding. A request to /echo-1/health arrives at the echo-1
service as /health. Without strip-path, the backend would receive
/echo-1/health and likely return a 404.
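To make the strip-path behavior concrete, here is a minimal Python sketch of what prefix stripping does conceptually; it is an illustration, not Kong's actual (Lua) implementation:

```python
def strip_path(prefix: str, request_path: str) -> str:
    """Remove the matched Ingress prefix before forwarding upstream."""
    if request_path.startswith(prefix):
        stripped = request_path[len(prefix):]
        # Ensure the upstream path still starts with "/".
        return stripped if stripped.startswith("/") else "/" + stripped
    return request_path

# A request to /echo-1/health reaches the echo-1 backend as /health.
print(strip_path("/echo-1", "/echo-1/health"))  # -> /health
```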
The Backend Services
Both echo services are nginx pods that return JSON identifying themselves:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: echo-1-config
  namespace: api-gateway-demo
data:
  default.conf: |
    server {
      listen 80;
      location / {
        default_type application/json;
        return 200 '{"service":"echo-1","message":"Hello from Echo Service 1","timestamp":"$time_iso8601"}';
      }
      location /health {
        default_type application/json;
        return 200 '{"status":"healthy"}';
      }
    }
```

Echo-2 is identical except it returns `"service":"echo-2"`. These simple backends let you verify that Kong routes traffic correctly. When you hit `/echo-1`, you should see echo-1's response. When you hit `/echo-2`, you should see echo-2's response.
Rate Limiting
Why Rate Limit
Without rate limiting, a single client can overwhelm a backend service. Whether it is a bug, a bot, or a denial-of-service attack, unbounded request rates cause cascading failures. Rate limiting protects backends by rejecting excess traffic at the gateway before it reaches the service.
The KongPlugin Resource
Kong implements rate limiting through a KongPlugin custom resource:
```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
config:
  minute: 5
  policy: local
  fault_tolerant: true
  hide_client_headers: false
plugin: rate-limiting
```

The plugin is attached to the Ingress via an annotation:
```bash
kubectl annotate ingress echo-routes -n api-gateway-demo \
  konghq.com/plugins=rate-limit --overwrite
```

Now every request through the Ingress passes through the rate-limiting plugin. The sixth request within a minute gets a 429 response.
Configuration Breakdown
- `minute: 5` allows 5 requests per minute per client.
- `policy: local` tracks counters in the Kong pod's local memory.
- `fault_tolerant: true` keeps serving requests if the rate limiter itself fails (for example, if using a Redis-backed policy and Redis goes down).
- `hide_client_headers: false` includes rate limit headers in responses: `X-RateLimit-Remaining`, `X-RateLimit-Limit`, and `Retry-After`.
Rate Limiting Algorithms
Different algorithms produce different behaviors. Kong supports several.
Fixed window. Divide time into fixed intervals (one-minute windows). Count requests per window. Reset at the boundary. Simple but allows bursts. A client can send 5 requests at 10:00:59 and 5 more at 10:01:00, effectively getting 10 requests in 2 seconds.
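The boundary burst can be demonstrated with a minimal fixed-window counter in Python (an illustrative sketch, not Kong's implementation):

```python
class FixedWindowLimiter:
    """Fixed-window counter: simple, but bursty at window boundaries."""
    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window start time -> request count

    def allow(self, now: float) -> bool:
        window_start = int(now // self.window) * self.window
        count = self.counts.get(window_start, 0)
        if count >= self.limit:
            return False
        self.counts[window_start] = count + 1
        return True

limiter = FixedWindowLimiter(limit=5)
# 5 requests at the end of one window and 5 at the start of the next all
# succeed: 10 requests in two seconds despite a "5 per minute" limit.
late = [limiter.allow(59.0) for _ in range(5)]
early = [limiter.allow(60.0) for _ in range(5)]
print(all(late), all(early))  # True True
```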
Sliding window. Weight the current window’s count against the previous
window’s count based on elapsed time. Smoother than fixed window. Kong’s
rate-limiting plugin uses this approach by default.
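The weighting idea can be sketched as follows; this is the common sliding-window approximation, and Kong's internal bookkeeping may differ in detail:

```python
def sliding_window_count(prev_count, curr_count, elapsed_in_window, window=60.0):
    """Estimate requests over the trailing `window` seconds by weighting the
    previous window's count by the fraction of it still 'visible'."""
    prev_weight = (window - elapsed_in_window) / window
    return prev_count * prev_weight + curr_count

# 10 seconds into the current window, with 5 requests last minute and
# 2 so far this minute, the weighted count is 5 * (50/60) + 2.
estimate = sliding_window_count(prev_count=5, curr_count=2, elapsed_in_window=10)
print(round(estimate, 2))  # 6.17
```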
Token bucket. Tokens are added at a fixed rate. Each request consumes
a token. When the bucket is empty, requests are rejected. Allows controlled
bursts up to the bucket’s capacity. Kong’s rate-limiting-advanced plugin
(Enterprise) supports this.
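A minimal token bucket sketch shows the controlled-burst behavior (again illustrative, not the plugin's code):

```python
class TokenBucket:
    """Token bucket: steady refill rate with bounded bursts."""
    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=1.0, capacity=3)
# A burst of 3 is allowed immediately; the 4th request must wait for refill.
burst = [bucket.allow(0.0) for _ in range(4)]
print(burst)  # [True, True, True, False]
```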
Leaky bucket. Requests enter a queue. The queue drains at a fixed rate. If the queue is full, new requests are rejected. Produces the smoothest output rate but adds latency.
For this demo, the sliding window algorithm with policy: local is the
default. It is sufficient for a single Kong pod.
Local vs Cluster Rate Limiting
The policy: local setting stores counters in the Kong pod’s memory. If
you run three Kong replicas, each tracks its own counters. A client could
send 5 requests to each replica, totaling 15, without being rate limited.
For production with multiple Kong replicas, use policy: redis or
policy: cluster. Redis-backed rate limiting stores counters in Redis,
so all replicas share the same counts. This requires deploying Redis
alongside Kong.
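As a sketch, a Redis-backed variant of the demo's plugin might look like the following. The Redis hostname is a placeholder for a Redis Service you would deploy yourself, and the flat `redis_host`/`redis_port` fields follow the classic plugin schema; newer Kong releases nest these under a `redis` object, so check your version's plugin reference:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit-redis
  namespace: api-gateway-demo
config:
  minute: 5
  policy: redis
  redis_host: redis.api-gateway-demo.svc.cluster.local  # placeholder Service
  redis_port: 6379
  fault_tolerant: true
plugin: rate-limiting
```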
Authentication
API Key Authentication
The demo uses Kong’s key-auth plugin:
```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: api-key-auth
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
plugin: key-auth
```

This plugin rejects any request that does not include a valid `apikey` header. Kong validates the key against KongConsumer resources:
```yaml
apiVersion: configuration.konghq.com/v1
kind: KongConsumer
metadata:
  name: demo-user
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
username: demo-user
credentials:
  - demo-user-key
---
apiVersion: v1
kind: Secret
metadata:
  name: demo-user-key
  namespace: api-gateway-demo
  labels:
    konghq.com/credential: key-auth
type: Opaque
stringData:
  key: my-secret-api-key
```

The KongConsumer references a Secret. The Secret contains the API key. Kong
reads both and maintains an in-memory mapping. When a request arrives with
apikey: my-secret-api-key, Kong looks up the matching consumer and allows
the request.
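Conceptually, the lookup is a key-to-consumer map. A hypothetical Python sketch of the decision (Kong's real implementation lives in Lua inside the gateway):

```python
# In-memory mapping Kong builds from KongConsumer resources and their Secrets.
consumers_by_key = {
    "my-secret-api-key": "demo-user",  # from the demo-user-key Secret
}

def authenticate(headers: dict):
    """Return (status, consumer) the way key-auth would decide."""
    key = headers.get("apikey")
    if key is None:
        return 401, None   # no credential supplied
    consumer = consumers_by_key.get(key)
    if consumer is None:
        return 401, None   # unknown key
    return 200, consumer   # request proceeds, attributed to this consumer

print(authenticate({"apikey": "my-secret-api-key"}))  # (200, 'demo-user')
print(authenticate({}))                               # (401, None)
```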
Other Authentication Methods
Kong supports several authentication plugins beyond API keys.
JWT (JSON Web Tokens). The client sends a signed JWT in the Authorization header. Kong validates the signature without calling an external service. Good for stateless authentication. Requires key distribution.
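The "validates the signature without calling an external service" step can be shown with a self-contained HS256 sketch using only the standard library; this illustrates the mechanism, not Kong's jwt plugin code, and the secret and claims are made up:

```python
import base64, hashlib, hmac, json

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def b64url_encode(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()

def verify_hs256(token: str, secret: bytes):
    """Validate an HS256 JWT locally: no call to an identity provider."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None  # signature mismatch: reject with 401
    return json.loads(b64url_decode(payload_b64))

# Build a token the way an issuer would, then verify it.
secret = b"shared-secret"
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "demo-user"}).encode())
sig = b64url_encode(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
token = f"{header}.{payload}.{sig}"
print(verify_hs256(token, secret))  # {'sub': 'demo-user'}
```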
OAuth2. Kong acts as an OAuth2 provider or validates tokens against an external identity provider. More complex but supports scopes, refresh tokens, and third-party integrations.
mTLS (Mutual TLS). Both client and server present certificates. Kong validates the client certificate against a trusted CA. Strong authentication for service-to-service communication. Operationally complex because of certificate lifecycle management.
Basic Auth. Username and password in the Authorization header. Simple but insecure without TLS. Generally only used for internal services.
For most API use cases, JWT or API keys are the pragmatic choices. JWT is better for user-facing APIs where tokens carry identity claims. API keys are better for machine-to-machine communication where simplicity matters more than token expiration.
Load Balancing Strategies
When a backend service has multiple replicas, Kong distributes requests across them. Kong supports several load balancing algorithms.
Round-robin sends requests to each backend in sequence. Pod A, then pod B, then pod C, then back to pod A. Simple and fair when backends have similar capacity.
Least-connections sends each request to the backend with the fewest active connections. Better than round-robin when request processing times vary widely. A slow request on pod A does not cause pod B to sit idle.
Consistent hashing routes requests based on a hash of some value (client IP, header, cookie). The same client always reaches the same backend. Useful when backends have local caches and cache hit rates matter.
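The sticky-routing property can be sketched with a tiny generic hash ring; this is an illustration of the technique, not Kong's balancer implementation:

```python
import hashlib
from bisect import bisect

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Tiny consistent-hash ring: each backend gets several virtual points,
    so adding or removing a backend only remaps nearby keys."""
    def __init__(self, backends, vnodes=100):
        self.ring = sorted((h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def pick(self, client_key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash.
        idx = bisect(self.keys, h(client_key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["echo-1", "echo-2"])
backend = ring.pick("10.0.0.7")
# md5 is deterministic, so repeated lookups for the same client agree.
print(backend == ring.pick("10.0.0.7"))  # True
```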
In this demo, Kubernetes Service load balancing (iptables rules managed by kube-proxy) handles distribution before Kong even sees the traffic. Kong’s load balancing becomes relevant when using Kong Upstreams instead of Kubernetes Services, or when Kong runs outside the cluster.
Request Transformation
Kong can modify requests before forwarding them. The request-transformer
plugin adds, removes, or renames headers, query parameters, and body
fields. Common uses include:
- Adding an internal header that backends expect (like `X-Request-ID`)
- Removing sensitive headers before they reach backends
- Rewriting query parameters for API versioning
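As an illustration, a request-transformer configuration along these lines could add and strip headers; the header names and values here are hypothetical:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: transform-request
  namespace: api-gateway-demo
config:
  add:
    headers:
      - "X-Request-ID:set-by-kong"   # hypothetical header value
  remove:
    headers:
      - "X-Internal-Debug"           # hypothetical sensitive header
plugin: request-transformer
```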
Response Caching
The proxy-cache plugin stores responses and serves them for subsequent
identical requests. This reduces backend load for read-heavy APIs. Cache
invalidation is time-based: you set a TTL, and Kong serves cached responses
until the TTL expires.
Caching at the gateway layer is effective for responses that do not change frequently and do not vary per user. Public product catalogs, static configuration, and reference data are good candidates. User-specific data is not.
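A sketch of what a proxy-cache configuration for such read-heavy routes might look like; the TTL, methods, and content types are illustrative, and exact field availability varies by Kong version:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: cache-responses
  namespace: api-gateway-demo
config:
  strategy: memory
  cache_ttl: 300                     # seconds before a cached response expires
  request_method: ["GET", "HEAD"]    # only cache safe, idempotent methods
  response_code: [200]
  content_type: ["application/json"]
plugin: proxy-cache
```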
Circuit Breaking
Kong supports passive health checking. If a backend returns consecutive errors, Kong temporarily removes it from the load balancing pool. This prevents sending traffic to a failing backend, which is the essence of circuit breaking.
Configure it through Kong Upstream resources. Set thresholds for how many failures trigger removal and how long the backend stays removed before Kong retries it.
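As a sketch, a KongIngress with passive health checks might look like the following. The thresholds are illustrative, and newer versions of the Kong Ingress Controller replace KongIngress with KongUpstreamPolicy, so check your version's CRD reference:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: echo-1-upstream
  namespace: api-gateway-demo
upstream:
  healthchecks:
    passive:
      unhealthy:
        http_failures: 5              # failures before ejection from the pool
        http_statuses: [500, 502, 503]
      healthy:
        successes: 3                  # passing responses needed to rejoin
```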
KongPlugin and KongIngress CRDs
Kong extends Kubernetes with custom resource definitions (CRDs).
KongPlugin defines a plugin configuration. The plugin applies when
referenced by an annotation on an Ingress, Service, or KongConsumer.
This demo uses two KongPlugins: rate-limit and api-key-auth. Both
are attached to the echo-routes Ingress via annotations.
KongConsumer represents an API consumer (a user or application). It references Secrets that contain credentials (API keys, JWT secrets, basic auth passwords).
KongIngress provides fine-grained control over routing behavior that standard Ingress resources cannot express: custom timeouts, retry counts, header-based routing, and upstream protocol configuration.
These CRDs are installed by the Kong Helm chart. If you try to apply KongPlugin manifests before installing Kong, kubectl will reject them because the CRD does not exist yet. Always install Kong first.
Kong vs NGINX Ingress Controller
The default Minikube Ingress addon uses the NGINX Ingress Controller. Kong also uses NGINX under the hood. So why choose Kong?
NGINX Ingress Controller is a solid reverse proxy with annotation-based configuration. It handles routing, TLS termination, and basic rate limiting through nginx configuration snippets. It is free and widely deployed.
Kong adds a plugin ecosystem on top of NGINX. Plugins for authentication, rate limiting, request transformation, and observability are declarative Kubernetes resources. You do not write nginx config snippets. You create KongPlugin objects.
The trade-off: Kong has a larger footprint and more complexity. If you only need routing and TLS termination, NGINX Ingress Controller is simpler. If you need rate limiting, authentication, and request transformation, Kong provides them as first-class features.
Kong vs Envoy / Istio Gateway
Envoy is a proxy designed for service mesh architectures. Istio uses Envoy as its data plane. The Istio Gateway API replaces Ingress for traffic management.
Envoy provides more advanced load balancing (zone-aware, weighted), automatic retries with circuit breaking, and deep observability (per-route metrics, distributed tracing). It is more complex to configure than Kong but more powerful for service-to-service traffic.
Istio Gateway uses the Kubernetes Gateway API, which is the successor to the Ingress API. It separates infrastructure configuration (Gateway) from routing configuration (HTTPRoute). This is a cleaner model than annotations on Ingress resources.
When to choose Kong: You need a standalone API gateway with a plugin ecosystem. Your use case is north-south traffic (clients to services).
When to choose Envoy/Istio: You need a full service mesh with mTLS between services, traffic splitting for canary deployments, and per-service observability. Your use case includes east-west traffic (service to service).
Many production systems use both. Kong handles north-south traffic at the edge. Istio handles east-west traffic inside the cluster. They are complementary, not competitive.
Plugin Execution Order
When multiple plugins are attached to an Ingress, they execute in a defined order. Kong processes plugins in phases: certificate, rewrite, access, response, and log.
In this demo, the rate-limit plugin runs in the access phase. The
api-key-auth plugin also runs in the access phase. Within the same phase,
authentication plugins execute before traffic control plugins. So Kong
checks the API key first, then checks the rate limit.
This ordering matters. If rate limiting ran first, an unauthenticated client could consume rate limit tokens, preventing legitimate users from accessing the service.
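The same-phase ordering can be pictured as a priority sort; the numeric priorities below are illustrative, not Kong's actual values:

```python
# Within a phase, Kong runs plugins in descending priority order; auth
# plugins carry higher priority than traffic-control plugins, so key-auth
# runs before rate-limiting.
plugins = [
    {"name": "rate-limiting", "priority": 900},  # illustrative number
    {"name": "key-auth", "priority": 1200},      # illustrative number
]
order = [p["name"] for p in sorted(plugins, key=lambda p: -p["priority"])]
print(order)  # ['key-auth', 'rate-limiting']
```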
Production Considerations
This demo is simplified for learning. A production Kong deployment needs more.
TLS termination. Kong should terminate TLS at the gateway. Clients connect over HTTPS. Kong forwards to backends over HTTP (or mTLS for sensitive traffic).
High availability. Run multiple Kong replicas. Use Redis-backed rate limiting so counters are shared. Put a load balancer in front of Kong pods.
Secrets management. API keys in Kubernetes Secrets (as in this demo) are base64-encoded, not encrypted. In production, use an external secrets manager (Vault, AWS Secrets Manager) and inject secrets through the External Secrets Operator or OpenShift Secrets.
Monitoring. Enable the Prometheus plugin to export request metrics. Set up dashboards for request rate, error rate, and latency per route. Alert on 5xx spikes and rate limit exhaustion.
Admin API security. Kong’s admin API can modify configuration. In Kubernetes-native mode, the Ingress Controller handles configuration, so the admin API is less critical. But if exposed, it must be protected.
Certificate rotation. If using mTLS, automate certificate rotation with cert-manager. Expired certificates cause outages.
How the Pieces Fit Together
The request flow in this demo:
1. Client sends `GET /echo-1` with an `apikey: my-secret-api-key` header.
2. Kong receives the request on port 80.
3. The `key-auth` plugin validates the API key against KongConsumer resources. If invalid, Kong returns 401.
4. The `rate-limiting` plugin checks the client's request count. If over the limit, Kong returns 429 with a `Retry-After` header.
5. Kong matches the path `/echo-1` to the Ingress rule.
6. Kong strips the `/echo-1` prefix (because of `strip-path: "true"`).
7. Kong forwards the request to the `echo-1` Service on port 80.
8. The echo-1 pod returns JSON.
9. Kong forwards the response to the client with rate limit headers added.
Every step is handled by Kong. The echo-1 service knows nothing about authentication or rate limiting. It just returns JSON.
Where to Go Next
- Microservices Platform covers the service decomposition patterns that Kong sits in front of.
- Event-Driven Kafka explores asynchronous messaging, a communication pattern that does not flow through the API gateway.