API Gateway (Kong): Deep Dive

This demo places Kong between clients and backend services. Clients send requests to Kong. Kong applies rate limiting, authenticates requests, and forwards them to the correct backend. The backends never see raw client traffic.

This document explains why API gateways exist, how Kong implements the pattern on Kubernetes, the algorithms behind rate limiting, authentication methods, and how Kong compares to alternatives like NGINX Ingress Controller and Envoy-based gateways.

Why Not Direct Client-to-Service Communication

In the previous demo, the Ingress routes traffic directly to backend services. That works for simple setups. But as the number of services grows, problems appear.

Without a gateway, clients need to know about every service. If you have an orders service, a users service, and a products service, the client needs three different URLs. Add a service, and every client must update.

Cross-cutting concerns get duplicated. Rate limiting, authentication, logging, CORS handling: each service implements them independently. Because the implementations differ, they drift: a bug in one service’s auth layer may be absent from another’s, so nothing can be audited or fixed in one place.

Protocol translation becomes every service’s problem. If a backend speaks gRPC but the client speaks REST, something needs to translate. Without a gateway, that translation logic lives in each service.

An API gateway solves all three problems. It is a single entry point that handles routing, security, and protocol concerns. Backends focus on business logic.

Kong is a gateway built on top of NGINX and OpenResty (LuaJIT). It has two logical planes.

The data plane handles request traffic. It receives client requests, applies plugins (rate limiting, auth, transforms), and forwards requests to upstream services. In this demo, the data plane is the Kong pod that listens on port 80.

The control plane manages configuration. In Kubernetes-native mode, the Kong Ingress Controller reads Kubernetes resources (Ingress, Service, KongPlugin, KongConsumer) and configures the data plane automatically. There is no separate admin database. Kubernetes is the source of truth.

This demo installs Kong via Helm:

Terminal window
helm install kong kong/ingress -n api-gateway-demo \
  --set gateway.proxy.type=NodePort \
  --set gateway.resources.requests.memory=256Mi \
  --set gateway.resources.requests.cpu=100m \
  --set gateway.resources.limits.memory=512Mi \
  --set gateway.resources.limits.cpu=500m

The gateway.proxy.type=NodePort setting exposes Kong’s proxy port on each node. In production on OpenShift, you would use LoadBalancer or a Route object instead.

Kong uses standard Kubernetes Ingress resources for routing. The Ingress in this demo defines two paths:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-routes
  namespace: api-gateway-demo
  annotations:
    konghq.com/strip-path: "true"
spec:
  ingressClassName: kong
  rules:
    - http:
        paths:
          - path: /echo-1
            pathType: Prefix
            backend:
              service:
                name: echo-1
                port:
                  number: 80
          - path: /echo-2
            pathType: Prefix
            backend:
              service:
                name: echo-2
                port:
                  number: 80

Two details matter here.

First, ingressClassName: kong tells Kubernetes that Kong should handle this Ingress, not the default NGINX Ingress Controller.

Second, konghq.com/strip-path: "true" removes the matched path prefix before forwarding. A request to /echo-1/health arrives at the echo-1 service as /health. Without strip-path, the backend would receive /echo-1/health and likely return a 404.
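The strip-path rewrite can be sketched as follows. This is a simplified model of the behavior, not Kong’s actual implementation:

```python
def strip_path(request_path: str, matched_prefix: str) -> str:
    """Remove the matched route prefix before forwarding upstream."""
    if request_path.startswith(matched_prefix):
        stripped = request_path[len(matched_prefix):]
        # The upstream must still receive an absolute path.
        return stripped if stripped.startswith("/") else "/" + stripped
    return request_path

# A request to /echo-1/health reaches the echo-1 backend as /health.
print(strip_path("/echo-1/health", "/echo-1"))
```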

Both echo services are nginx pods that return JSON identifying themselves:

apiVersion: v1
kind: ConfigMap
metadata:
  name: echo-1-config
  namespace: api-gateway-demo
data:
  default.conf: |
    server {
      listen 80;
      location / {
        default_type application/json;
        return 200 '{"service":"echo-1","message":"Hello from Echo Service 1","timestamp":"$time_iso8601"}';
      }
      location /health {
        default_type application/json;
        return 200 '{"status":"healthy"}';
      }
    }

Echo-2 is identical except it returns "service":"echo-2". These simple backends let you verify that Kong routes traffic correctly. When you hit /echo-1, you should see echo-1’s response. When you hit /echo-2, you should see echo-2’s response.

Without rate limiting, a single client can overwhelm a backend service. Whether it is a bug, a bot, or a denial-of-service attack, unbounded request rates cause cascading failures. Rate limiting protects backends by rejecting excess traffic at the gateway before it reaches the service.

Kong implements rate limiting through a KongPlugin custom resource:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
config:
  minute: 5
  policy: local
  fault_tolerant: true
  hide_client_headers: false
plugin: rate-limiting

The plugin is attached to the Ingress via an annotation:

Terminal window
kubectl annotate ingress echo-routes -n api-gateway-demo \
  konghq.com/plugins=rate-limit --overwrite

Now every request through the Ingress passes through the rate limiting plugin. The 6th request within a minute gets a 429 response.

  • minute: 5 allows 5 requests per minute per client.
  • policy: local tracks counters in the Kong pod’s local memory.
  • fault_tolerant: true keeps serving requests if the rate limiter itself fails (for example, if using a Redis-backed policy and Redis goes down).
  • hide_client_headers: false includes rate limit headers in responses, such as X-RateLimit-Limit-Minute and X-RateLimit-Remaining-Minute, plus Retry-After on 429 responses.

Different algorithms produce different behaviors. Kong supports several.

Fixed window. Divide time into fixed intervals (one-minute windows). Count requests per window. Reset at the boundary. Simple but allows bursts. A client can send 5 requests at 10:00:59 and 5 more at 10:01:00, effectively getting 10 requests in 2 seconds.
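The boundary burst is easy to see in a minimal fixed-window limiter. This is an illustrative sketch, not Kong’s implementation:

```python
class FixedWindowLimiter:
    """Fixed one-minute windows: the counter resets at each minute boundary."""

    def __init__(self, limit: int):
        self.limit = limit
        self.window = None  # minute index of the current window
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // 60)
        if window != self.window:  # crossed a boundary: start fresh
            self.window, self.count = window, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=5)
# 5 requests at 10:00:59 are allowed, the 6th is rejected...
burst1 = [limiter.allow(59.0) for _ in range(6)]
# ...but one second later a fresh window permits 5 more: 10 requests in 2 seconds.
burst2 = [limiter.allow(60.0) for _ in range(5)]
print(burst1, burst2)
```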

Sliding window. Weight the current window’s count against the previous window’s count based on elapsed time. Smoother than fixed window. Kong’s Enterprise rate-limiting-advanced plugin offers a sliding-window mode.
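The weighting idea can be written as a one-line estimate. A sketch of the formula, with illustrative names:

```python
def sliding_window_count(prev_count: float, curr_count: float,
                         elapsed_in_window: float, window: float = 60.0) -> float:
    """Estimate the request rate by weighting the previous window's count
    by how much of it still overlaps the sliding interval."""
    overlap = 1.0 - (elapsed_in_window / window)
    return prev_count * overlap + curr_count

# 30s into the current window, half of the previous window still counts:
# 4 previous requests * 0.5 + 3 current requests = 5.0
print(sliding_window_count(4, 3, 30.0))
```

Because the previous window’s contribution decays continuously, a burst at a window boundary no longer doubles the effective allowance.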

Token bucket. Tokens are added at a fixed rate. Each request consumes a token. When the bucket is empty, requests are rejected. Allows controlled bursts up to the bucket’s capacity.
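A token bucket in a few lines. Again an illustrative sketch, not any gateway’s actual code:

```python
class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`;
    each request consumes one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full: bursts are allowed immediately
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=3)
# A burst of 3 drains the bucket; the 4th request is rejected.
burst = [bucket.allow(0.0) for _ in range(4)]
# One second later a token has refilled, so one more request passes.
later = bucket.allow(1.0)
print(burst, later)
```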

Leaky bucket. Requests enter a queue. The queue drains at a fixed rate. If the queue is full, new requests are rejected. Produces the smoothest output rate but adds latency.

For this demo, the open-source rate-limiting plugin with policy: local is used. Its fixed-window counters in local memory are sufficient for a single Kong pod.

The policy: local setting stores counters in the Kong pod’s memory. If you run three Kong replicas, each tracks its own counters. A client could send 5 requests to each replica, totaling 15, without being rate limited.

For production with multiple Kong replicas, use policy: redis or policy: cluster. Redis-backed rate limiting stores counters in Redis, so all replicas share the same counts. This requires deploying Redis alongside Kong.

The demo uses Kong’s key-auth plugin:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: api-key-auth
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
plugin: key-auth

This plugin rejects any request that does not include a valid apikey header. Kong validates the key against KongConsumer resources:

apiVersion: configuration.konghq.com/v1
kind: KongConsumer
metadata:
  name: demo-user
  namespace: api-gateway-demo
  annotations:
    kubernetes.io/ingress.class: kong
username: demo-user
credentials:
  - demo-user-key
---
apiVersion: v1
kind: Secret
metadata:
  name: demo-user-key
  namespace: api-gateway-demo
  labels:
    konghq.com/credential: key-auth
type: Opaque
stringData:
  key: my-secret-api-key

The KongConsumer references a Secret. The Secret contains the API key. Kong reads both and maintains an in-memory mapping. When a request arrives with apikey: my-secret-api-key, Kong looks up the matching consumer and allows the request.

Kong supports several authentication plugins beyond API keys.

JWT (JSON Web Tokens). The client sends a signed JWT in the Authorization header. Kong validates the signature without calling an external service. Good for stateless authentication. Requires key distribution.

OAuth2. Kong acts as an OAuth2 provider or validates tokens against an external identity provider. More complex but supports scopes, refresh tokens, and third-party integrations.

mTLS (Mutual TLS). Both client and server present certificates. Kong validates the client certificate against a trusted CA. Strong authentication for service-to-service communication. Operationally complex because of certificate lifecycle management.

Basic Auth. Username and password in the Authorization header. Simple but insecure without TLS. Generally only used for internal services.

For most API use cases, JWT or API keys are the pragmatic choices. JWT is better for user-facing APIs where tokens carry identity claims. API keys are better for machine-to-machine communication where simplicity matters more than token expiration.

When a backend service has multiple replicas, Kong distributes requests across them. Kong supports several load balancing algorithms.

Round-robin sends requests to each backend in sequence. Pod A, then pod B, then pod C, then back to pod A. Simple and fair when backends have similar capacity.

Least-connections sends each request to the backend with the fewest active connections. Better than round-robin when request processing times vary widely. A slow request on pod A does not cause pod B to sit idle.

Consistent hashing routes requests based on a hash of some value (client IP, header, cookie). The same client always reaches the same backend. Useful when backends have local caches and cache hit rates matter.
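The “same client, same backend” property of consistent hashing can be sketched with a bare hash ring. This simplified model omits virtual nodes, which real implementations use to even out the distribution:

```python
import hashlib

def ring_position(value: str) -> int:
    """Map a string to a deterministic position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Hash the client IP onto the ring and take the first backend clockwise."""
    ring = sorted((ring_position(b), b) for b in backends)
    point = ring_position(client_ip)
    for pos, backend in ring:
        if point <= pos:
            return backend
    return ring[0][1]  # wrap around past the last position

backends = ["pod-a", "pod-b", "pod-c"]
# The same client IP always maps to the same backend pod.
first = pick_backend("203.0.113.7", backends)
assert all(pick_backend("203.0.113.7", backends) == first for _ in range(10))
```

Removing one backend only remaps the clients whose hash landed on it; the rest keep their assignment, which is what preserves cache hit rates.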

In this demo, Kubernetes Service load balancing (iptables rules managed by kube-proxy) handles distribution before Kong even sees the traffic. Kong’s load balancing becomes relevant when using Kong Upstreams instead of Kubernetes Services, or when Kong runs outside the cluster.

Kong can modify requests before forwarding them. The request-transformer plugin adds, removes, or renames headers, query parameters, and body fields. Common uses include:

  • Adding an internal header that backends expect (like X-Request-ID)
  • Removing sensitive headers before they reach backends
  • Rewriting query parameters for API versioning

The proxy-cache plugin stores responses and serves them for subsequent identical requests. This reduces backend load for read-heavy APIs. Cache invalidation is time-based: you set a TTL, and Kong serves cached responses until the TTL expires.

Caching at the gateway layer is effective for responses that do not change frequently and do not vary per user. Public product catalogs, static configuration, and reference data are good candidates. User-specific data is not.
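Time-based invalidation reduces to storing an expiry alongside each response. A minimal sketch of the idea, not the proxy-cache plugin itself:

```python
class TTLCache:
    """Serve a cached response until its TTL expires (time-based invalidation)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}  # key -> (expires_at, response)

    def get(self, key: str, now: float):
        entry = self.store.get(key)
        if entry and now < entry[0]:
            return entry[1]  # cache hit: backend is never contacted
        return None          # miss or expired: forward to the backend

    def put(self, key: str, response: str, now: float):
        self.store[key] = (now + self.ttl, response)

cache = TTLCache(ttl=30.0)
cache.put("/products", '{"items": []}', now=0.0)
print(cache.get("/products", now=10.0))  # within TTL: served from cache
print(cache.get("/products", now=31.0))  # past TTL: backend is hit again
```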

Kong supports passive health checking. If a backend returns consecutive errors, Kong temporarily removes it from the load balancing pool. This prevents sending traffic to a failing backend, which is the essence of circuit breaking.

Configure it through Kong Upstream resources. Set thresholds for how many failures trigger removal and how long the backend stays removed before Kong retries it.

Kong extends Kubernetes with custom resource definitions (CRDs).

KongPlugin defines a plugin configuration. The plugin applies when referenced by an annotation on an Ingress, Service, or KongConsumer. This demo uses two KongPlugins: rate-limit and api-key-auth. Both are attached to the echo-routes Ingress via annotations.

KongConsumer represents an API consumer (a user or application). It references Secrets that contain credentials (API keys, JWT secrets, basic auth passwords).

KongIngress provides fine-grained control over routing behavior that standard Ingress resources cannot express: custom timeouts, retry counts, header-based routing, and upstream protocol configuration.

These CRDs are installed by the Kong Helm chart. If you try to apply KongPlugin manifests before installing Kong, kubectl will reject them because the CRD does not exist yet. Always install Kong first.

The default Minikube Ingress addon uses the NGINX Ingress Controller. Kong also uses NGINX under the hood. So why choose Kong?

NGINX Ingress Controller is a solid reverse proxy with annotation-based configuration. It handles routing, TLS termination, and basic rate limiting through nginx configuration snippets. It is free and widely deployed.

Kong adds a plugin ecosystem on top of NGINX. Plugins for authentication, rate limiting, request transformation, and observability are declarative Kubernetes resources. You do not write nginx config snippets. You create KongPlugin objects.

The trade-off: Kong has a larger footprint and more complexity. If you only need routing and TLS termination, NGINX Ingress Controller is simpler. If you need rate limiting, authentication, and request transformation, Kong provides them as first-class features.

Envoy is a proxy designed for service mesh architectures. Istio uses Envoy as its data plane. The Istio Gateway API replaces Ingress for traffic management.

Envoy provides more advanced load balancing (zone-aware, weighted), automatic retries with circuit breaking, and deep observability (per-route metrics, distributed tracing). It is more complex to configure than Kong but more powerful for service-to-service traffic.

Istio Gateway uses the Kubernetes Gateway API, which is the successor to the Ingress API. It separates infrastructure configuration (Gateway) from routing configuration (HTTPRoute). This is a cleaner model than annotations on Ingress resources.

When to choose Kong: You need a standalone API gateway with a plugin ecosystem. Your use case is north-south traffic (clients to services).

When to choose Envoy/Istio: You need a full service mesh with mTLS between services, traffic splitting for canary deployments, and per-service observability. Your use case includes east-west traffic (service to service).

Many production systems use both. Kong handles north-south traffic at the edge. Istio handles east-west traffic inside the cluster. They are complementary, not competitive.

When multiple plugins are attached to an Ingress, they execute in a defined order. Kong processes plugins in phases: certificate, rewrite, access, response, and log.

In this demo, the rate-limit plugin runs in the access phase. The api-key-auth plugin also runs in the access phase. Within the same phase, authentication plugins execute before traffic control plugins. So Kong checks the API key first, then checks the rate limit.

This ordering matters. If rate limiting ran first, an unauthenticated client could consume rate limit tokens, preventing legitimate users from accessing the service.
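The consequence of the ordering is easy to demonstrate. A toy access-phase pipeline, not Kong’s plugin runner:

```python
def handle(request: dict, authenticate, rate_limit) -> int:
    """Access-phase ordering: authentication runs before traffic control,
    so an unauthenticated request never consumes rate-limit quota."""
    if not authenticate(request):
        return 401
    if not rate_limit(request):
        return 429
    return 200

counter = {"used": 0}

def rate_limit(req: dict) -> bool:
    counter["used"] += 1
    return counter["used"] <= 5

# A flood of unauthenticated requests is rejected with 401s
# and leaves the rate-limit counter untouched.
for _ in range(100):
    handle({"apikey": None}, authenticate=lambda r: r.get("apikey"), rate_limit=rate_limit)
print(counter["used"])  # quota still intact for legitimate clients
```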

This demo is simplified for learning. A production Kong deployment needs more.

TLS termination. Kong should terminate TLS at the gateway. Clients connect over HTTPS. Kong forwards to backends over HTTP (or mTLS for sensitive traffic).

High availability. Run multiple Kong replicas. Use Redis-backed rate limiting so counters are shared. Put a load balancer in front of Kong pods.

Secrets management. API keys in Kubernetes Secrets (as in this demo) are base64-encoded, not encrypted. In production, use an external secrets manager (Vault, AWS Secrets Manager) and inject secrets through the External Secrets Operator or OpenShift Secrets.

Monitoring. Enable the Prometheus plugin to export request metrics. Set up dashboards for request rate, error rate, and latency per route. Alert on 5xx spikes and rate limit exhaustion.

Admin API security. Kong’s admin API can modify configuration. In Kubernetes-native mode, the Ingress Controller handles configuration, so the admin API is less critical. But if exposed, it must be protected.

Certificate rotation. If using mTLS, automate certificate rotation with cert-manager. Expired certificates cause outages.

The request flow in this demo:

  1. Client sends GET /echo-1 with apikey: my-secret-api-key header.
  2. Kong receives the request on port 80.
  3. The key-auth plugin validates the API key against KongConsumer resources. If invalid, Kong returns 401.
  4. The rate-limiting plugin checks the client’s request count. If over the limit, Kong returns 429 with Retry-After header.
  5. Kong matches the path /echo-1 to the Ingress rule.
  6. Kong strips the /echo-1 prefix (because of strip-path: "true").
  7. Kong forwards the request to the echo-1 Service on port 80.
  8. The echo-1 pod returns JSON.
  9. Kong forwards the response to the client with rate limit headers added.

Every step is handled by Kong. The echo-1 service knows nothing about authentication or rate limiting. It just returns JSON.

  • Microservices Platform covers the service decomposition patterns that Kong sits in front of.
  • Event-Driven Kafka explores asynchronous messaging, a communication pattern that does not flow through the API gateway.