# Knative Serving
Run serverless workloads that scale to zero and back, with revision-based traffic splitting.
Time: ~20 minutes · Difficulty: Intermediate
Resources: Knative Serving installs a control plane (~500MB RAM). Clean up other demos first:

```sh
task clean:all
```
## What You Will Learn

- How Knative Serving provides serverless capabilities on Kubernetes
- Scale-to-zero and automatic scale-up based on traffic
- Revision-based deployments (immutable snapshots of your service)
- Traffic splitting for canary and blue-green deployments
- Concurrency-based autoscaling with target metrics
## Prerequisites

Install Knative Serving with Kourier networking:
```sh
# Install Knative Serving CRDs and core
kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-core.yaml
```
```sh
# Install Kourier networking layer
kubectl apply -f https://github.com/knative/net-kourier/releases/latest/download/kourier.yaml
```
```sh
# Configure Knative to use Kourier
kubectl patch configmap/config-network -n knative-serving --type merge -p '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'
```
```sh
# Wait for components
kubectl wait --for=condition=ready pod --all -n knative-serving --timeout=120s
kubectl wait --for=condition=ready pod --all -n kourier-system --timeout=120s
```

Configure DNS (use sslip.io for minikube):
```sh
kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-default-domain.yaml
```

## Deploy
Navigate to the demo directory:
```sh
cd demos/knative-serving
```

Apply the namespace:
```sh
kubectl apply -f manifests/namespace.yaml
```

Deploy the initial hello service:
```sh
kubectl apply -f manifests/service-hello.yaml
```

Wait for the Knative Service to become ready:
```sh
kubectl wait --for=condition=ready ksvc hello -n knative-demo --timeout=120s
```

Get the service URL:
```sh
kubectl get ksvc hello -n knative-demo -o jsonpath='{.status.url}'
```

## Verify
Check that the service is ready:
```sh
kubectl get ksvc hello -n knative-demo
```

You should see output like:
```
NAME    URL                                           LATESTCREATED   LATESTREADY   READY   REASON
hello   http://hello.knative-demo.10.0.0.1.sslip.io   hello-00001     hello-00001   True
```

Curl the service URL:
```sh
SERVICE_URL=$(kubectl get ksvc hello -n knative-demo -o jsonpath='{.status.url}')
curl $SERVICE_URL
```

You should see:
```
Hello World v1!
```

## Test scale-to-zero
Wait 60-90 seconds without sending any requests. Then check pods:
```sh
kubectl get pods -n knative-demo
```

You should see no pods running (or pods terminating). The service scaled to zero because there was no traffic.
Now curl the service again:
```sh
curl $SERVICE_URL
```

Check pods immediately:
```sh
kubectl get pods -n knative-demo
```

You will see a pod spinning up (cold start). The Knative Activator queued your request while the pod started.
## Test traffic splitting

Apply the v2 service configuration (80% latest, 20% v1):
```sh
kubectl apply -f manifests/service-hello-v2.yaml
```

Wait for the new revision to be ready:
```sh
kubectl wait --for=condition=ready ksvc hello -n knative-demo --timeout=120s
```

Check revisions:
```sh
kubectl get revisions -n knative-demo
```

You should see two revisions:
```
NAME          CONFIG NAME   K8S SERVICE NAME   GENERATION   READY   REASON
hello-00001   hello                            1            True
hello-00002   hello                            2            True
```

Curl the service multiple times to see traffic splitting:
```sh
for i in {1..10}; do curl $SERVICE_URL; done
```

You should see mostly "Hello World v2!" with occasional "Hello World v1!" responses (80/20 split).
## What is Happening

```
manifests/
  namespace.yaml          # knative-demo namespace
  service-hello.yaml      # Initial Knative Service (creates hello-00001 revision)
  service-hello-v2.yaml   # Updated service with traffic split (creates hello-00002, 80% v2, 20% v1)
  service-hello-v3.yaml   # Another update with 50/50 split between v2 and v3
  service-autoscale.yaml  # Demo service with concurrency-based autoscaling
```

Knative Serving brings serverless capabilities to Kubernetes. Unlike a standard Deployment, a Knative Service automatically creates a Configuration and a Route. Each time you update the service, Knative creates a new immutable Revision (a snapshot of the container image, environment variables, and resource limits).
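A Knative Service manifest is deliberately small; a sketch of what `service-hello.yaml` might look like (the image and env values here are illustrative, check the actual file under `manifests/`):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: knative-demo
spec:
  template:                  # every change to this template creates a new immutable Revision
    spec:
      containers:
        - image: ghcr.io/example/helloworld-go:latest  # illustrative image reference
          env:
            - name: TARGET
              value: "World v1"
```

From this single resource, Knative derives the Configuration, Route, and Revision objects for you.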
Key concepts:
**Scale-to-zero:** When there is no traffic, Knative terminates all pods after an idle period (roughly 60-90 seconds with default autoscaler settings). The Activator component sits in front of your service and queues incoming requests while pods spin up. This saves cluster resources for services that are idle most of the time.
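The idle period is controlled cluster-wide in the `config-autoscaler` ConfigMap rather than per service; a sketch of the two relevant keys (the values shown are, to the best of my knowledge, the upstream defaults):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  stable-window: "60s"               # traffic is averaged over this window before scaling decisions
  scale-to-zero-grace-period: "30s"  # how long the last pod lingers after the scale-to-zero decision
```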
**Revisions:** Every change to the service spec creates a new revision. Revisions are immutable and numbered sequentially (hello-00001, hello-00002). You can pin traffic to specific revisions for canary or blue-green deployments.
**Traffic splitting:** The traffic block defines how requests are distributed across revisions. You can route by percentage (80/20 canary) or by named tags. Traffic splitting happens at the Kourier ingress layer, before requests reach your pods.
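The 80/20 split is expressed directly in the Service spec; the traffic block in `service-hello-v2.yaml` presumably looks something like this (the tag name is illustrative):

```yaml
spec:
  traffic:
    - latestRevision: true     # 80% to the newest revision (hello-00002)
      percent: 80
    - revisionName: hello-00001
      percent: 20
      tag: v1                  # optional: also exposes this revision at its own tagged URL
```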
**Autoscaling:** Knative watches concurrency (concurrent requests per pod) and RPS (requests per second). When concurrency exceeds the target, Knative spins up more pods. The autoscaler supports multiple modes: concurrency-based (default), RPS-based, and custom metrics. You can set min-scale (minimum replicas) and max-scale (cap on replicas).
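Autoscaling is tuned per revision through annotations on the pod template; a sketch of what `service-autoscale.yaml` likely sets (the target and bounds here are illustrative, chosen to match the 0-to-5 scaling behavior described in the experiments):

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"  # scale on in-flight requests per pod (the default)
        autoscaling.knative.dev/target: "10"           # aim for ~10 concurrent requests per pod
        autoscaling.knative.dev/min-scale: "0"         # allow scale-to-zero
        autoscaling.knative.dev/max-scale: "5"         # never run more than 5 pods
```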
## Experiment

1. Deploy the autoscale demo and generate load to watch pods scale up:
   ```sh
   kubectl apply -f manifests/service-autoscale.yaml
   kubectl wait --for=condition=ready ksvc autoscale-demo -n knative-demo --timeout=120s

   # Get the URL
   AUTOSCALE_URL=$(kubectl get ksvc autoscale-demo -n knative-demo -o jsonpath='{.status.url}')

   # Generate load (requires hey: go install github.com/rakyll/hey@latest)
   hey -z 30s -c 50 $AUTOSCALE_URL

   # Watch pods scale up
   kubectl get pods -n knative-demo -w
   ```

   You should see pods scale from 0 to 5 as concurrency increases, then back to 0 after traffic stops.
2. Apply the v3 service configuration for a 50/50 blue-green split:

   ```sh
   kubectl apply -f manifests/service-hello-v3.yaml
   kubectl wait --for=condition=ready ksvc hello -n knative-demo --timeout=120s

   # Curl multiple times to see 50/50 split
   for i in {1..10}; do curl $SERVICE_URL; done
   ```
3. Pin traffic to a specific revision using tags:

   ```sh
   kubectl patch ksvc hello -n knative-demo --type merge -p '{"spec": {"traffic": [{"revisionName": "hello-00001", "percent": 100, "tag": "stable"}]}}'

   # Access the stable tag URL
   kubectl get ksvc hello -n knative-demo -o jsonpath='{.status.traffic[?(@.tag=="stable")].url}'
   ```
4. Set the scale-to-zero grace period to 30 seconds. This is a cluster-wide setting in the `config-autoscaler` ConfigMap (it is not available as a per-service annotation):

   ```sh
   kubectl patch configmap/config-autoscaler -n knative-serving --type merge -p '{"data":{"scale-to-zero-grace-period":"30s"}}'
   ```
5. Check Knative metrics and status:

   ```sh
   kubectl get ksvc,revision,route -n knative-demo
   kubectl describe ksvc hello -n knative-demo
   kubectl describe revision hello-00002 -n knative-demo
   ```
6. Check Activator logs to see request queuing during cold start:

   ```sh
   kubectl logs -f deployment/activator -n knative-serving
   ```
7. Try changing the autoscaling mode to RPS-based instead of concurrency:

   ```sh
   kubectl patch ksvc autoscale-demo -n knative-demo --type merge -p '{"spec": {"template": {"metadata": {"annotations": {"autoscaling.knative.dev/metric": "rps", "autoscaling.knative.dev/target": "50"}}}}}'
   ```
## Cleanup

Delete the demo namespace:
```sh
kubectl delete namespace knative-demo
```

Optionally, remove Knative Serving components:
```sh
kubectl delete -f https://github.com/knative/net-kourier/releases/latest/download/kourier.yaml
kubectl delete -f https://github.com/knative/serving/releases/latest/download/serving-core.yaml
kubectl delete -f https://github.com/knative/serving/releases/latest/download/serving-crds.yaml
kubectl delete -f https://github.com/knative/serving/releases/latest/download/serving-default-domain.yaml
```

## Further Reading
See docs/deep-dive.md for a detailed explanation of Knative architecture, the relationship between Configuration, Route, and Revision, autoscaling algorithms, cold start optimization, and a comparison with other serverless platforms.
## Next Step

Move on to Trivy Operator to scan your running containers for vulnerabilities and misconfigurations.