
Redis Caching: Deep Dive

This document explains the design decisions, implementation patterns, and infrastructure details behind the Redis caching demo. It covers why things work the way they do, what tradeoffs were made, and what you should understand before applying these patterns to your own applications.


The Problem

Every web application eventually hits a wall where the database becomes the bottleneck. The symptoms are predictable: response times creep up, database CPU spikes during peak traffic, and users start noticing.

The root cause is usually some combination of:

  • Expensive queries. JOINs across multiple tables, aggregations over large datasets, window functions, and full-text searches all consume significant CPU and I/O.
  • Repeated work. A product catalog page viewed by 10,000 users in a minute generates 10,000 identical database queries returning identical results.
  • Network overhead. In production, the database often lives on a separate server (or a managed service in another availability zone). Every query pays the cost of a network round-trip.
  • Connection contention. Database connection pools are finite. Under high concurrency, requests queue up waiting for an available connection.

Redis solves this by sitting between your application and the database as an in-memory key-value store. The first request runs the query and stores the result in Redis. Every subsequent request for the same data returns from memory, skipping the database entirely. The cached data expires after a configurable TTL, keeping results reasonably fresh.

The result: response times drop from hundreds of milliseconds to sub-millisecond. The database handles a fraction of the traffic. Connection pools stay healthy. This demo makes that difference visible and measurable.


Architecture

The entire demo runs inside a single Minikube cluster. Three pods, three services, one namespace.

Minikube Cluster (redis-demo namespace)

  Browser
     |
     |  NodePort :31xxx
     v
  +-------------------------+
  |       FastAPI App       |
  |     cache-demo-app      |
  |        Port 8000        |
  |                         |
  |  1. Check Redis         |
  |  2. If miss, query DB   |
  |  3. Store result in     |
  |     Redis with TTL      |
  +--------+---------+------+
           |         |
    DNS: "redis"  DNS: "postgres"
           |         |
      +----v----+  +-v-----------+
      |  Redis  |  | PostgreSQL  |
      | 7-alpine|  |  16-alpine  |
      | Port    |  |  Port 5432  |
      |  6379   |  |             |
      |         |  |10 categories|
      |  64MB   |  |500 products |
      |  LRU    |  |5,000 reviews|
      +---------+  +-------------+

The request flow:

  1. A browser request arrives at the Minikube node’s IP on a NodePort.
  2. Kubernetes routes it to the FastAPI app pod on port 8000.
  3. The app checks Redis for a cached response using a cache key derived from the endpoint and query parameters.
  4. Cache hit: Redis returns the stored JSON string. The app deserializes it and responds immediately. Total time: under 1ms.
  5. Cache miss: The app runs the SQL query against PostgreSQL, serializes the result as JSON, stores it in Redis with a TTL, and returns the data. Total time: 250-360ms.
  6. Subsequent identical requests hit the cache until the TTL expires, at which point the cycle repeats.

The three services communicate over Kubernetes internal DNS. The app refers to postgres and redis by their Service names. CoreDNS resolves these to ClusterIP addresses, which route to the backing pods.


The Cache-Aside Pattern

This demo uses cache-aside (also called lazy-loading), the most common caching strategy for read-heavy workloads.

The pattern works like this:

Request arrives
   |
   v
Check Redis for cache key
   |
   +-- Found (HIT) --> return cached data
   |
   +-- Not found (MISS)
          |
          v
       Run database query
          |
          v
       Store result in Redis with TTL
          |
          v
       Return result to caller

Here is the actual implementation from main.py for the categories endpoint:

@app.get("/api/categories")
async def get_categories(use_cache: bool = Query(True)):
    cache_key = "categories:stats"
    start = time.perf_counter()

    if use_cache:
        cached = await redis_client.get(cache_key)  # Check Redis
        if cached:  # HIT
            duration = (time.perf_counter() - start) * 1000
            stats["cache_hits"] += 1
            return {
                "source": "cache",
                "duration_ms": round(duration, 2),
                "data": json.loads(cached)
            }

    # MISS: run the database query
    stats["cache_misses"] += 1
    data = await _heavy_category_stats()
    duration = (time.perf_counter() - start) * 1000

    if use_cache:
        # Store in Redis with TTL
        await redis_client.setex(cache_key, CACHE_TTL, _serialize(data))

    return {
        "source": "database",
        "duration_ms": round(duration, 2),
        "data": data
    }

Every endpoint in the demo follows this exact structure. The use_cache parameter lets you bypass Redis entirely, which is what makes the side-by-side comparison possible.
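
Because the structure repeats verbatim across endpoints, it could be factored into a decorator. A hypothetical refactor, not code from the demo (`key_fn`, the hit/miss counters, and error handling are simplified away):

```python
import functools
import json
import time

def cache_aside(redis_client, key_fn, ttl: int):
    """Wrap an async query function in the check / miss / store pattern."""
    def decorator(query_fn):
        @functools.wraps(query_fn)
        async def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            start = time.perf_counter()
            cached = await redis_client.get(key)
            if cached is not None:  # HIT: return the stored JSON
                return {"source": "cache",
                        "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                        "data": json.loads(cached)}
            data = await query_fn(*args, **kwargs)  # MISS: run the query
            await redis_client.setex(key, ttl, json.dumps(data, default=str))
            return {"source": "database",
                    "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                    "data": data}
        return wrapper
    return decorator
```

Each endpoint would then declare only its key scheme and its query, with the caching mechanics shared in one place.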

Why cache-aside and not write-through?

In a write-through pattern, the cache is updated every time data is written to the database. This guarantees the cache is always fresh but adds latency to every write operation and requires tighter coupling between your write path and your cache layer.

Cache-aside is simpler. The application only writes to the cache on read misses. If the cache goes down, the application gracefully falls back to the database. No writes are affected. For read-heavy workloads (which most web applications are), this tradeoff is overwhelmingly favorable.
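
The demo assumes Redis is up; making that fallback explicit takes only a small wrapper. A sketch (the helper is invented, not part of main.py; real code would catch the client's specific error class and log the failure):

```python
import json

async def get_cached(redis_client, key: str):
    """Return the cached value for key, or None on a miss OR a Redis outage.

    Treating an unreachable cache as a miss means the caller falls
    through to the database and the application keeps serving.
    """
    try:
        cached = await redis_client.get(key)
    except Exception:
        # Connection refused, timeout, etc.: degrade to the database path.
        return None
    return json.loads(cached) if cached is not None else None
```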

The cost is staleness. Cached data can be up to TTL seconds old. For a product catalog, that is perfectly acceptable. For a bank balance, it is not.


Cache Key Design

Each endpoint constructs a cache key that uniquely identifies the data being requested. The demo uses colon-separated naming, which is a Redis community convention.

Endpoint        Cache Key                          Why
--------------  ---------------------------------  -----------------------------------------------------------
Category stats  categories:stats                   Single result set with no parameters. One key is enough.
Product search  products:search:{q}:{category_id}  Different search terms and category filters produce
                                                   different results. Each combination needs its own key.
Top products    products:top                       Single result set with no parameters. One key is enough.

The product search key is built dynamically:

cache_key = f"products:search:{q}:{category_id}"

This means products:search:Widget:None and products:search:Pro:1 are separate entries in Redis. Searching for “Widget” across all categories returns different data than searching for “Pro” in Electronics, so they must be cached independently.

The colon convention matters. It is not just cosmetic. Redis tools and GUIs (like RedisInsight) use colons to create a tree-like hierarchy when displaying keys. A key like products:search:Widget:None shows up under products > search > Widget in the browser, making it easy to inspect what is cached.

Key cardinality. In this demo, the number of possible keys is bounded: 10 categories, a limited set of realistic search terms, and two fixed endpoints. In a production system with millions of possible search queries, key cardinality becomes a real concern. You would need to think about whether caching every unique query is worth the memory, or whether you should only cache popular queries. The LRU eviction policy (covered below) provides a natural safety net here.
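
An f-string works fine for two parameters; with more, a small helper (hypothetical, not in the demo) keeps keys canonical by sorting parameter names and normalizing whitespace and case, so "Widget" and "widget" share one entry. That is safe here because the SQL comparison is already case-insensitive via LOWER():

```python
def make_cache_key(endpoint: str, **params) -> str:
    """Build a colon-separated cache key from sorted, normalized params."""
    parts = [endpoint]
    for name in sorted(params):  # stable order regardless of call site
        parts.append(f"{name}={str(params[name]).strip().lower()}")
    return ":".join(parts)

# make_cache_key("products:search", q="Widget", category_id=None)
# → "products:search:category_id=none:q=widget"
```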


TTL and Expiration

Every cached entry expires after a configurable number of seconds. The demo defaults to 60 seconds:

CACHE_TTL = int(os.getenv("CACHE_TTL", "60")) # seconds

The TTL is applied atomically using Redis’s SETEX command:

await redis_client.setex(cache_key, CACHE_TTL, _serialize(data))

SETEX combines SET and EXPIRE into a single atomic operation. There is no window where a key exists without an expiration.

Why 60 seconds? It is a balance between two concerns:

  • Too short (e.g., 5 seconds): the cache provides less benefit because most requests still hit the database. The hit rate drops.
  • Too long (e.g., 30 minutes): data becomes stale. If a product’s price changes, users see the old price for up to 30 minutes.

For a product catalog where data changes infrequently, 60 seconds is a reasonable starting point. In production, you would tune this per endpoint based on how often the underlying data changes and how much staleness your users can tolerate.

What happens at expiry? Redis silently deletes the key. The next request for that data is a cache miss, triggers a fresh database query, and repopulates the cache. This is seamless. No explicit invalidation logic is needed.
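
One wrinkle the demo is too small to hit: keys written at the same moment also expire at the same moment, so after a quiet minute a traffic burst can send many simultaneous misses to the database at once. A common mitigation (not implemented in the demo) is to add jitter to the TTL:

```python
import random

CACHE_TTL = 60  # the demo's default, in seconds

def jittered_ttl(base: int = CACHE_TTL, spread: float = 0.1) -> int:
    """Return base plus or minus spread, so keys written together
    expire at slightly different times instead of all at once."""
    delta = int(base * spread)
    return base + random.randint(-delta, delta)

# await redis_client.setex(cache_key, jittered_ttl(), _serialize(data))
```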


Connection Pooling

The app maintains persistent connection pools to both PostgreSQL and Redis. This is critical for performance.

db_pool = await asyncpg.create_pool(PG_DSN, min_size=2, max_size=10)

  • min_size=2: Two connections are opened at startup and kept alive. This eliminates cold-start latency. The first two requests do not need to wait for a TCP handshake, TLS negotiation, and PostgreSQL authentication.
  • max_size=10: Under load, the pool can grow to 10 connections. This caps the number of simultaneous database connections, preventing the app from overwhelming PostgreSQL. Requests beyond 10 concurrent database queries will wait for a connection to become available.

Without pooling, every request would: open a TCP connection, authenticate with PostgreSQL, run the query, and close the connection. That overhead adds 5-20ms per request in a local environment and significantly more across a network.

The pool is created during the application’s lifespan startup event and closed on shutdown:

@asynccontextmanager
async def lifespan(app: FastAPI):
    global db_pool, redis_client
    db_pool = await asyncpg.create_pool(PG_DSN, min_size=2, max_size=10)
    redis_client = redis.from_url(REDIS_URL, decode_responses=True)
    yield
    await db_pool.close()
    await redis_client.close()

The redis.asyncio client manages its own internal connection pool automatically:

redis_client = redis.from_url(REDIS_URL, decode_responses=True)

The decode_responses=True flag tells the client to decode byte responses from Redis into Python strings. Without this, redis_client.get() would return b'{"id": 1, ...}' (bytes), and you would need to decode manually before passing to json.loads(). With it, you get '{"id": 1, ...}' (string) directly.

Both connections use Kubernetes service DNS names (postgres and redis), which resolve to ClusterIP addresses within the namespace.


Memory Limits and LRU Eviction

The Redis deployment is configured with two important flags:

command: ["redis-server", "--maxmemory", "64mb", "--maxmemory-policy", "allkeys-lru"]

This caps Redis at 64MB of data. Once it hits this limit, Redis must evict existing keys before storing new ones. Without a maxmemory setting, Redis would grow until it consumed all available memory and got OOM-killed by Kubernetes.

For this demo, 64MB is generous. The entire dataset (10 categories, 500 products, 5,000 reviews) serializes to well under 1MB. In production, you would size this based on your working set: the subset of data that is actively being requested.

LRU stands for Least Recently Used. When Redis needs to free memory, it evicts the key that has not been accessed for the longest time.

The allkeys-lru variant applies this eviction to all keys, not just those with an explicit TTL set. This is the right choice for a pure cache because:

  • Keys that are frequently accessed stay cached.
  • Keys that are rarely accessed get evicted naturally.
  • No manual intervention is needed to manage memory.

Other eviction policies exist (volatile-lru, allkeys-random, noeviction), but allkeys-lru is the standard choice for caching use cases.
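
Redis implements LRU approximately, sampling a handful of keys per eviction rather than maintaining a perfect recency ordering, but the behavior allkeys-lru aims for is easy to model in a few lines (an illustration only, not how Redis is implemented internally):

```python
from collections import OrderedDict

class LRUCache:
    """Toy model of allkeys-lru: on overflow, evict the key that
    was read or written least recently."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```

Frequently read keys keep getting moved to the "recent" end and survive; keys nobody asks for drift to the old end and are the first to go when memory runs out.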

The Redis container has its own resource constraints:

resources:
  requests:
    memory: "64Mi"
    cpu: "50m"
  limits:
    memory: "128Mi"
    cpu: "200m"

The Kubernetes memory limit (128Mi) is intentionally higher than the Redis maxmemory (64MB). Redis needs additional memory beyond the data store for internal bookkeeping: connection buffers, replication buffers, the key metadata overhead, and the process itself. Setting the Kubernetes limit too close to maxmemory would cause OOM kills during memory spikes.


Kubernetes Service Discovery and Setup

The three pods find each other through Kubernetes DNS. Each Service object creates a DNS record in CoreDNS.

App pod connects to "postgres"
--> CoreDNS resolves to: postgres.redis-demo.svc.cluster.local
--> Returns the ClusterIP of the postgres Service
--> kube-proxy routes traffic to the postgres pod
App pod connects to "redis"
--> CoreDNS resolves to: redis.redis-demo.svc.cluster.local
--> Returns the ClusterIP of the redis Service
--> kube-proxy routes traffic to the redis pod

Within the same namespace, the short name works. The app just uses postgres and redis in its connection strings:

env:
  - name: PG_DSN
    value: "postgresql://demo:demo@postgres:5432/demodb"
  - name: REDIS_URL
    value: "redis://redis:6379/0"

The full DNS names (postgres.redis-demo.svc.cluster.local) are not needed because all three resources share the redis-demo namespace.

Service         Type       Port  Purpose
--------------  ---------  ----  ----------------------------------
postgres        ClusterIP  5432  Internal-only access to PostgreSQL
redis           ClusterIP  6379  Internal-only access to Redis
cache-demo-app  NodePort   8000  External access from the browser

ClusterIP services are only reachable from within the cluster. The app’s NodePort service exposes a random high port on the Minikube node’s IP, which is how your browser reaches the dashboard.

PostgreSQL is seeded automatically using a ConfigMap mounted as an init script:

volumeMounts:
  - name: init-sql
    mountPath: /docker-entrypoint-initdb.d
volumes:
  - name: init-sql
    configMap:
      name: postgres-init

The postgres:16-alpine image runs any .sql files found in /docker-entrypoint-initdb.d/ on first startup. The ConfigMap contains the full schema and data generation: 10 categories, 500 products (generated with generate_series), and 5,000 reviews. No manual setup steps are needed.

Component    Memory Request  Memory Limit  CPU Request  CPU Limit
-----------  --------------  ------------  -----------  ---------
PostgreSQL   128Mi           256Mi         100m         500m
Redis        64Mi            128Mi         50m          200m
FastAPI App  128Mi           256Mi         100m         500m

These are deliberately conservative. The demo is designed to run on a Minikube cluster with limited resources. In production, you would increase these based on load testing and monitoring.


The Demo Queries

Three queries of increasing complexity make the case for caching. Each one deliberately exercises expensive SQL operations.

1. Category Stats: Three-Table JOIN with Aggregation

SELECT
    c.id, c.name,
    COUNT(DISTINCT p.id) AS product_count,
    COUNT(r.id) AS review_count,
    ROUND(AVG(r.rating)::numeric, 2) AS avg_rating,
    ROUND(AVG(p.price)::numeric, 2) AS avg_price,
    ROUND(MIN(p.price)::numeric, 2) AS min_price,
    ROUND(MAX(p.price)::numeric, 2) AS max_price,
    SUM(p.stock) AS total_stock
FROM categories c
LEFT JOIN products p ON p.category_id = c.id
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY c.id, c.name
ORDER BY avg_rating DESC NULLS LAST

This query touches all three tables and scans every row. The LEFT JOIN from categories to products to reviews produces a result set that multiplies out: 10 categories x 50 products each x 10 reviews each = 5,000 intermediate rows. The COUNT(DISTINCT p.id) is particularly costly because PostgreSQL must deduplicate before counting.

The aggregation functions (AVG, MIN, MAX, SUM) compute across the joined result set. Then GROUP BY collapses everything into 10 rows. Sorting by avg_rating DESC NULLS LAST requires a final sort pass.

2. Product Search: LIKE with Sequential Scan

WHERE ($1 = '' OR LOWER(p.name) LIKE '%' || LOWER($1) || '%'
       OR LOWER(p.description) LIKE '%' || LOWER($1) || '%')
  AND ($2::int IS NULL OR p.category_id = $2)

The LIKE '%term%' pattern is the key bottleneck. The leading wildcard prevents PostgreSQL from using a B-tree index. The database must scan every row and check whether the search term appears anywhere in the name or description. For 500 products this is fast. For 5 million products, it would be devastating without caching.

The LOWER() calls add additional overhead since each string must be lowercased before comparison. A production system would use PostgreSQL’s full-text search (tsvector/tsquery) or a dedicated search engine, but the point here is to demonstrate a query that benefits from caching.

3. Top Products: CTE with Window Function

WITH ranked AS (
    SELECT
        p.id, p.name, p.price,
        c.name AS category,
        COUNT(r.id) AS review_count,
        ROUND(AVG(r.rating)::numeric, 2) AS avg_rating,
        RANK() OVER (
            PARTITION BY c.name
            ORDER BY AVG(r.rating) DESC
        ) AS rank_in_category
    FROM products p
    JOIN categories c ON c.id = p.category_id
    JOIN reviews r ON r.product_id = p.id
    GROUP BY p.id, p.name, p.price, c.name
    HAVING COUNT(r.id) >= 5
)
SELECT * FROM ranked WHERE rank_in_category <= 3
ORDER BY avg_rating DESC, review_count DESC

This is the most expensive query. It combines three costly operations:

  1. CTE (Common Table Expression): The WITH ranked AS (...) materializes the inner query before the outer SELECT runs against it.
  2. Window function: RANK() OVER (PARTITION BY c.name ORDER BY ...) requires PostgreSQL to sort the entire result set within each category partition.
  3. HAVING clause: The HAVING COUNT(r.id) >= 5 filter applies after the GROUP BY, meaning the aggregation work is done even for rows that will be discarded.

Each query includes an asyncio.sleep() to simulate realistic production conditions:

await asyncio.sleep(0.3) # Category stats: 300ms
await asyncio.sleep(0.25) # Product search: 250ms
await asyncio.sleep(0.35) # Top products: 350ms

These represent the overhead you would encounter in production: network round-trips to a remote database, connection wait times under load, and cold buffer cache reads. The actual query execution on a local PostgreSQL with 500 products is fast. The simulated latency makes the demo realistic.


Serialization

Redis stores plain JSON strings. The serialization function handles the Python Decimal values that asyncpg returns for NUMERIC columns:

def _serialize(data):
    """JSON-serialize, handling Decimal types from asyncpg."""
    return json.dumps(data, default=str)

The default=str argument tells json.dumps to convert any type it does not know how to serialize (like Decimal("5.00")) into its string representation ("5.00"). This is a practical tradeoff: you lose type fidelity (the Decimal becomes a string), but you gain simplicity and human-readable cache contents.
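
The conversion is easy to verify in isolation:

```python
import json
from decimal import Decimal

# asyncpg returns NUMERIC columns as Decimal
row = {"avg_price": Decimal("266.40"), "product_count": 50}

serialized = json.dumps(row, default=str)
# → '{"avg_price": "266.40", "product_count": 50}'

restored = json.loads(serialized)
# The Decimal came back as the string "266.40"; the int survived as an int.
```

Without default=str, the json.dumps call would raise TypeError on the Decimal.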

For the categories endpoint, the stored value in Redis looks like:

[
    {
        "id": 1,
        "name": "Electronics",
        "product_count": 50,
        "review_count": 500,
        "avg_rating": "5.00",
        "avg_price": "266.40",
        "min_price": "9.67",
        "max_price": "499.41",
        "total_stock": 235520
    },
    ...
]

On cache hit, json.loads() deserializes this back into Python dicts and lists. The numeric fields that were originally Decimal are now strings, but for an API response, this is fine. The JSON is returned directly to the client.

Why JSON and not a binary format?

JSON is human-readable, widely supported, and simple to debug. You can connect to Redis with redis-cli and GET categories:stats to see exactly what is cached. Binary formats like MessagePack or Protocol Buffers are more compact and faster to serialize, but for this data volume the difference is negligible. JSON is the right default until profiling proves otherwise.


Measured Performance

Real numbers from this demo running on Minikube with the Docker driver:

Query           Database (no cache)  Redis (cached)  Speedup
--------------  -------------------  --------------  -------
Category stats  ~308ms               ~0.4ms          ~770x
Product search  ~260ms               ~0.3ms          ~860x
Top products    ~360ms               ~0.4ms          ~900x

The database times include both the actual SQL execution and the simulated latency. The Redis times are purely the cost of a GET command: a hash table lookup in memory.

The gap is not magic. It comes down to what each path does:

Database path:

  1. Acquire a connection from the pool (nanoseconds if available, milliseconds if waiting)
  2. Send the SQL query over a TCP connection to PostgreSQL
  3. PostgreSQL parses, plans, and executes the query
  4. PostgreSQL scans tables, joins rows, computes aggregations, sorts results
  5. PostgreSQL serializes the result and sends it back over TCP
  6. The app deserializes the rows into Python dicts
  7. The app serializes the dicts to JSON for the HTTP response

Redis path:

  1. Send a GET command over a TCP connection to Redis
  2. Redis looks up the key in a hash table (O(1) operation)
  3. Redis returns the stored string
  4. The app deserializes the JSON string (already in the right format)

Steps 3 through 5 of the database path are eliminated outright: no parsing, planning, scanning, joining, or row serialization. Step 6 shrinks to a single json.loads() call. The Redis path is a hash table lookup returning pre-serialized data.

On a fresh start with an empty cache:

  • First request for each endpoint: Cache miss. Full database query runs. Result is stored in Redis. Response time is ~300ms.
  • Subsequent requests within the TTL: Cache hit. Response time is ~0.4ms.
  • After TTL expires (60 seconds): Cache miss again. The cycle repeats.

In steady state with regular traffic, the hit rate for this demo approaches 95%+ because the number of unique cache keys is small (a handful of endpoints with limited parameter variations) and the TTL is generous relative to the request frequency.

The /api/stats endpoint returns live cache metrics:

{
    "cache_hits": 47,
    "cache_misses": 6,
    "hit_rate": 88.7,
    "redis_memory_used": "1.02M",
    "redis_keys": 3
}

  • Hit rate is the single most important metric for a cache. Below 80%, you should investigate whether your keys are too granular, your TTL is too short, or your traffic patterns do not benefit from caching.
  • Redis memory shows actual memory consumption. For this demo, it stays well under 2MB.
  • Redis keys shows how many entries are cached. With three endpoints and limited parameter combinations, this stays in the single digits.
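
The hit rate itself is plain arithmetic over the two counters. A minimal helper matching what /api/stats reports (the function name is illustrative, not from main.py):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Cache hit rate as a percentage; 0.0 before any traffic arrives."""
    total = hits + misses
    return round(100 * hits / total, 1) if total else 0.0

# hit_rate(47, 6) → 88.7, matching the sample stats above
```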

Redis caching is a strong fit when:

  • Read-heavy workloads. Your app reads the same data far more often than it writes. Product catalogs, configuration data, user profiles, search results, dashboard metrics.
  • Expensive queries. Aggregations, multi-table JOINs, full-text search, or any query that consistently takes more than 50ms.
  • Tolerance for slightly stale data. If showing data that is up to 60 seconds old is acceptable, caching works well. Most content-oriented applications fall into this category.
  • High concurrency. 1,000 users requesting the same product page should not generate 1,000 identical database queries. One query, one cache entry, 999 cache hits.
  • Predictable access patterns. A small set of “hot” data is requested frequently. The Pareto principle applies: 20% of your data serves 80% of your requests. Caching that 20% has outsized impact.

Redis caching is the wrong tool when:

  • Data must be real-time. Financial transactions, inventory counts during flash sales, live bidding systems, or anything where stale data causes business problems. A 60-second TTL means a user could see a price that changed a minute ago. For a product catalog, that is fine. For a stock trading platform, it is not.
  • Write-heavy workloads. If the underlying data changes on every request, the cache invalidates constantly and provides no benefit. You pay the cost of writing to both the database and Redis without getting cache hits.
  • Every request is unique. If each request produces unique results (highly personalized feeds with no overlap between users, one-time report generation), there is nothing to cache. Every request is a miss.
  • Data is too large to fit in memory. Redis stores everything in RAM. If your working set is 100GB, you need 100GB of RAM for Redis. At some point, the cost of memory exceeds the cost of just scaling your database.
  • You need strong consistency. Cache-aside is eventually consistent by design. Between a database write and the cache TTL expiry, the cache serves stale data. If your application requires read-after-write consistency, you need a write-through or write-behind pattern, which adds significant complexity.
  • The queries are already fast. If your queries return in under 5ms, adding a cache layer introduces complexity (another service to monitor, another failure mode, cache invalidation logic) for marginal performance gain. Measure first, cache second.