Redis Caching: Deep Dive
This document explains the design decisions, implementation patterns, and infrastructure details behind the Redis caching demo. It covers why things work the way they do, what tradeoffs were made, and what you should understand before applying these patterns to your own applications.
Table of Contents
- Why Redis Caching Matters
- Architecture Overview
- The Cache-Aside Pattern
- Cache Key Design
- TTL Strategy
- Connection Pooling
- Redis Configuration: Memory and Eviction
- Kubernetes Internals
- The Database Queries
- What Redis Actually Stores
- Performance Characteristics
- When to Use Redis Caching
- When NOT to Use Redis Caching
Why Redis Caching Matters
Every web application eventually hits a wall where the database becomes the bottleneck. The symptoms are predictable: response times creep up, database CPU spikes during peak traffic, and users start noticing.
The root cause is usually some combination of:
- Expensive queries. JOINs across multiple tables, aggregations over large datasets, window functions, and full-text searches all consume significant CPU and I/O.
- Repeated work. A product catalog page viewed by 10,000 users in a minute generates 10,000 identical database queries returning identical results.
- Network overhead. In production, the database often lives on a separate server (or a managed service in another availability zone). Every query pays the cost of a network round-trip.
- Connection contention. Database connection pools are finite. Under high concurrency, requests queue up waiting for an available connection.
Redis solves this by sitting between your application and the database as an in-memory key-value store. The first request runs the query and stores the result in Redis. Every subsequent request for the same data returns from memory, skipping the database entirely. The cached data expires after a configurable TTL, keeping results reasonably fresh.
The result: response times drop from hundreds of milliseconds to sub-millisecond. The database handles a fraction of the traffic. Connection pools stay healthy. This demo makes that difference visible and measurable.
Architecture Overview
The entire demo runs inside a single Minikube cluster. Three pods, three services, one namespace.
```
Minikube cluster (redis-demo namespace)

        Browser
           |
           | NodePort :31xxx
           v
+-------------------------+
|       FastAPI App       |
|     cache-demo-app      |
|        Port 8000        |
|                         |
|  1. Check Redis         |
|  2. If miss, query DB   |
|  3. Store result in     |
|     Redis with TTL      |
+------+-----------+------+
       |           |
 DNS: "redis"   DNS: "postgres"
       |           |
+------v-----+  +--v-------------+
|   Redis    |  |  PostgreSQL    |
|  7-alpine  |  |  16-alpine     |
|  Port 6379 |  |  Port 5432     |
|            |  |                |
|  64MB      |  |  10 categories |
|  LRU       |  |  500 products  |
+------------+  |  5,000 reviews |
                +----------------+
```

The request flow:
- A browser request arrives at the Minikube node’s IP on a NodePort.
- Kubernetes routes it to the FastAPI app pod on port 8000.
- The app checks Redis for a cached response using a cache key derived from the endpoint and query parameters.
- Cache hit: Redis returns the stored JSON string. The app deserializes it and responds immediately. Total time: under 1ms.
- Cache miss: The app runs the SQL query against PostgreSQL, serializes the result as JSON, stores it in Redis with a TTL, and returns the data. Total time: 250-360ms.
- Subsequent identical requests hit the cache until the TTL expires, at which point the cycle repeats.
The three services communicate over Kubernetes internal DNS. The app refers to postgres and redis by their Service names. CoreDNS resolves these to ClusterIP addresses, which route to the backing pods.
The Cache-Aside Pattern
This demo uses cache-aside (also called lazy-loading), the most common caching strategy for read-heavy workloads.
The pattern works like this:
```
Request arrives
      |
      v
Check Redis for cache key
      |
      +-- Found (HIT) ------> return cached data
      |
      +-- Not found (MISS)
              |
              v
      Run database query
              |
              v
      Store result in Redis with TTL
              |
              v
      Return result to caller
```

Here is the actual implementation from main.py for the categories endpoint:
```python
@app.get("/api/categories")
async def get_categories(use_cache: bool = Query(True)):
    cache_key = "categories:stats"
    start = time.perf_counter()

    if use_cache:
        cached = await redis_client.get(cache_key)  # Check Redis
        if cached:  # HIT
            duration = (time.perf_counter() - start) * 1000
            stats["cache_hits"] += 1
            return {
                "source": "cache",
                "duration_ms": round(duration, 2),
                "data": json.loads(cached),
            }

    # MISS: run the database query
    stats["cache_misses"] += 1
    data = await _heavy_category_stats()
    duration = (time.perf_counter() - start) * 1000

    if use_cache:
        # Store in Redis with TTL
        await redis_client.setex(cache_key, CACHE_TTL, _serialize(data))

    return {
        "source": "database",
        "duration_ms": round(duration, 2),
        "data": data,
    }
```

Every endpoint in the demo follows this exact structure. The `use_cache` parameter lets you bypass Redis entirely, which is what makes the side-by-side comparison possible.
Why cache-aside and not write-through?
In a write-through pattern, the cache is updated every time data is written to the database. This guarantees the cache is always fresh but adds latency to every write operation and requires tighter coupling between your write path and your cache layer.
Cache-aside is simpler. The application only writes to the cache on read misses. If the cache goes down, the application gracefully falls back to the database. No writes are affected. For read-heavy workloads (which most web applications are), this tradeoff is overwhelmingly favorable.
The cost is staleness. Cached data can be up to TTL seconds old. For a product catalog, that is perfectly acceptable. For a bank balance, it is not.
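The graceful-fallback behavior is worth making concrete. Below is a hypothetical helper, not code from the demo: `get_cached_or_query` and its arguments are illustrative, assuming an async Redis client with `get`/`setex` methods (as in `redis.asyncio`):

```python
import json

async def get_cached_or_query(redis_client, cache_key, query_fn, ttl=60):
    """Cache-aside with graceful degradation: if Redis is unreachable,
    serve straight from the database instead of failing the request."""
    try:
        cached = await redis_client.get(cache_key)
        if cached is not None:
            return json.loads(cached)       # HIT: pre-serialized data
    except Exception:
        return await query_fn()             # Redis down: DB still answers
    data = await query_fn()                 # MISS: run the real query
    try:
        await redis_client.setex(cache_key, ttl, json.dumps(data))
    except Exception:
        pass                                # caching is best-effort
    return data
```

A wrapper like this also factors out the hit/miss boilerplate that each endpoint otherwise repeats.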
Cache Key Design
Each endpoint constructs a cache key that uniquely identifies the data being requested. The demo uses colon-separated naming, which is a Redis community convention.
| Endpoint | Cache Key | Why |
|---|---|---|
| Category stats | categories:stats | Single result set with no parameters. One key is enough. |
| Product search | products:search:{q}:{category_id} | Different search terms and category filters produce different results. Each combination needs its own key. |
| Top products | products:top | Single result set with no parameters. One key is enough. |
The product search key is built dynamically:
```python
cache_key = f"products:search:{q}:{category_id}"
```

This means products:search:Widget:None and products:search:Pro:1 are separate entries in Redis. Searching for “Widget” across all categories returns different data than searching for “Pro” in Electronics, so they must be cached independently.
The colon convention matters. It is not just cosmetic. Redis tools and GUIs (like RedisInsight) use colons to create a tree-like hierarchy when displaying keys. A key like products:search:Widget:None shows up under products > search > Widget in the browser, making it easy to inspect what is cached.
Key cardinality. In this demo, the number of possible keys is bounded: 10 categories, a limited set of realistic search terms, and two fixed endpoints. In a production system with millions of possible search queries, key cardinality becomes a real concern. You would need to think about whether caching every unique query is worth the memory, or whether you should only cache popular queries. The LRU eviction policy (covered below) provides a natural safety net here.
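One lightweight way to keep cardinality down is to normalize parameters before they become part of the key, so trivially different inputs (“Widget ” vs “widget”) share one entry. The demo itself interpolates raw values; `build_search_key` below is a hypothetical sketch:

```python
from typing import Optional

def build_search_key(q: str, category_id: Optional[int]) -> str:
    """Hypothetical key builder: lowercase and collapse whitespace so
    'Widget', ' widget ' and 'WIDGET' all map to one cache entry."""
    normalized = " ".join(q.strip().lower().split())
    category = "all" if category_id is None else str(category_id)
    return f"products:search:{normalized}:{category}"
```

Normalization does not bound cardinality on its own, but it stops near-duplicate queries from multiplying keys for no benefit.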
TTL Strategy
Every cached entry expires after a configurable number of seconds. The demo defaults to 60 seconds:
```python
CACHE_TTL = int(os.getenv("CACHE_TTL", "60"))  # seconds
```

The TTL is applied atomically using Redis’s SETEX command:
```python
await redis_client.setex(cache_key, CACHE_TTL, _serialize(data))
```

SETEX combines SET and EXPIRE into a single atomic operation. There is no window where a key exists without an expiration.
Why 60 seconds? It is a balance between two concerns:
- Too short (e.g., 5 seconds): the cache provides less benefit because most requests still hit the database. The hit rate drops.
- Too long (e.g., 30 minutes): data becomes stale. If a product’s price changes, users see the old price for up to 30 minutes.
For a product catalog where data changes infrequently, 60 seconds is a reasonable starting point. In production, you would tune this per endpoint based on how often the underlying data changes and how much staleness your users can tolerate.
What happens at expiry? Redis silently deletes the key. The next request for that data is a cache miss, triggers a fresh database query, and repopulates the cache. This is seamless. No explicit invalidation logic is needed.
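A common refinement, not implemented in the demo, is adding jitter to the TTL so that entries populated at the same moment do not all expire (and miss) in the same instant. A minimal sketch:

```python
import random

def ttl_with_jitter(base_ttl: int, spread: float = 0.1) -> int:
    """Return base_ttl +/- up to `spread` (10% by default), at least 1 second,
    so a burst of keys cached together expires over a window, not all at once."""
    jitter = random.uniform(-spread, spread) * base_ttl
    return max(1, int(base_ttl + jitter))
```

With a 60-second base TTL this yields values between 54 and 66 seconds, spreading the repopulation misses across a 12-second window instead of a single spike.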
Connection Pooling
The app maintains persistent connection pools to both PostgreSQL and Redis. This is critical for performance.
PostgreSQL: asyncpg Pool
```python
db_pool = await asyncpg.create_pool(PG_DSN, min_size=2, max_size=10)
```

- `min_size=2`: Two connections are opened at startup and kept alive. This eliminates cold-start latency: the first two requests do not need to wait for a TCP handshake, TLS negotiation, and PostgreSQL authentication.
- `max_size=10`: Under load, the pool can grow to 10 connections. This caps the number of simultaneous database connections, preventing the app from overwhelming PostgreSQL. Requests beyond 10 concurrent database queries will wait for a connection to become available.
Without pooling, every request would: open a TCP connection, authenticate with PostgreSQL, run the query, and close the connection. That overhead adds 5-20ms per request in a local environment and significantly more across a network.
The pool is created during the application’s lifespan startup event and closed on shutdown:
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    global db_pool, redis_client
    db_pool = await asyncpg.create_pool(PG_DSN, min_size=2, max_size=10)
    redis_client = redis.from_url(REDIS_URL, decode_responses=True)
    yield
    await db_pool.close()
    await redis_client.close()
```

Redis: Built-in Connection Pool
The redis.asyncio client manages its own internal connection pool automatically:
```python
redis_client = redis.from_url(REDIS_URL, decode_responses=True)
```

The decode_responses=True flag tells the client to decode byte responses from Redis into Python strings. Without it, redis_client.get() would return b'{"id": 1, ...}' (bytes), and you would need to decode manually before passing to json.loads(). With it, you get '{"id": 1, ...}' (a string) directly.
Both connections use Kubernetes service DNS names (postgres and redis), which resolve to ClusterIP addresses within the namespace.
Redis Configuration: Memory and Eviction
The Redis deployment is configured with two important flags:
```yaml
command: ["redis-server", "--maxmemory", "64mb", "--maxmemory-policy", "allkeys-lru"]
```

maxmemory: 64mb
This caps Redis at 64MB of data. Once it hits this limit, Redis must evict existing keys before storing new ones. Without a maxmemory setting, Redis would grow until it consumed all available memory and got OOM-killed by Kubernetes.
For this demo, 64MB is generous. The entire dataset (10 categories, 500 products, 5,000 reviews) serializes to well under 1MB. In production, you would size this based on your working set: the subset of data that is actively being requested.
maxmemory-policy: allkeys-lru
LRU stands for Least Recently Used. When Redis needs to free memory, it evicts the key that has not been accessed for the longest time.
The allkeys-lru variant applies this eviction to all keys, not just those with an explicit TTL set. This is the right choice for a pure cache because:
- Keys that are frequently accessed stay cached.
- Keys that are rarely accessed get evicted naturally.
- No manual intervention is needed to manage memory.
Other eviction policies exist (volatile-lru, allkeys-random, noeviction), but allkeys-lru is the standard choice for caching use cases.
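To build intuition for what allkeys-lru does, here is a toy in-memory model. It is deliberately simplified: real Redis uses an approximate LRU that samples a handful of keys per eviction rather than tracking exact recency for every key.

```python
from collections import OrderedDict

class TinyLRU:
    """Toy model of allkeys-lru: on overflow, drop the least recently used key."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()       # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None                 # cache miss
        self.data.move_to_end(key)      # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the LRU entry
```

The effect matches the bullet points above: hot keys keep getting moved to the end and survive; cold keys drift to the front and are evicted first.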
Kubernetes Resource Limits
The Redis container has its own resource constraints:
```yaml
resources:
  requests:
    memory: "64Mi"
    cpu: "50m"
  limits:
    memory: "128Mi"
    cpu: "200m"
```

The Kubernetes memory limit (128Mi) is intentionally higher than the Redis maxmemory (64MB). Redis needs additional memory beyond the data store for internal bookkeeping: connection buffers, replication buffers, key metadata overhead, and the process itself. Setting the Kubernetes limit too close to maxmemory would cause OOM kills during memory spikes.
Kubernetes Internals
DNS Resolution and Service Discovery
The three pods find each other through Kubernetes DNS. Each Service object creates a DNS record in CoreDNS.
```
App pod connects to "postgres"
  --> CoreDNS resolves: postgres.redis-demo.svc.cluster.local
  --> Returns the ClusterIP of the postgres Service
  --> kube-proxy routes traffic to the postgres pod

App pod connects to "redis"
  --> CoreDNS resolves: redis.redis-demo.svc.cluster.local
  --> Returns the ClusterIP of the redis Service
  --> kube-proxy routes traffic to the redis pod
```

Within the same namespace, the short name works. The app just uses postgres and redis in its connection strings:
```yaml
env:
  - name: PG_DSN
    value: "postgresql://demo:demo@postgres:5432/demodb"
  - name: REDIS_URL
    value: "redis://redis:6379/0"
```

The full DNS names (postgres.redis-demo.svc.cluster.local) are not needed because all three resources share the redis-demo namespace.
Services
| Service | Type | Port | Purpose |
|---|---|---|---|
| `postgres` | ClusterIP | 5432 | Internal-only access to PostgreSQL |
| `redis` | ClusterIP | 6379 | Internal-only access to Redis |
| `cache-demo-app` | NodePort | 8000 | External access from the browser |
ClusterIP services are only reachable from within the cluster. The app’s NodePort service exposes a high port on the Minikube node’s IP (assigned from the 30000-32767 range by default), which is how your browser reaches the dashboard.
Seed Data via ConfigMap
PostgreSQL is seeded automatically using a ConfigMap mounted as an init script:
```yaml
volumeMounts:
  - name: init-sql
    mountPath: /docker-entrypoint-initdb.d
volumes:
  - name: init-sql
    configMap:
      name: postgres-init
```

The postgres:16-alpine image runs any .sql files found in /docker-entrypoint-initdb.d/ on first startup. The ConfigMap contains the full schema and data generation: 10 categories, 500 products (generated with generate_series), and 5,000 reviews. No manual setup steps are needed.
Resource Budgets
| Component | Memory Request | Memory Limit | CPU Request | CPU Limit |
|---|---|---|---|---|
| PostgreSQL | 128Mi | 256Mi | 100m | 500m |
| Redis | 64Mi | 128Mi | 50m | 200m |
| FastAPI App | 128Mi | 256Mi | 100m | 500m |
These are deliberately conservative. The demo is designed to run on a Minikube cluster with limited resources. In production, you would increase these based on load testing and monitoring.
The Database Queries
Three queries of increasing complexity make the case for caching. Each one deliberately exercises expensive SQL operations.
1. Category Stats: Three-Table JOIN with Aggregation
```sql
SELECT c.id, c.name,
       COUNT(DISTINCT p.id) AS product_count,
       COUNT(r.id) AS review_count,
       ROUND(AVG(r.rating)::numeric, 2) AS avg_rating,
       ROUND(AVG(p.price)::numeric, 2) AS avg_price,
       ROUND(MIN(p.price)::numeric, 2) AS min_price,
       ROUND(MAX(p.price)::numeric, 2) AS max_price,
       SUM(p.stock) AS total_stock
FROM categories c
LEFT JOIN products p ON p.category_id = c.id
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY c.id, c.name
ORDER BY avg_rating DESC NULLS LAST
```

This query touches all three tables and scans every row. The LEFT JOIN from categories to products to reviews produces a result set that multiplies out: 10 categories x 50 products each x 10 reviews each = 5,000 intermediate rows. The COUNT(DISTINCT p.id) is particularly costly because PostgreSQL must deduplicate before counting.
The aggregation functions (AVG, MIN, MAX, SUM) compute across the joined result set. Then GROUP BY collapses everything into 10 rows. Sorting by avg_rating DESC NULLS LAST requires a final sort pass.
2. Product Search: LIKE with Sequential Scan
```sql
WHERE ($1 = ''
       OR LOWER(p.name) LIKE '%' || LOWER($1) || '%'
       OR LOWER(p.description) LIKE '%' || LOWER($1) || '%')
  AND ($2::int IS NULL OR p.category_id = $2)
```

The LIKE '%term%' pattern is the key bottleneck. The leading wildcard prevents PostgreSQL from using a B-tree index, so the database must scan every row and check whether the search term appears anywhere in the name or description. For 500 products this is fast. For 5 million products, it would be devastating without caching.
The LOWER() calls add additional overhead since each string must be lowercased before comparison. A production system would use PostgreSQL’s full-text search (tsvector/tsquery) or a dedicated search engine, but the point here is to demonstrate a query that benefits from caching.
3. Top Products: CTE with Window Function
```sql
WITH ranked AS (
    SELECT p.id, p.name, p.price, c.name AS category,
           COUNT(r.id) AS review_count,
           ROUND(AVG(r.rating)::numeric, 2) AS avg_rating,
           RANK() OVER (
               PARTITION BY c.name
               ORDER BY AVG(r.rating) DESC
           ) AS rank_in_category
    FROM products p
    JOIN categories c ON c.id = p.category_id
    JOIN reviews r ON r.product_id = p.id
    GROUP BY p.id, p.name, p.price, c.name
    HAVING COUNT(r.id) >= 5
)
SELECT * FROM ranked
WHERE rank_in_category <= 3
ORDER BY avg_rating DESC, review_count DESC
```

This is the most expensive query. It combines three costly operations:

- CTE (Common Table Expression): the `WITH ranked AS (...)` subquery is evaluated before the outer `SELECT` filters it.
- Window function: `RANK() OVER (PARTITION BY c.name ORDER BY ...)` requires PostgreSQL to sort the entire result set within each category partition.
- HAVING clause: the `HAVING COUNT(r.id) >= 5` filter applies after the `GROUP BY`, meaning the aggregation work is done even for rows that will be discarded.
Simulated Latency
Each query includes an asyncio.sleep() to simulate realistic production conditions:
```python
await asyncio.sleep(0.3)   # Category stats: 300ms
await asyncio.sleep(0.25)  # Product search: 250ms
await asyncio.sleep(0.35)  # Top products: 350ms
```

These represent the overhead you would encounter in production: network round-trips to a remote database, connection wait times under load, and cold buffer cache reads. The actual query execution on a local PostgreSQL with 500 products is fast. The simulated latency makes the demo realistic.
What Redis Actually Stores
Redis stores plain JSON strings. The serialization function handles the Python Decimal types that asyncpg returns for NUMERIC columns:
```python
def _serialize(data):
    """JSON-serialize, handling Decimal types from asyncpg."""
    return json.dumps(data, default=str)
```

The default=str argument tells json.dumps to convert any type it does not know how to serialize (like Decimal("5.00")) into its string representation ("5.00"). This is a practical tradeoff: you lose type fidelity (the Decimal becomes a string), but you gain simplicity and human-readable cache contents.
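A quick round-trip illustrates what default=str does to a row containing a Decimal:

```python
import json
from decimal import Decimal

# A row as asyncpg might return it: NUMERIC columns arrive as Decimal
row = {"name": "Electronics", "avg_price": Decimal("266.40"), "product_count": 50}

payload = json.dumps(row, default=str)   # Decimal -> "266.40" (a string)
restored = json.loads(payload)           # what a cache hit deserializes
```

After the round trip, avg_price is the string "266.40" while product_count stays a plain integer; only the types json.dumps cannot handle natively go through str().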
For the categories endpoint, the stored value in Redis looks like:
```json
[
  {
    "id": 1,
    "name": "Electronics",
    "product_count": 50,
    "review_count": 500,
    "avg_rating": "5.00",
    "avg_price": "266.40",
    "min_price": "9.67",
    "max_price": "499.41",
    "total_stock": 235520
  },
  ...
]
```

On cache hit, json.loads() deserializes this back into Python dicts and lists. The numeric fields that were originally Decimal are now strings, but for an API response, this is fine. The JSON is returned directly to the client.
Why JSON and not a binary format?
JSON is human-readable, widely supported, and simple to debug. You can connect to Redis with redis-cli and GET categories:stats to see exactly what is cached. Binary formats like MessagePack or Protocol Buffers are more compact and faster to serialize, but for this data volume the difference is negligible. JSON is the right default until profiling proves otherwise.
Performance Characteristics
Observed Response Times
Real numbers from this demo running on Minikube with the Docker driver:
| Query | Database (no cache) | Redis (cached) | Speedup |
|---|---|---|---|
| Category stats | ~308ms | ~0.4ms | ~770x |
| Product search | ~260ms | ~0.3ms | ~860x |
| Top products | ~360ms | ~0.4ms | ~900x |
The database times include both the actual SQL execution and the simulated latency. The Redis times are purely the cost of a GET command: a hash table lookup in memory.
Why the Difference Is So Large
The gap is not magic. It comes down to what each path does:
Database path:
1. Acquire a connection from the pool (nanoseconds if available, milliseconds if waiting)
2. Send the SQL query over a TCP connection to PostgreSQL
3. PostgreSQL parses, plans, and executes the query
4. PostgreSQL scans tables, joins rows, computes aggregations, sorts results
5. PostgreSQL serializes the result and sends it back over TCP
6. The app deserializes the rows into Python dicts
7. The app serializes the dicts to JSON for the HTTP response
Redis path:
1. Send a `GET` command over a TCP connection to Redis
2. Redis looks up the key in a hash table (O(1) operation)
3. Redis returns the stored string
4. The app deserializes the JSON string (already in the right format)
Steps 3-6 of the database path are completely eliminated. The Redis path is a single hash table lookup returning pre-serialized data.
Cache Hit/Miss Patterns
On a fresh start with an empty cache:
- First request for each endpoint: Cache miss. Full database query runs. Result is stored in Redis. Response time is ~300ms.
- Subsequent requests within the TTL: Cache hit. Response time is ~0.4ms.
- After TTL expires (60 seconds): Cache miss again. The cycle repeats.
In steady state with regular traffic, the hit rate for this demo approaches 95%+ because the number of unique cache keys is small (a handful of endpoints with limited parameter variations) and the TTL is generous relative to the request frequency.
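That hit rate can be sanity-checked with a back-of-envelope model: assuming uniform traffic against a single key, each TTL window costs exactly one miss (the request that repopulates the key) and every other request in the window is a hit. The helper below is a hypothetical simplification, not part of the demo:

```python
def steady_state_hit_rate(requests_per_ttl_window: int) -> float:
    """Hit rate (%) for one key under uniform traffic:
    one miss per TTL window, hits for everything else."""
    if requests_per_ttl_window <= 0:
        return 0.0
    return round(100 * (requests_per_ttl_window - 1) / requests_per_ttl_window, 1)
```

At one request per second with a 60-second TTL, that is 60 requests per window and roughly a 98% hit rate per key; the demo's observed 95%+ is consistent with slightly sparser traffic spread over a few keys.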
What the Stats Endpoint Tells You
The /api/stats endpoint returns live cache metrics:
```json
{
  "cache_hits": 47,
  "cache_misses": 6,
  "hit_rate": 88.7,
  "redis_memory_used": "1.02M",
  "redis_keys": 3
}
```

- Hit rate is the single most important metric for a cache. Below 80%, you should investigate whether your keys are too granular, your TTL is too short, or your traffic patterns do not benefit from caching.
- Redis memory shows actual memory consumption. For this demo, it stays well under 2MB.
- Redis keys shows how many entries are cached. With three endpoints and limited parameter combinations, this stays in the single digits.
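The hit_rate field is derived from the two counters. The demo's actual implementation is not shown here, but the arithmetic is presumably something like:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of lookups served from cache, rounded to one decimal."""
    total = hits + misses
    return round(100 * hits / total, 1) if total else 0.0
```

With the example numbers, hit_rate(47, 6) gives 88.7, matching the payload above.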
When to Use Redis Caching
Redis caching is a strong fit when:
- Read-heavy workloads. Your app reads the same data far more often than it writes. Product catalogs, configuration data, user profiles, search results, dashboard metrics.
- Expensive queries. Aggregations, multi-table JOINs, full-text search, or any query that consistently takes more than 50ms.
- Tolerance for slightly stale data. If showing data that is up to 60 seconds old is acceptable, caching works well. Most content-oriented applications fall into this category.
- High concurrency. 1,000 users requesting the same product page should not generate 1,000 identical database queries. One query, one cache entry, 999 cache hits.
- Predictable access patterns. A small set of “hot” data is requested frequently. The Pareto principle applies: 20% of your data serves 80% of your requests. Caching that 20% has outsized impact.
When NOT to Use Redis Caching
Redis caching is the wrong tool when:
- Data must be real-time. Financial transactions, inventory counts during flash sales, live bidding systems, or anything where stale data causes business problems. A 60-second TTL means a user could see a price that changed a minute ago. For a product catalog, that is fine. For a stock trading platform, it is not.
- Write-heavy workloads. If the underlying data changes on every request, the cache invalidates constantly and provides no benefit. You pay the cost of writing to both the database and Redis without getting cache hits.
- Every request is unique. If each request produces unique results (highly personalized feeds with no overlap between users, one-time report generation), there is nothing to cache. Every request is a miss.
- Data is too large to fit in memory. Redis stores everything in RAM. If your working set is 100GB, you need 100GB of RAM for Redis. At some point, the cost of memory exceeds the cost of just scaling your database.
- You need strong consistency. Cache-aside is eventually consistent by design. Between a database write and the cache TTL expiry, the cache serves stale data. If your application requires read-after-write consistency, you need a write-through or write-behind pattern, which adds significant complexity.
- The queries are already fast. If your queries return in under 5ms, adding a cache layer introduces complexity (another service to monitor, another failure mode, cache invalidation logic) for marginal performance gain. Measure first, cache second.