Thanos: The Components

The previous post covers why Thanos exists and how it extends Prometheus. This one goes deeper into each component, how they're configured, and where they run.

The setup uses the Bitnami Helm chart (v17.3.1) with the official Thanos image (v0.39.2). Bitnami provides the Kubernetes manifests, the official image provides the binary.

                          Grafana
                    ┌───────────────┐
                    │Query Frontend │ cache + split
                    │  :9090 (LB)   │
                    └───────┬───────┘
                            │
                    ┌───────┴───────┐
                    │    Query      │ dedup + merge
                    └───┬───────┬───┘
                        │       │
            ┌───────────┘       └───────────┐
            ▼                               ▼
    ┌───────────────┐               ┌───────────────┐
    │   Sidecar     │               │Store Gateway  │
    │ (recent data) │               │ (historical)  │
    └───────┬───────┘               └───────┬───────┘
            │                               │
            ▼                               ▼
    ┌───────────────┐               ┌───────────────┐
    │  Prometheus   │               │     MinIO     │
    │   (PVC 8GB)   │               │   (metrics)   │
    └───────────────┘               └───────┬───────┘
            │                               ▲
            │  upload 2h blocks             │
            └──────────► Sidecar ───────────┘
                                    ┌───────┴───────┐
                                    │   Compactor   │ compact + downsample
                                    │   (PVC 10GB)  │
                                    └───────────────┘

    ┌───────────────┐
    │  Bucket Web   │ ──reads──► MinIO
    │  :8080 (LB)   │
    └───────────────┘

Sidecar

The Sidecar runs inside the Prometheus pod. It watches the local TSDB directory and uploads completed 2-hour blocks to MinIO. It's the only component that uploads new blocks to object storage; the Compactor only rewrites blocks that are already there.

The Sidecar also exposes a gRPC Store API endpoint that Thanos Query can hit for recent data that hasn't been uploaded yet. This means a single request through Query can cover both the last few hours (via the Sidecar) and historical data (via the Store Gateway).

The Sidecar is configured in kube-prometheus-stack, not in the Thanos chart. It gets the MinIO credentials from a Secret and uploads to the thanos-metrics bucket.
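As a sketch, the relevant kube-prometheus-stack values and the objstore file might look like the following. The Secret name, key, and MinIO endpoint here are assumptions, not the actual values from this setup:

```yaml
# kube-prometheus-stack values (sketch) — enable the Thanos sidecar and
# point it at an objstore config stored in a Secret.
# Secret name and key are assumptions.
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore   # hypothetical Secret name
          key: objstore.yml
```

```yaml
# objstore.yml inside that Secret — Thanos S3 client config for MinIO.
# Endpoint is an assumption; credentials come from the Secret.
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.minio.svc.cluster.local:9000
  access_key: <minio-access-key>
  secret_key: <minio-secret-key>
  insecure: true   # plain HTTP inside the cluster
```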


Store Gateway

The Store Gateway reads blocks from MinIO and serves them to Thanos Query. It doesn't store data itself. It has a 3 GB PVC that acts as a local cache for block indices and metadata, so it doesn't have to download them from MinIO on every query.

Runs on the large tier node. Resources: 50m/192Mi request, 250m/384Mi limit.


Query

Query is the unified interface. It talks to both the Sidecar (recent data) and the Store Gateway (historical data) and deduplicates the results. From Grafana's perspective, it's just a Prometheus-compatible endpoint.

It discovers the Sidecar automatically via DNS using the kube-prometheus-stack-thanos-discovery service in the metrics namespace.
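In the Bitnami chart that discovery is a values setting. A sketch, assuming the chart's dnsDiscovery keys:

```yaml
# Bitnami Thanos chart values (sketch) — let Query discover the Sidecar's
# gRPC endpoint via DNS SRV lookups against the discovery service.
query:
  dnsDiscovery:
    enabled: true
    sidecarsService: kube-prometheus-stack-thanos-discovery
    sidecarsNamespace: metrics
```

Under the hood this renders to a `dnssrv+_grpc._tcp.…` endpoint flag on the querier, so new Sidecar replicas are picked up without config changes.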

Query has autoscaling enabled: 1 to 3 replicas based on CPU (70%) and memory (80%). In practice it stays at 1 replica, but if I run heavy dashboards it can scale up. Query timeout is set to 5 minutes.
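Those settings translate roughly to these chart values (key names assumed from Bitnami conventions):

```yaml
# Bitnami Thanos chart values (sketch) — Query autoscaling and timeout.
query:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 3
    targetCPU: 70      # scale up above 70% CPU
    targetMemory: 80   # or above 80% memory
  extraFlags:
    - --query.timeout=5m
```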

Runs on the medium tier node. Resources: 50m/192Mi request, 250m/512Mi limit.


Query Frontend

A caching layer that sits in front of Query. It splits large time-range queries into smaller chunks, caches the results, and serves repeated queries from cache.

Also has autoscaling: 1 to 2 replicas. Exposed via LoadBalancer on 192.168.68.214:9090, which is what Grafana's Thanos datasource points to.
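A sketch of the corresponding chart values; the split interval shown is an assumption, not the configured value:

```yaml
# Bitnami Thanos chart values (sketch) — Query Frontend splitting,
# autoscaling, and the LoadBalancer that Grafana points at.
queryFrontend:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 2
  service:
    type: LoadBalancer
    loadBalancerIP: 192.168.68.214
  extraFlags:
    - --query-range.split-interval=24h   # chunk size is an assumption
```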

Runs on the medium tier node. Resources: 50m/128Mi request, 250m/256Mi limit.


Compactor

The Compactor is the background worker. It does two things:

Compaction. Merges small 2-hour blocks into larger ones. Fewer blocks means less I/O on MinIO and faster queries. Blocks go from Level 1 (original 2h) to Level 2 (merged) to Level 3+ (further merged).

Downsampling. Creates lower-resolution copies of older data, with a separate retention period per resolution:

  • Raw data kept for 15 days
  • 5-minute resolution kept for 30 days
  • 1-hour resolution kept for 60 days

The Compactor has a 10 GB PVC as workspace. It downloads blocks from MinIO, merges them locally, uploads the result, and deletes the originals.
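The retention tiers and workspace map to chart values along these lines (a sketch, assuming the Bitnami chart's compactor keys):

```yaml
# Bitnami Thanos chart values (sketch) — per-resolution retention plus
# the local workspace PVC the Compactor merges blocks in.
compactor:
  retentionResolutionRaw: 15d
  retentionResolution5m: 30d
  retentionResolution1h: 60d
  persistence:
    size: 10Gi
```

These render to the compactor's `--retention.resolution-raw`, `--retention.resolution-5m`, and `--retention.resolution-1h` flags.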

Runs on the large tier node. Resources: 50m/256Mi request, 250m/512Mi limit.


Bucket Web

A simple UI that shows what's in the MinIO bucket. Block count, sizes, compaction levels, time ranges. Useful for debugging when something looks off in queries.

Exposed via LoadBalancer on 192.168.68.215:8080.
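A sketch of the chart values for this, assuming the Bitnami `bucketweb` key:

```yaml
# Bitnami Thanos chart values (sketch) — Bucket Web behind a LoadBalancer.
bucketweb:
  enabled: true
  service:
    type: LoadBalancer
    loadBalancerIP: 192.168.68.215
```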

Runs on the medium tier node. Minimal resources: 25m/64Mi request.


What's not enabled

Ruler is disabled. Alert rules run in Prometheus directly, so there's no need for a separate Thanos Ruler.

Receive is disabled. The homelab uses sidecar mode (the Sidecar uploads Prometheus blocks to MinIO). Receive mode is for Prometheus instances that remote-write into Thanos, which adds complexity without benefit here.


Node placement summary

Component        Tier                       PVC
Sidecar          large (with Prometheus)    shares Prometheus 8 GB PVC
Store Gateway    large                      3 GB (index cache)
Query            medium (autoscale 1-3)     none (stateless)
Query Frontend   medium (autoscale 1-2)     none (stateless)
Compactor        large                      10 GB (workspace)
Bucket Web       medium                     none (stateless)