Long-term Metrics with Thanos

Prometheus is great for recent data, but it's not designed for long-term storage. In my homelab, Prometheus keeps 6 days of data locally. After that, it's gone. Thanos solves this by uploading Prometheus blocks to object storage and making them queryable.
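As a sketch, the short local retention described above comes down to a single flag on the Prometheus container (the container spec around it is illustrative):

```yaml
# Sketch: Prometheus container args for short local retention.
# Image and paths are illustrative; the retention flag is the real Prometheus flag.
containers:
  - name: prometheus
    image: prom/prometheus
    args:
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=6d   # keep only ~6 days on local disk
```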


How it works

Prometheus writes metrics to disk in 2-hour blocks. A Thanos Sidecar runs in the same pod, watches for completed blocks, and uploads them to MinIO.

Prometheus ──> WAL ──> Block (2h) ──> Sidecar ──> MinIO

On the query side, Thanos Query talks to both the Sidecar (recent data) and the Store Gateway (historical data from MinIO). From Grafana's perspective, it's a single Prometheus-compatible datasource with months of data.
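The upload path above can be sketched as a two-container pod spec. Container names, paths, and the shared volume are illustrative; the flags are standard Thanos Sidecar flags:

```yaml
# Sketch: Thanos Sidecar next to Prometheus in the same pod,
# sharing the TSDB directory via a common volume.
containers:
  - name: prometheus
    args:
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.min-block-duration=2h  # sidecar expects uncompacted 2h blocks
      - --storage.tsdb.max-block-duration=2h
  - name: thanos-sidecar
    image: quay.io/thanos/thanos
    args:
      - sidecar
      - --tsdb.path=/prometheus               # same volume as Prometheus
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yml  # MinIO bucket config
```

Local compaction is disabled (min and max block duration pinned to 2h) so the Sidecar always sees complete, immutable blocks to upload; merging is left to the Thanos Compactor.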


The components

  • Sidecar: uploads completed blocks from Prometheus to MinIO
  • Store Gateway: reads historical blocks from MinIO
  • Query: unified query interface across all sources
  • Query Frontend: caching layer for queries
  • Compactor: merges small blocks and downsamples old data
  • Bucket Web: UI to inspect what's in the bucket
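The fan-out from Query to the other components can be sketched as its container args. The service names and ports are illustrative (10901 is the default Thanos gRPC port):

```yaml
# Sketch: Thanos Query talking to both the Sidecar (recent data)
# and the Store Gateway (historical data from MinIO).
args:
  - query
  - --http-address=0.0.0.0:9090            # Prometheus-compatible HTTP API
  - --endpoint=thanos-sidecar:10901        # last ~6 days, straight from Prometheus
  - --endpoint=thanos-store-gateway:10901  # older blocks, served out of MinIO
```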

Retention and downsampling

Not all data needs full resolution forever. The Compactor handles this:

  • Raw: retained 15 days (recent detailed data)
  • 5-minute: retained 30 days (medium-term trends)
  • 1-hour: retained 60 days (long-term trends)

Old data gets downsampled automatically. A query for "CPU usage over the last 2 months" doesn't need per-second granularity; one-hour resolution is enough and uses far less storage.
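The retention policy above maps directly onto Compactor flags. As a sketch (paths are illustrative; the `compact` subcommand and retention flags are real Thanos flags):

```yaml
# Sketch: Compactor args implementing the retention table above.
args:
  - compact
  - --wait                                  # run continuously instead of once
  - --data-dir=/var/thanos/compact          # local workspace for merging blocks
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --retention.resolution-raw=15d
  - --retention.resolution-5m=30d
  - --retention.resolution-1h=60d
```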


Why MinIO

Thanos needs S3-compatible object storage to store metrics blocks. In AWS you'd use S3 directly. In a homelab, there's no S3. MinIO fills that gap. It's an open-source object storage server that implements the S3 API and runs as a pod inside the cluster.

I run MinIO with a single 30 GB PVC on Longhorn. It serves two buckets: thanos-metrics for Thanos and loki-logs for Loki. Both tools just see an S3 endpoint and don't care that it's running on the same cluster.

MinIO credentials are stored in Doppler and synced via External Secrets. Thanos and Loki each get their own Secret with the endpoint, access key, and secret key.
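As a sketch, the object storage config each Thanos component mounts might look like the following. The service address is an illustrative in-cluster name, and the key values are placeholders for what External Secrets syncs out of Doppler:

```yaml
# Sketch: objstore.yml pointing Thanos at the in-cluster MinIO.
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.minio.svc:9000   # illustrative in-cluster service address
  access_key: REPLACE_ME           # placeholder, from the synced Secret
  secret_key: REPLACE_ME           # placeholder, from the synced Secret
  insecure: true                   # plain HTTP inside the cluster
```

Loki's config follows the same shape against the loki-logs bucket, just with its own credentials.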


Storage

The Compactor has its own 10 GB PVC as workspace for merging blocks.

Daily growth is around 300 MB for metrics. With the retention policy above, total storage stays manageable.


Grafana integration

Grafana has two Prometheus-compatible datasources:

  • Prometheus for the last 6 days (direct)
  • Thanos for everything older (via Store Gateway + MinIO)
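Since Thanos Query speaks the Prometheus HTTP API, both entries use the same datasource type. A sketch of the Grafana provisioning file (URLs are illustrative in-cluster addresses):

```yaml
# Sketch: Grafana datasource provisioning with both sources.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
  - name: Thanos
    type: prometheus              # Thanos Query is Prometheus-API compatible
    url: http://thanos-query:9090
    isDefault: true               # default datasource for dashboards
```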

In practice, I use Thanos for almost everything. It covers both recent and historical data in a single query.


Thanos adds complexity, but the tradeoff is worth it. Being able to look at metrics from weeks ago during an investigation is something you don't appreciate until you need it.

The Compactor is the component that surprised me most. It runs quietly in the background, but without it, the bucket fills up with thousands of tiny 2-hour blocks and queries get slow. Compaction and downsampling are what make long-term storage practical.