Long-term Metrics with Thanos¶
Prometheus is great for recent data, but it's not designed for long-term storage. In my homelab, Prometheus keeps 6 days of data locally. After that, it's gone. Thanos solves this by uploading Prometheus blocks to object storage and making them queryable.
How it works¶
Prometheus writes metrics to disk in 2-hour blocks. A Thanos Sidecar runs in the same pod, watches for completed blocks, and uploads them to MinIO.
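In a manifest, that pairing looks roughly like the sketch below. This is illustrative, not the exact spec from this cluster; the container names, image tags, and mount paths are assumptions. The key detail is that both containers share the TSDB volume, and Prometheus is pinned to fixed 2-hour blocks so the Sidecar can upload them as they complete.

```yaml
# Sketch: Prometheus pod with a Thanos Sidecar (names/paths illustrative)
containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus
    args:
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.min-block-duration=2h   # Thanos needs fixed-size blocks
      - --storage.tsdb.max-block-duration=2h
    volumeMounts:
      - name: data
        mountPath: /prometheus
  - name: thanos-sidecar
    image: quay.io/thanos/thanos
    args:
      - sidecar
      - --tsdb.path=/prometheus                       # watch the same block directory
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yml  # MinIO connection details
    volumeMounts:
      - name: data
        mountPath: /prometheus
```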
On the query side, Thanos Query talks to both the Sidecar (recent data) and the Store Gateway (historical data from MinIO). From Grafana's perspective, it's a single Prometheus-compatible datasource with months of data.
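The fan-out is just a list of gRPC endpoints on the Query component. A minimal sketch, assuming Service names like `prometheus-sidecar` and `thanos-store-gateway` (the real names in this cluster may differ):

```yaml
# Sketch: Thanos Query fanning out to recent and historical data
args:
  - query
  - --endpoint=prometheus-sidecar:10901     # recent blocks, still on Prometheus disk
  - --endpoint=thanos-store-gateway:10901   # historical blocks from MinIO
```

Query deduplicates and merges results from both, which is why Grafana sees one continuous datasource.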
The components¶
| Component | What it does |
|---|---|
| Sidecar | Uploads blocks from Prometheus to MinIO |
| Store Gateway | Reads historical blocks from MinIO |
| Query | Unified query interface across all sources |
| Query Frontend | Splits and caches queries in front of Query |
| Compactor | Merges small blocks and downsamples old data |
| Bucket Web | UI to inspect what's in the bucket |
Retention and downsampling¶
Not all data needs full resolution forever. The Compactor handles this:
| Resolution | Retention | Use case |
|---|---|---|
| Raw | 15 days | Recent detailed data |
| 5 minute | 30 days | Medium-term trends |
| 1 hour | 60 days | Long-term trends |
Old data gets downsampled automatically. A query for "CPU usage last 2 months" doesn't need per-second granularity. 1-hour resolution is enough and uses far less storage.
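The retention table maps directly onto Compactor flags. A sketch of the relevant args (the data dir path is an assumption):

```yaml
# Sketch: Compactor retention flags matching the table above
args:
  - compact
  - --retention.resolution-raw=15d
  - --retention.resolution-5m=30d
  - --retention.resolution-1h=60d
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --data-dir=/data     # scratch space for merging blocks
  - --wait               # run continuously instead of one-shot
```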
Why MinIO¶
Thanos needs S3-compatible object storage to store metrics blocks. In AWS you'd use S3 directly. In a homelab, there's no S3. MinIO fills that gap. It's an open-source object storage server that implements the S3 API and runs as a pod inside the cluster.
I run MinIO with a single 30 GB PVC on Longhorn. It serves two buckets: thanos-metrics for Thanos and loki-logs for Loki. Both tools just see an S3 endpoint and don't care that it's running on the same cluster.
MinIO credentials are stored in Doppler and synced via External Secrets. Thanos and Loki each get their own Secret with the endpoint, access key, and secret key.
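The objstore config that the Sidecar, Store Gateway, and Compactor all consume is a small YAML file. A sketch, with placeholder credentials (the real values come from Doppler via External Secrets) and an assumed in-cluster MinIO Service name:

```yaml
# Sketch: objstore.yml pointing Thanos at in-cluster MinIO
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.minio.svc.cluster.local:9000   # assumed Service DNS name
  access_key: <from-secret>
  secret_key: <from-secret>
  insecure: true   # plain HTTP inside the cluster
```

Because MinIO implements the S3 API, this is the same `type: S3` config you would use against AWS, just with a different endpoint.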
Storage¶
The Compactor has its own 10 GB PVC as a workspace for merging and downsampling blocks.
Daily growth is around 300 MB for metrics. With the retention policy above, total storage stays manageable.
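A back-of-the-envelope estimate shows why. The 300 MB/day and retention windows are from above; the downsample shrink factors (roughly 10x at 5-minute resolution, 100x at 1-hour) are assumptions for illustration, not measured values:

```python
# Rough steady-state storage estimate; shrink factors are assumptions.
raw_per_day_mb = 300                         # observed daily growth (raw)

raw_mb = raw_per_day_mb * 15                 # raw kept 15 days
five_min_mb = raw_per_day_mb * 0.1 * 30      # assume ~10x smaller at 5m, kept 30 days
one_hour_mb = raw_per_day_mb * 0.01 * 60     # assume ~100x smaller at 1h, kept 60 days

total_gb = (raw_mb + five_min_mb + one_hour_mb) / 1024
print(f"~{total_gb:.1f} GB steady-state")    # prints ~5.4 GB steady-state
```

Even with generous error bars, the total sits comfortably inside the 30 GB MinIO volume.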
Grafana integration¶
Grafana has two Prometheus-compatible datasources:
- Prometheus for the last 6 days (direct)
- Thanos for everything older (via Store Gateway + MinIO)
In practice, I use Thanos for almost everything. It covers both recent and historical data in a single query.
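Since Thanos speaks the Prometheus query API, both entries use the same datasource type. A provisioning sketch, with assumed Service URLs:

```yaml
# Sketch: Grafana datasource provisioning (URLs illustrative)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
  - name: Thanos
    type: prometheus                      # Thanos is Prometheus-API compatible
    url: http://thanos-query-frontend:9090
    isDefault: true                       # covers recent and historical data
```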
Thanos adds complexity, but the tradeoff is worth it. Being able to look at metrics from weeks ago during an investigation is something you don't appreciate until you need it.
The Compactor is the component that surprised me most. It runs quietly in the background, but without it, the bucket fills up with thousands of tiny 2-hour blocks and queries get slow. Compaction and downsampling are what make long-term storage practical.