Logging with Loki and Fluent Bit
Metrics tell you something is wrong. Logs tell you why. After getting Prometheus and Thanos running, the next step was a logging pipeline.
The pipeline
Fluent Bit runs on every node as a DaemonSet. It reads container log files, enriches them with Kubernetes metadata (pod name, namespace, labels), and sends them to Loki over HTTP.
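A minimal Fluent Bit pipeline for this setup might look like the following sketch. The Loki service address and the tag prefix are assumptions, not the actual values from this cluster:

```ini
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            cri
    Tag               kube.*

[FILTER]
    # Enrich each record with pod name, namespace, and labels from the API server
    Name              kubernetes
    Match             kube.*
    Labels            On

[OUTPUT]
    # Ship to Loki over HTTP; the service name here is illustrative
    Name              loki
    Match             kube.*
    Host              loki.monitoring.svc
    Port              3100
    Labels            job=fluent-bit, namespace=$kubernetes['namespace_name']
```

The `Labels` line on the output is what turns Kubernetes metadata into Loki stream labels.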
Loki receives the logs, indexes them by labels, and stores the actual log content as compressed chunks in MinIO. It's designed to be cheap. Instead of indexing every word like Elasticsearch, Loki only indexes labels. The log content is stored as-is and filtered at query time.
Grafana queries Loki using LogQL. You can search by labels, filter by text, and combine logs with metrics on the same dashboard.
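A couple of illustrative LogQL queries; the label names are assumptions based on the enrichment described above:

```logql
# All logs from one namespace containing "error"
{namespace="media"} |= "error"

# Error rate per app over 5 minutes, chartable next to Prometheus metrics
sum by (app) (rate({namespace="media"} |= "error" [5m]))
```

The second form returns a metric, which is what makes logs and metrics composable on the same Grafana dashboard.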
Deployment mode
Loki supports three modes: single binary, simple scalable, and microservices. For a homelab, single binary is the right choice. One pod runs all roles (distributor, ingester, querier, compactor). Simple to operate, easy to debug.
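With the grafana/loki Helm chart, for example, single binary mode is roughly this (values are illustrative and vary by chart version):

```yaml
# values.yaml for the grafana/loki Helm chart (sketch)
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
# Zero out the simple-scalable components so only the single binary runs
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
```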
The upgrade path is clear: if the single pod starts hitting CPU or memory limits consistently, move to simple scalable. For now, it handles the homelab load without issues.
Storage and retention
Loki stores chunks and indexes in a MinIO bucket called loki-logs. It's the same MinIO instance Thanos uses, just a different bucket.
Retention is 15 days. The compactor runs every 10 minutes and deletes expired data automatically. Daily growth is around 200 MB, so total storage stays under 3 GB.
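In Loki's own config, the storage and retention described above translate roughly to this sketch (the MinIO endpoint and paths are assumptions):

```yaml
common:
  storage:
    s3:
      endpoint: minio.minio.svc:9000   # assumed in-cluster MinIO address
      bucketnames: loki-logs
      s3forcepathstyle: true

limits_config:
  retention_period: 15d                # drop logs older than 15 days

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m             # run every 10 minutes, as above
  retention_enabled: true              # let the compactor apply retention
  delete_request_store: s3             # required when retention is enabled on recent Loki versions
```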
Opt-in with labels
Not every pod needs its logs in Loki. I built an opt-in system using a custom label: o11y.ruiz.sh/logs: "true".
Fluent Bit still reads all container logs (as a node-level DaemonSet, it has to), but a Lua filter checks each record for the label. If a pod doesn't carry it, its logs are dropped before they ever reach Loki.
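The Lua filter could be a small script like this; the function and file names are illustrative, while the label key matches the one above:

```lua
-- drop_unlabeled.lua
-- Keep a record only if its pod carries o11y.ruiz.sh/logs: "true".
function drop_unlabeled(tag, timestamp, record)
    local k8s = record["kubernetes"]
    if k8s and k8s["labels"] and k8s["labels"]["o11y.ruiz.sh/logs"] == "true" then
        return 0, timestamp, record   -- 0 = keep the record unmodified
    end
    return -1, timestamp, record      -- -1 = drop the record
end
```

It gets wired into the pipeline with a filter stanza after the kubernetes filter:

```ini
[FILTER]
    Name    lua
    Match   kube.*
    Script  drop_unlabeled.lua
    Call    drop_unlabeled
```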
This keeps storage lean. Infrastructure addons all have the label. New apps need to add it explicitly.
Alerting from logs
Loki has a built-in Ruler that can evaluate alert rules based on log queries. If error logs spike above a threshold, Loki fires an alert to Alertmanager, which routes it to Slack.
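A Ruler alert rule has the same shape as a Prometheus rule, just with a LogQL expression. The names, labels, and threshold here are illustrative:

```yaml
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorLogRate
        # Fire when an app logs more than ~1 error line per second for 5 minutes
        expr: sum by (app) (rate({namespace="media"} |= "error" [5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error log rate for {{ $labels.app }}"
```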
It can also create recording rules that turn log patterns into Prometheus metrics. For example, counting error logs per service per minute and writing that as a metric to Prometheus via remote_write. Then you can query it with PromQL and build dashboards in Grafana.
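A recording rule for that error count might look like this (the rule name and label are assumptions):

```yaml
groups:
  - name: log-metrics
    rules:
      - record: app:error_logs:rate1m
        expr: sum by (app) (rate({namespace="media"} |= "error" [1m]))
```

For the result to land in Prometheus, the Ruler also needs remote_write enabled in Loki's config; the Prometheus URL here is illustrative:

```yaml
ruler:
  remote_write:
    enabled: true
    clients:
      prometheus:
        url: http://prometheus.monitoring.svc:9090/api/v1/write
```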
Loki is not Elasticsearch, and that's the point. It's simpler, cheaper, and good enough for most use cases. The tradeoff is that full-text search is slower since it scans chunks at query time instead of using an inverted index.
The opt-in label system was one of the better decisions. Without it, every pod in the cluster would dump logs into Loki, and most of it would be noise.