Fluent Bit: The Log Collector

The logging post covers the full pipeline. This one goes deeper into Fluent Bit specifically: how it collects logs, what filtering happens before anything reaches Loki, and the Lua scripts that make the opt-in system work.


Why Fluent Bit

Promtail has been deprecated by Grafana as the primary log agent. Fluent Bit is lightweight (written in C), has an active community, supports multiple outputs, and is less coupled to Loki. If I ever want to send logs somewhere else (Kafka, S3, Elasticsearch), I just add another output.


The pipeline inside Fluent Bit

Input (tail)
  └─> Kubernetes filter
        └─> Opt-in filter (Lua) ── drop if no label
              └─> JSON parser
                    └─> Level normalization (Lua)
                          └─> Loki output

Each step:

Input. Tails /var/log/containers/*.log using the CRI parser. Excludes kube-system logs and Fluent Bit's own logs to avoid noise and feedback loops. Multiline support is enabled for stack traces.
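A sketch of what that input section could look like, pulling together the settings mentioned in this post (the exact paths and any extra multiline parsers beyond cri are illustrative):

```
[INPUT]
    name              tail
    tag               kube.*
    path              /var/log/containers/*.log
    # illustrative exclude globs for kube-system and Fluent Bit's own logs
    exclude_path      /var/log/containers/*_kube-system_*.log, /var/log/containers/fluent-bit*.log
    # the built-in cri multiline parser handles CRI framing and partial lines;
    # language-specific parsers (e.g. java) could be appended for stack traces
    multiline.parser  cri
    mem_buf_limit     5MB
    refresh_interval  30
```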

Kubernetes filter. Enriches each log record with pod name, namespace, container name, and pod labels by querying the Kubernetes API. Annotations are excluded to reduce payload size.
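Roughly, the filter section might look like this (merge_log is off here because JSON parsing happens later in the pipeline; parameter values are a sketch, not copied from the real config):

```
[FILTER]
    name         kubernetes
    match        kube.*
    merge_log    off    # JSON parsing is a separate, later step
    labels       on     # keep pod labels for the opt-in check
    annotations  off    # excluded to reduce payload size
```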

Opt-in filter. A Lua script (filter_o11y.lua) checks if the pod has the o11y.ruiz.sh/logs: "true" label. If not, the record is dropped. This is what makes the opt-in system work at the collection layer.

JSON parser. If the log line is JSON (common for structured logging), it parses it and merges the fields into the record. This is how fields like level, msg, error become queryable in Loki.
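This step could be expressed with Fluent Bit's parser filter, something like (key and parser names assumed):

```
[FILTER]
    name          parser
    match         kube.*
    key_name      log     # the raw log line after CRI parsing
    parser        json
    reserve_data  on      # keep the Kubernetes metadata added earlier
```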

Level normalization. Another Lua script (normalize_level.lua) handles the mess of different level field names across applications. It copies from severity, log_level, logLevel, LOG_LEVEL, severity_text into a single level field, then normalizes values: warning becomes warn, fatal and critical become error. If nothing is found, it defaults to unknown.
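A minimal sketch of what normalize_level.lua could look like, using the field names and mappings described above (the real script may differ in details):

```lua
-- Candidate fields to read, in priority order.
local sources = { "level", "severity", "log_level", "logLevel", "LOG_LEVEL", "severity_text" }

-- Map variant spellings onto a small canonical set.
local aliases = {
  warning  = "warn",
  fatal    = "error",
  critical = "error",
}

function normalize_level(tag, timestamp, record)
  local level = nil
  for _, key in ipairs(sources) do
    local v = record[key]
    if type(v) == "string" and v ~= "" then
      level = string.lower(v)
      break
    end
  end
  if level == nil then
    level = "unknown"  -- no recognizable level field found
  end
  record["level"] = aliases[level] or level
  return 1, timestamp, record  -- 1 = keep the modified record
end
```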

Loki output. Sends to Loki's HTTP API with the X-Scope-OrgID: homelab header (multi-tenancy). Labels sent: job, namespace, pod, container, level. The kubernetes and stream fields are removed before sending to keep the payload clean.
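Sketched as an output section, this might be (the Loki address is an assumption; tenant_id is what sets the X-Scope-OrgID header):

```
[OUTPUT]
    name         loki
    match        kube.*
    host         loki.monitoring.svc.cluster.local   # illustrative address
    port         3100
    tenant_id    homelab
    labels       job=fluentbit, namespace=$kubernetes['namespace_name'], pod=$kubernetes['pod_name'], container=$kubernetes['container_name'], level=$level
    remove_keys  kubernetes, stream
```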


The opt-in Lua script

The interesting part of filter_o11y.lua:

function filter_o11y(tag, timestamp, record)
  local k = record["kubernetes"]
  if k ~= nil and type(k) == "table" then
    local labels = k["labels"]
    if labels ~= nil and type(labels) == "table" then
      if labels["o11y.ruiz.sh/logs"] == "true" then
        return 0, timestamp, record  -- keep
      end
    end
  end
  -- fallback: flattened key
  if record["o11y_ruiz_sh_logs"] == "true" then
    return 0, timestamp, record  -- keep
  end
  return -1, timestamp, record  -- drop
end

It checks two paths because the Kubernetes filter can either nest labels in a table or flatten them with underscores depending on the config. The script handles both.

Returning -1 drops the record. Returning 0 keeps it unchanged (returning 1 would keep a modified record). Simple and effective.


Resource usage

Fluent Bit is a DaemonSet, so it runs on every node. Resources per pod:

  • Requests: 50m CPU, 64Mi memory
  • Limits: 128Mi memory (no CPU limit to avoid throttling)

The Mem_Buf_Limit is set to 5 MB per input. If Loki is temporarily unavailable, Fluent Bit buffers up to 5 MB in memory, then pauses the tail input until the buffer drains. This keeps memory bounded, at the cost of potentially missing lines if files rotate away while the input is paused.


The Refresh_Interval is 30 seconds, meaning new log files are detected every 30 seconds. For faster feedback during debugging, 10 seconds would be better, but it increases inotify pressure on nodes with many pods.

I'd also like to add a filter that drops logs matching certain patterns (health check spam, noisy debug output) before they reach Loki. Right now the opt-in label is all-or-nothing per pod.
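Fluent Bit's grep filter could handle that. A hedged sketch, assuming health checks are identifiable by a path field in the parsed JSON (the field name and pattern are hypothetical):

```
[FILTER]
    name     grep
    match    kube.*
    exclude  path ^/healthz$   # hypothetical key and regex
```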