Why I Removed CPU Limits¶
I was seeing occasional latency spikes in Grafana queries and slow ArgoCD syncs, yet nothing was running at high CPU usage and the nodes had plenty of spare capacity. After digging into it, the culprit turned out to be CPU throttling caused by CPU limits.
How CPU scheduling works in Kubernetes¶
Two settings control CPU allocation:
CPU requests tell the scheduler how much CPU a pod is guaranteed. The scheduler uses this to decide which node to place the pod on, and at runtime the kernel translates the request into a CFS weight, so the pod gets at least that much CPU even under contention. If you request 100m, you always get at least 100m.
CPU limits cap how much CPU a pod can use, even if the node has idle cores. The kernel enforces this by throttling the process. If a pod hits its limit, it gets paused until the next scheduling period. The pod isn't using too much. It's being artificially slowed down.
The key insight: requests guarantee the minimum. Limits cap the maximum. When you set a limit, you're telling Kubernetes "don't let this pod use more than X, even if nobody else needs the CPU right now."
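Concretely, the kernel enforces a CPU limit by converting it into a CFS quota: with the default 100 ms scheduling period, a `500m` limit means the container may use at most 50 ms of CPU time per period before it is paused. A quick sketch of that arithmetic (the function name is mine, not a Kubernetes API):

```python
CFS_PERIOD_US = 100_000  # default CFS scheduling period: 100ms, in microseconds

def cfs_quota_us(cpu_limit_millicores: int) -> int:
    """CPU time (in microseconds) a container may consume per CFS period."""
    return cpu_limit_millicores * CFS_PERIOD_US // 1000

# A 500m limit allows 50000us (50ms) of CPU time per 100ms period;
# exceed that and the container is paused until the next period begins.
print(cfs_quota_us(500))   # 50000
print(cfs_quota_us(1000))  # 100000 -- one full core
```

This is why throttling shows up as latency: the pod isn't slow, it's repeatedly frozen for the remainder of each 100 ms window.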
The problem with limits¶
CPU is a compressible resource. If a node runs low on CPU, the kernel naturally distributes it proportionally based on requests. Pods with higher requests get more CPU. Nobody gets killed. It just slows down gracefully.
This is different from memory, which is incompressible. If a pod uses more memory than its limit, it gets OOM-killed. There's no graceful degradation. That's why memory limits are essential.
With CPU limits, the throttling happens even when the node is 20% utilized. The pod doesn't know or care that there's spare capacity. It hits the limit and gets paused. This shows up as increased latency, slower response times, and longer processing times for no good reason.
What I changed¶
I removed CPU limits from all workloads in the homelab. Every pod now has CPU requests (to guarantee scheduling and minimum allocation) but no CPU limit. Memory limits stay.
Before:
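A representative `resources` stanza with both CPU and memory limits set (values here are illustrative, not the exact manifests):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m        # this is the line that causes CFS throttling
    memory: 256Mi
```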
After:
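The same stanza with the CPU limit dropped and the memory limit kept (values again illustrative):

```yaml
resources:
  requests:
    cpu: 100m        # still guarantees scheduling and a minimum share
    memory: 256Mi
  limits:
    memory: 256Mi    # memory limit stays; no cpu entry at all
```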
This applies to Prometheus, Thanos, Loki, Grafana, ArgoCD, Fluent Bit, and everything else running in the cluster.
The evidence¶
The metric container_cpu_cfs_throttled_periods_total tells the story. This counter tracks how many times the CFS scheduler paused a container because it hit its CPU limit.
The query:
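A common form of this query, using the cAdvisor metrics the kubelet exposes (the exact label filters will vary by setup):

```promql
# Throttled CFS periods per second, per container
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
)
```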
Before removing limits, ArgoCD pods were showing values of 1.0 to 1.5 throttled periods per second, meaning they were being paused more than once every second. The nodes had plenty of idle CPU. The pods just weren't allowed to use it.
After removing limits, the throttling dropped to near zero across the board. Same workloads, same traffic, no more artificial pauses.
If you suspect CPU throttling in your cluster, this is the first metric to check.

The result¶
The latency spikes in Grafana disappeared. ArgoCD syncs got faster. Prometheus scrapes became more consistent. The nodes still have plenty of headroom because the total CPU requests are well below the available capacity.
When CPU limits make sense¶
There are cases where you might still want them:
- Multi-tenant clusters where you need strict isolation between teams
- Noisy neighbor problems where one pod could starve others
- Cost allocation where you need to enforce budgets per namespace
In a homelab or a cluster where you control all the workloads, they're unnecessary overhead.
Sources¶
The community consensus on this is pretty clear:
- Google Cloud: Best practices for running cost-optimized Kubernetes applications on GKE recommends requests without limits for burstable CPU workloads.
- Robusta.dev: For the Love of God, Stop Using CPU Limits on Kubernetes is the reference article from the community explaining why CPU limits cause unnecessary throttling.
- Omio Engineering: CPU limits and aggressive throttling in Kubernetes documents a real case where removing CPU limits solved latency problems.
- StormForge: The Great Kubernetes Limits Debate summarizes the community consensus: skip CPU limits, keep memory limits.
- PerfectScale: Kubernetes CPU Limits Best Practices explains that requests, not limits, are what guarantee minimum CPU.