@@ -390,9 +390,6 @@ WindowsPodSandboxConfig.
configurations will see values that may not represent actual configurations. As a
mitigation, this change needs to be documented and highlighted in the
release notes, and in top-level Kubernetes documents.
- 1. Resizing memory lower: Lowering cgroup memory limits may not work as pages
- could be in use, and approaches such as setting limit near current usage may
- be required. This issue needs further investigation.
1. Scheduler race condition: If a resize happens concurrently with the scheduler evaluating the node
where the pod is resized, it can result in a node being over-scheduled, which will cause the pod
to be rejected with an `OutOfCPU` or `OutOfMemory` error. Solving this race condition is out of
@@ -810,11 +807,17 @@ Setting the memory limit below current memory usage can cause problems. If the k
sufficient memory, the outcome depends on the cgroups version. With cgroups v1 the change will
simply be rejected by the kernel, whereas with cgroups v2 it will trigger an oom-kill.

- In the initial beta release of in-place resize, we will **disallow** `PreferNoRestart` memory limit
- decreases, enforced through API validation. The intent is for this restriction to be relaxed in the
- future, but the design of how limit decreases will be approached is still undecided.
+ If the memory resize restart policy is `NotRequired` (or unspecified), the Kubelet will make a
+ **best-effort** attempt to prevent oom-kills when decreasing memory limits, but doesn't provide any
+ guarantees. Before decreasing container memory limits, the Kubelet will read the container memory
+ usage (via the StatsProvider). If usage is greater than the desired limit, the resize will be
+ skipped for that container. The pod condition `PodResizeInProgress` will remain, with an `Error`
+ reason and a message reporting the current usage and desired limit. This is considered best-effort
+ since it is still subject to a TOCTOU race condition where the usage exceeds the limit after the
+ check is performed. A similar check will also be performed at the pod level before lowering the pod
+ cgroup memory limit.

- Memory limit decreases with `RestartRequired` are still allowed.
+ _Version skew note:_ Kubernetes v1.33 (and earlier) nodes only check the pod-level memory usage.

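The best-effort check added in this hunk can be sketched as follows. This is an illustrative sketch only, not the actual Kubelet code: `decideMemoryDecrease` is a hypothetical helper, and the real implementation reads usage through the StatsProvider and surfaces a skipped resize via an `Error` reason on the `PodResizeInProgress` condition.

```go
package main

import "fmt"

// decideMemoryDecrease sketches the best-effort guard the Kubelet applies
// before lowering a container's memory limit: if current usage already
// exceeds the desired limit, the resize is skipped rather than risking an
// oom-kill (cgroups v2) or a kernel rejection (cgroups v1). The check is
// still subject to a TOCTOU race: usage may rise past the limit after it
// is read. Names here are hypothetical, not the actual Kubelet API.
func decideMemoryDecrease(usageBytes, desiredLimitBytes int64) (bool, string) {
	if usageBytes > desiredLimitBytes {
		return false, fmt.Sprintf(
			"memory usage (%d) exceeds desired limit (%d); skipping resize",
			usageBytes, desiredLimitBytes)
	}
	return true, "usage below desired limit; lowering cgroup memory limit"
}

func main() {
	// Usage above the desired limit: the resize is skipped for this container.
	proceed, msg := decideMemoryDecrease(600<<20, 512<<20)
	fmt.Printf("%v: %s\n", proceed, msg)

	// Usage below the desired limit: the Kubelet proceeds with the decrease.
	proceed, msg = decideMemoryDecrease(300<<20, 512<<20)
	fmt.Printf("%v: %s\n", proceed, msg)
}
```

A comparable guard would run once more at the pod level before the pod cgroup memory limit is lowered.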
### Swap
@@ -891,7 +894,8 @@ This will be reconsidered post-beta as a future enhancement.

### Future Enhancements

- 1. Allow memory limits to be decreased, and handle the case where limits are set below usage.
+ 1. Improve memory limit decrease oom-kill prevention by leveraging other kernel mechanisms or using
+ gradual decrease.
1. Kubelet (or Scheduler) evicts lower priority Pods from Node to make room for
resize. Pre-emption by Kubelet may be simpler and offer lower latencies.
1. Allow ResizePolicy to be set on Pod level, acting as default if (some of)
@@ -1546,6 +1550,8 @@ _This section must be completed when targeting beta graduation to a release._
and update CRI `UpdateContainerResources` contract
- Add back `AllocatedResources` field to resolve a scheduler corner case
- Introduce Actuated resources for actuation
+ - 2025-06-03 - v1.34 post-beta updates
+   - Allow no-restart memory limit decreases

## Drawbacks
