Commit 17cf87e

KEP-1287: Allow memory limit decreases
1 parent 9d77f18 commit 17cf87e

File tree

1 file changed: +14 -8 lines changed
  • keps/sig-node/1287-in-place-update-pod-resources

keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 14 additions & 8 deletions
@@ -390,9 +390,6 @@ WindowsPodSandboxConfig.
 configurations will see values that may not represent actual configurations. As a
 mitigation, this change needs to be documented and highlighted in the
 release notes, and in top-level Kubernetes documents.
-1. Resizing memory lower: Lowering cgroup memory limits may not work as pages
-could be in use, and approaches such as setting limit near current usage may
-be required. This issue needs further investigation.
 1. Scheduler race condition: If a resize happens concurrently with the scheduler evaluating the node
 where the pod is resized, it can result in a node being over-scheduled, which will cause the pod
 to be rejected with an `OutOfCPU` or `OutOfMemory` error. Solving this race condition is out of
@@ -810,11 +807,17 @@ Setting the memory limit below current memory usage can cause problems. If the k
 sufficient memory, the outcome depends on the cgroups version. With cgroups v1 the change will
 simply be rejected by the kernel, whereas with cgroups v2 it will trigger an oom-kill.
 
-In the initial beta release of in-place resize, we will **disallow** `PreferNoRestart` memory limit
-decreases, enforced through API validation. The intent is for this restriction to be relaxed in the
-future, but the design of how limit decreases will be approached is still undecided.
+If the memory resize restart policy is `NotRequired` (or unspecified), the Kubelet will make a
+**best-effort** attempt to prevent oom-kills when decreasing memory limits, but doesn't provide any
+guarantees. Before decreasing container memory limits, the Kubelet will read the container memory
+usage (via the StatsProvider). If usage is greater than the desired limit, the resize will be
+skipped for that container. The pod condition `PodResizeInProgress` will remain, with an `Error`
+reason, and a message reporting the current usage & desired limit. This is considered best-effort
+since it is still subject to a TOCTOU race condition where the usage exceeds the limit after the
+check is performed. A similar check will also be performed at the pod level before lowering the pod
+cgroup memory limit.
 
-Memory limit decreases with `RestartRequired` are still allowed.
+_Version skew note:_ Kubernetes v1.33 (and earlier) nodes only check the pod-level memory usage.
 
 ### Swap
 
@@ -891,7 +894,8 @@ This will be reconsidered post-beta as a future enhancement.
 
 ### Future Enhancements
 
-1. Allow memory limits to be decreased, and handle the case where limits are set below usage.
+1. Improve memory limit decrease oom-kill prevention by leveraging other kernel mechanisms or using
+gradual decrease.
 1. Kubelet (or Scheduler) evicts lower priority Pods from Node to make room for
 resize. Pre-emption by Kubelet may be simpler and offer lower latencies.
 1. Allow ResizePolicy to be set on Pod level, acting as default if (some of)
@@ -1546,6 +1550,8 @@ _This section must be completed when targeting beta graduation to a release._
 and update CRI `UpdateContainerResources` contract
 - Add back `AllocatedResources` field to resolve a scheduler corner case
 - Introduce Actuated resources for actuation
+- 2025-06-03 - v1.34 post-beta updates
+- Allow no-restart memory limit decreases
 
 ## Drawbacks
 