Improve memory planning for submodule hierarchies. #11860

hsharma35 · 2025-06-23T21:21:13Z

Summary:
Improves memory planning across submodule hierarchies in `apply_algo` in memory_planning.py:

1. Plan memory bottom-to-top, starting with the leaf submodules and ending at the top-level graph module (root). This is now consistent with how delegates are compiled and memory planned. Future PRs/diffs will add support for planned buffers in delegates.
2. Allocate the max buffer size across all submodules as `graph_module.meta['input_mem_buffer_sizes']`, rather than the sum. This allows the space used by one submodule to be reclaimed for another submodule.
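
As a rough illustration of the bottom-to-top order in item 1 (not code from this diff), a post-order traversal visits every submodule before its parent. The `Module` class and `bottom_up` helper below are hypothetical stand-ins, not ExecuTorch types:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Module:
    """Hypothetical stand-in for a graph module; not an ExecuTorch type."""

    name: str
    children: List["Module"] = field(default_factory=list)


def bottom_up(module: Module) -> Iterator[Module]:
    """Yield every submodule before its parent (post-order traversal)."""
    for child in module.children:
        yield from bottom_up(child)
    yield module


# Same hierarchy as the example in this summary.
root = Module(
    "root",
    [
        Module("root.child0", [Module("root.child0.child0")]),
        Module("root.child1"),
    ],
)

planning_order = [m.name for m in bottom_up(root)]
print(planning_order)
# ['root.child0.child0', 'root.child0', 'root.child1', 'root']
```

Leaves are planned first, so by the time a parent is planned the sizes of all of its submodules are already known.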

Before this change, `apply_algo` in memory_planning.py would:

1. Plan memory top-to-bottom, starting with the top-level graph module (root).
2. Populate `input_mem_buffer_sizes` so that each new submodule allocates memory after the max buffer size of the memory planned before it.

For example:

root [A bytes]
- root.child0 [B bytes]
   - root.child0.child0 [C bytes]
- root.child1 [D bytes]

(before this diff) Planned memory looks like:

--- A + B + C + D ----------------
Space for root.child1
--- A + B + C --------------------
Space for root.child0.child0
--- A + B ------------------------
Space for root.child0
--- A ----------------------------
Space for root
--- 0 ----------------------------

Note that the tensors for child0 and child1 do not overlap, yet they are still placed in completely separate regions of memory.

(after this diff) Planned memory looks like:

--- max(C + B, D) + A ----------
root
--- max(C + B, D) --------------
root.child0        |
--- C ------------ | root.child1
root.child0.child0 | 
--- 0 --------------------------
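
To make the two layouts concrete, here is a small sketch (not code from this diff) that computes the total planned size under each scheme. The byte sizes and the `sizes`/`children` dictionaries are made up for illustration; the "after" formula simply follows the diagram above, where each module sits on top of the largest of its children's regions:

```python
from typing import Dict, List

# Hypothetical per-module buffer sizes in bytes (A, B, C, D in the example).
sizes: Dict[str, int] = {
    "root": 100,               # A
    "root.child0": 40,         # B
    "root.child0.child0": 30,  # C
    "root.child1": 50,         # D
}

# Hypothetical description of the hierarchy: parent -> direct children.
children: Dict[str, List[str]] = {
    "root": ["root.child0", "root.child1"],
    "root.child0": ["root.child0.child0"],
}


def total_before() -> int:
    """Old layout: every module gets its own region, so sizes add up."""
    return sum(sizes.values())


def total_after(name: str) -> int:
    """New layout: sibling subtrees share space, so take the max over children."""
    below = max((total_after(c) for c in children.get(name, [])), default=0)
    return below + sizes[name]


before = total_before()      # A + B + C + D
after = total_after("root")  # max(C + B, D) + A
print(before, after)         # 220 170
```

With these numbers the sibling subtrees `root.child0` (70 bytes including its child) and `root.child1` (50 bytes) share one 70-byte region instead of occupying 120 bytes.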

Note:
We can update the memory planning algorithm to plan nodes with submodules (while/map/cond or even delegate) using `graph_module.meta['non_const_buffer_size']` and reduce space even further. The implementation for this is not part of this PR/diff. It would allow us to reuse space for `root.child0.child0` in `root.child0`, and space for `root.child0`/`root.child1` in `root`.

Differential Revision: D76940237

pytorch-bot · 2025-06-23T21:21:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11860

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 9da6f8c with merge base f072e64:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner / linux-job (gh)
>>> Lint for exir/tests/test_memory_planning.py:
pull / unittest / linux / linux-job (gh)
exir/emit/test/test_emit.py::TestEmit::test_delegate_deduplicate
pull / unittest / macos / macos-job (gh)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph
pull / unittest-editable / linux / linux-job (gh)
exir/emit/test/test_emit.py::TestEmit::test_delegate_deduplicate
pull / unittest-editable / macos / macos-job (gh)
exir/emit/test/test_emit.py::TestEmit::test_delegate_deduplicate

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-06-23T21:21:47Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

facebook-github-bot · 2025-06-23T21:21:50Z

This pull request was exported from Phabricator. Differential Revision: D76940237


facebook-github-bot · 2025-06-24T01:25:23Z

This pull request was exported from Phabricator. Differential Revision: D76940237

hsharma35 requested review from JacobSzwejbka and larryliu0820 as code owners June 23, 2025 21:21

facebook-github-bot added the CLA Signed label Jun 23, 2025

facebook-github-bot added the fb-exported label Jun 23, 2025

hsharma35 force-pushed the export-D76940237 branch from 98bbc9b to 9da6f8c June 24, 2025 01:25
