Skip to content

JIT: Clean up and optimize StackLevelSetter #112561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 20, 2025

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Feb 14, 2025

StackLevelSetter is used to compute the max number of pushed stack slots that the function is going to have. Today we use that as a breadcrumb for whether a frame pointer is going to be required or not.
It turns out that we have already computed this information outside x86: it is basically the same thing as the size of the fixed outgoing arg area. Hence this PR changes things to avoid running StackLevelSetter and instead reuse the information computed earlier.

This is not always possible as StackLevelSetter is also used to optimize throw helpers. But we can detect this situation ahead of time and skip it in most cases.

StackLevelSetter was running in MinOpts before, so this shows up as a nice MinOpts TP win.

On a side note: it seems unlikely that we actually need to switch to frame pointer outside x86 here, but I'm leaving the logic intact for now (hence this is no-diff).

The stack levels computed are not actually used for anything outside
x86, so avoid computing them in that case. Also, the max stack level
does not actually get used by the backend, even for x86, so no need to
save it in `Compiler`.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 14, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Comment on lines -9560 to -9561
// Check that we have place for the push.
assert(compiler->fgGetPtrArgCntMax() >= 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the comment is referring to. This value has not been used to allocate anything by the backend. The only thing it is used for is deciding between EBP/ESP frame and whether we need to switch to partial interruptible mode, but that's all done during StackLevelSetter.

@@ -4019,7 +4019,7 @@ PhaseStatus Compiler::fgSetBlockOrder()
BasicBlock::s_nMaxTrees = 0;
#endif

if (compCanEncodePtrArgCntMax() && fgHasCycleWithoutGCSafePoint())
if (fgHasCycleWithoutGCSafePoint())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fgPtrArgCntMax is computed much after this, so the check was always true.

@jakobbotsch
Copy link
Member Author

Actually CheckArgCnt is using this even outside x86 to make decisions about the frame pointer. Not sure that makes much sense, but should probably be changed before this clean up.

@jakobbotsch jakobbotsch marked this pull request as ready for review February 17, 2025 10:11
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

No diffs except for throughput improvements since we no longer run this phase outside x86 in the vast majority of cases.

I think in the future we might want to rethink the heuristics for when a frame pointer is allocated and remove the quirk added here. I think for linux-x64 it might be necessary for diagnostics, but for win-x64 it would free up a GP register quite often, and I'm not sure if there are any significant downsides there.

@jakobbotsch
Copy link
Member Author

ping @AndyAyersMS

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though perhaps you can make it a bit faster still.

@@ -125,10 +114,21 @@ PhaseStatus StackLevelSetter::DoPhase()
void StackLevelSetter::ProcessBlock(BasicBlock* block)
{
assert(currentStackLevel == 0);
#ifndef TARGET_X86
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we host this and skip the whole process blocks loop in the caller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Addressed.

@jakobbotsch jakobbotsch merged commit b71cad1 into dotnet:main Feb 20, 2025
40 checks passed
@jakobbotsch jakobbotsch deleted the clean-up-stack-level-setter branch February 20, 2025 07:49
@github-actions github-actions bot locked and limited conversation to collaborators Mar 22, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants