-
Notifications
You must be signed in to change notification settings - Fork 5k
JIT: Clean up and optimize StackLevelSetter
#112561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Clean up and optimize StackLevelSetter
#112561
Conversation
The stack levels computed are not actually used for anything outside x86, so avoid computing them in that case. Also, the max stack level does not actually get used by the backend, even for x86, so no need to save it in `Compiler`.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
// Check that we have place for the push. | ||
assert(compiler->fgGetPtrArgCntMax() >= 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the comment is referring to. This value has not been used to allocate anything by the backend. The only thing it is used for is deciding between EBP/ESP frame and whether we need to switch to partial interruptible mode, but that's all done during StackLevelSetter
.
@@ -4019,7 +4019,7 @@ PhaseStatus Compiler::fgSetBlockOrder() | |||
BasicBlock::s_nMaxTrees = 0; | |||
#endif | |||
|
|||
if (compCanEncodePtrArgCntMax() && fgHasCycleWithoutGCSafePoint()) | |||
if (fgHasCycleWithoutGCSafePoint()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fgPtrArgCntMax
is computed much after this, so the check was always true.
Actually |
cc @dotnet/jit-contrib PTAL @AndyAyersMS No diffs except for throughput improvements since we no longer run this phase outside x86 in the vast majority of cases. I think in the future we might want to rethink the heuristics for when a frame pointer is allocated and remove the quirk added here. I think for linux-x64 it might be necessary for diagnostics, but for win-x64 it would free up a GP register quite often, and I'm not sure if there are any significant downsides there. |
ping @AndyAyersMS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though perhaps you can make it a bit faster still.
src/coreclr/jit/stacklevelsetter.cpp
Outdated
@@ -125,10 +114,21 @@ PhaseStatus StackLevelSetter::DoPhase() | |||
void StackLevelSetter::ProcessBlock(BasicBlock* block) | |||
{ | |||
assert(currentStackLevel == 0); | |||
#ifndef TARGET_X86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we host this and skip the whole process blocks loop in the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. Addressed.
StackLevelSetter
is used to compute the max number of pushed stack slots that the function is going to have. Today we use that as a breadcrumb for whether a frame pointer is going to be required or not.It turns out that we have already computed this information outside x86: it is basically the same thing as the size of the fixed outgoing arg area. Hence this PR changes things to avoid running
StackLevelSetter
and instead reuse the information computed earlier.This is not always possible as
StackLevelSetter
is also used to optimize throw helpers. But we can detect this situation ahead of time and skip it in most cases.StackLevelSetter
was running in MinOpts before, so this shows up as a nice MinOpts TP win.On a side note: it seems unlikely that we actually need to switch to frame pointer outside x86 here, but I'm leaving the logic intact for now (hence this is no-diff).