-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag #114271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms Author: Paul Osmialowski (pawosm-arm) ChangesFollowing the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags. The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes. Full diff: https://github.com/llvm/llvm-project/pull/114271.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 656bb1ebd1161e..8a42bdddb08119 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -394,6 +394,9 @@ struct PromoteMem2Reg {
/// Whether the function has the no-signed-zeros-fp-math attribute set.
bool NoSignedZeros = false;
+ /// Whether the function has the no-nans-fp-math attribute set.
+ bool NoNaNs = false;
+
public:
PromoteMem2Reg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,
AssumptionCache *AC)
@@ -752,6 +755,7 @@ void PromoteMem2Reg::run() {
ForwardIDFCalculator IDF(DT);
NoSignedZeros = F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool();
+ NoNaNs = F.getFnAttribute("no-nans-fp-math").getValueAsBool();
for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {
AllocaInst *AI = Allocas[AllocaNum];
@@ -1140,6 +1144,11 @@ void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
if (isa<FPMathOperator>(APN) && NoSignedZeros)
APN->setHasNoSignedZeros(true);
+ // This allows select instruction folding relevant to floating point
+ // reductions whose operand is a PHI.
+ if (isa<FPMathOperator>(APN) && NoNaNs)
+ APN->setHasNoNaNs(true);
+
// The currently active variable for this block is now the PHI.
IncomingVals[AllocaNo] = APN;
AllocaATInfo[AllocaNo].updateForNewPhi(APN, DIB);
diff --git a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
index 2cc26363daf9c5..4eda5108b7aba4 100644
--- a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
+++ b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
@@ -77,3 +77,81 @@ return: ; preds = %entry,%if.then
%retval = load double, ptr %x.addr
ret double %retval
}
+
+define double @phi_with_nnan(double %x) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define double @phi_with_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK: if.then:
+; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT: br label [[RETURN]]
+; CHECK: return:
+; CHECK-NEXT: [[X_ADDR_0:%.*]] = phi nnan double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret double [[X_ADDR_0]]
+entry:
+ %x.addr = alloca double
+ %cmp = fcmp olt double %x, 0.0
+ br i1 %cmp, label %if.then, label %return
+
+if.then: ; preds = %entry
+ %fneg = fneg double %x
+ store double %fneg, ptr %x.addr
+ br label %return
+
+return: ; preds = %entry,%if.then
+ %retval = load double, ptr %x.addr
+ ret double %retval
+}
+
+define <2 x double> @vector_phi_with_nnan(<2 x double> %x, i1 %cmp, <2 x double> %a, <2 x double> %b) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define <2 x double> @vector_phi_with_nnan(
+; CHECK-SAME: <2 x double> [[X:%.*]], i1 [[CMP:%.*]], <2 x double> [[A:%.*]], <2 x double> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK: if.then:
+; CHECK-NEXT: br label [[RETURN]]
+; CHECK: return:
+; CHECK-NEXT: [[X_ADDR_0:%.*]] = phi nnan <2 x double> [ [[B]], [[IF_THEN]] ], [ [[A]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret <2 x double> [[X_ADDR_0]]
+entry:
+ %x.addr = alloca <2 x double>
+ store <2 x double> %a, ptr %x.addr
+ br i1 %cmp, label %if.then, label %return
+
+if.then: ; preds = %entry
+ store <2 x double> %b, ptr %x.addr
+ br label %return
+
+return: ; preds = %entry,%if.then
+ %retval = load <2 x double>, ptr %x.addr
+ ret <2 x double> %retval
+}
+
+define double @phi_without_nnan(double %x) "no-nans-fp-math"="false" {
+; CHECK-LABEL: define double @phi_without_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK: if.then:
+; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT: br label [[RETURN]]
+; CHECK: return:
+; CHECK-NEXT: [[X_ADDR_0:%.*]] = phi double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret double [[X_ADDR_0]]
+entry:
+ %x.addr = alloca double
+ %cmp = fcmp olt double %x, 0.0
+ br i1 %cmp, label %if.then, label %return
+
+if.then: ; preds = %entry
+ %fneg = fneg double %x
+ store double %fneg, ptr %x.addr
+ br label %return
+
+return: ; preds = %entry,%if.then
+ %retval = load double, ptr %x.addr
+ ret double %retval
+}
|
Can you add a phase ordering test that shows what case this helps? The nsz case was more difficult to recover; I think nnan/ninf are simpler. |
I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks reasonable to me and consistent with an existing solution to a similar problem. Worse case, the new code should be removed when we're at a point that instruction fast math flags are sufficiently preserved/propagated.
That said, @pawosm-arm please wait until middle of next week before landing the PR just in case anybody strongly disagrees with this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also handle no-infs-fp-math
to save the next person the trouble.
Yes, the original problem that I've faced could be addressed various ways, and 0f91944 indeed fixes it too. Yet we see a value in the change proposed here too hence a good reason for merging it. |
I'd leave it to a separate PR, as this one has a specific story behind the proposed change, anything extra would go beyond it. |
I've clearly misunderstood the context of the rebase. What is the value in merging this change if the original issue has already been resolved? I guess this fix is simpler and thus might help compile time but then that wasn't the original intent. |
There are two ways of looking at this situation: One, I can imagine many ways the The other is that there's a general movement away from having the function attributes (in favor of using only the instruction flags) and thus by adding another use we make this transition harder. My PR will catch all cases and has very low complexity. Seems I need @nikic opinion on the subject, whether we should land it or abandon it. |
I do think this change still makes sense, especially from a consistency point of view. If SROA sets one of the value-based FMF flags (nsz) then it stands to reason that it should also set the other two (nnan and ninf). Unless there is some reason why nsz would be more problematic than nnan/ninf in this context (it does have substantially different semantics). |
This patch should help with the performance in other ways than the motivating case from #113686 (the creating of fmin/fmax is one of them). Those could probably be fixed in other ways, and my understanding was that there was a long term goal to move away form the function level FMF attributes. But this should help performance in a fairly generic way. |
nsz is more problematic and difficult to recover than nnan/ninf. Violations of no-nans no-infs can be treated as poison. nsz is fuzzier and just expands the set of permissible values, and is thus attached to specific operations and not recoverable from the inputs. You can't simply add some form of assertion that a value cannot be -0, in the way you could for a nan. It's not problematic to preserve nnan in the same way here, but it's also less important. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
arsenm sums it up quite well, I think. Personally, I dislike the FMF being on select/phi only somewhat less than I dislike the function-level attributes, and I'd rather avoid needing to use them if there's a better way forward. Because nnan/ninf induce poison, you can get a lot of potential by poison propagation. But nsz doesn't induce poison, and that means that where you have patterns like fabs formation, you need the flag on the select to know that you can change the result of a value which is otherwise unchanged. |
…par with the nsz flag Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan and ninf fast-math flags. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of the relevant fast-math flag. The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan and ninf flags down to these Phi nodes.
Following the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR #83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.
The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.