Skip to content

[Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag #114271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pawosm-arm
Copy link
Contributor

@pawosm-arm pawosm-arm commented Oct 30, 2024

Following the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR #83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Paul Osmialowski (pawosm-arm)

Changes

Following the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.


Full diff: https://github.com/llvm/llvm-project/pull/114271.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp (+9)
  • (modified) llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll (+78)
diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 656bb1ebd1161e..8a42bdddb08119 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -394,6 +394,9 @@ struct PromoteMem2Reg {
   /// Whether the function has the no-signed-zeros-fp-math attribute set.
   bool NoSignedZeros = false;
 
+  /// Whether the function has the no-nans-fp-math attribute set.
+  bool NoNaNs = false;
+
 public:
   PromoteMem2Reg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,
                  AssumptionCache *AC)
@@ -752,6 +755,7 @@ void PromoteMem2Reg::run() {
   ForwardIDFCalculator IDF(DT);
 
   NoSignedZeros = F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool();
+  NoNaNs = F.getFnAttribute("no-nans-fp-math").getValueAsBool();
 
   for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {
     AllocaInst *AI = Allocas[AllocaNum];
@@ -1140,6 +1144,11 @@ void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
         if (isa<FPMathOperator>(APN) && NoSignedZeros)
           APN->setHasNoSignedZeros(true);
 
+        // This allows select instruction folding relevant to floating point
+        // reductions whose operand is a PHI.
+        if (isa<FPMathOperator>(APN) && NoNaNs)
+          APN->setHasNoNaNs(true);
+
         // The currently active variable for this block is now the PHI.
         IncomingVals[AllocaNo] = APN;
         AllocaATInfo[AllocaNo].updateForNewPhi(APN, DIB);
diff --git a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
index 2cc26363daf9c5..4eda5108b7aba4 100644
--- a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
+++ b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
@@ -77,3 +77,81 @@ return:                          ; preds = %entry,%if.then
   %retval = load double, ptr %x.addr
   ret double %retval
 }
+
+define double @phi_with_nnan(double %x) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define double @phi_with_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi nnan double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret double [[X_ADDR_0]]
+entry:
+  %x.addr = alloca double
+  %cmp = fcmp olt double %x, 0.0
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  %fneg = fneg double %x
+  store double %fneg, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load double, ptr %x.addr
+  ret double %retval
+}
+
+define <2 x double> @vector_phi_with_nnan(<2 x double> %x, i1 %cmp, <2 x double> %a, <2 x double> %b) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define <2 x double> @vector_phi_with_nnan(
+; CHECK-SAME: <2 x double> [[X:%.*]], i1 [[CMP:%.*]], <2 x double> [[A:%.*]], <2 x double> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi nnan <2 x double> [ [[B]], [[IF_THEN]] ], [ [[A]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret <2 x double> [[X_ADDR_0]]
+entry:
+  %x.addr = alloca <2 x double>
+  store <2 x double> %a, ptr %x.addr
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  store <2 x double> %b, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load <2 x double>, ptr %x.addr
+  ret <2 x double> %retval
+}
+
+define double @phi_without_nnan(double %x) "no-nans-fp-math"="false" {
+; CHECK-LABEL: define double @phi_without_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret double [[X_ADDR_0]]
+entry:
+  %x.addr = alloca double
+  %cmp = fcmp olt double %x, 0.0
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  %fneg = fneg double %x
+  store double %fneg, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load double, ptr %x.addr
+  ret double %retval
+}

@pawosm-arm pawosm-arm requested a review from yashssh October 30, 2024 17:17
@dtcxzyw dtcxzyw requested a review from arsenm October 30, 2024 23:58
@arsenm
Copy link
Contributor

arsenm commented Oct 31, 2024

Can you add a phase ordering test that shows what case this helps? The nsz case was more difficult to recover; I think nnan/ninf are simpler.

@pawosm-arm
Copy link
Contributor Author

pawosm-arm commented Oct 31, 2024

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 31, 2024
@pawosm-arm pawosm-arm requested a review from davemgreen November 1, 2024 10:27
Copy link
Collaborator

@paulwalker-arm paulwalker-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks reasonable to me and consistent with an existing solution to a similar problem. Worse case, the new code should be removed when we're at a point that instruction fast math flags are sufficiently preserved/propagated.

That said, @pawosm-arm please wait until middle of next week before landing the PR just in case anybody strongly disagrees with this path.

@nikic
Copy link
Contributor

nikic commented Nov 5, 2024

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

Do I understand correctly that this PR doesn't change the PhaseOrdering test because the motivating issue was already addressed by 0f91944?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also handle no-infs-fp-math to save the next person the trouble.

@pawosm-arm
Copy link
Contributor Author

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

Do I understand correctly that this PR doesn't change the PhaseOrdering test because the motivating issue was already addressed by 0f91944?

Yes, the original problem that I've faced could be addressed various ways, and 0f91944 indeed fixes it too. Yet we see a value in the change proposed here too hence a good reason for merging it.

@pawosm-arm
Copy link
Contributor Author

We should probably also handle no-infs-fp-math to save the next person the trouble.

I'd leave it to a separate PR, as this one has a specific story behind the proposed change, anything extra would go beyond it.

@paulwalker-arm
Copy link
Collaborator

paulwalker-arm commented Nov 5, 2024

Yet we see a value in the change proposed here too hence a good reason for merging it.

I've clearly misunderstood the context of the rebase. What is the value in merging this change if the original issue has already been resolved? I guess this fix is simpler and thus might help compile time but then that wasn't the original intent.

@pawosm-arm
Copy link
Contributor Author

pawosm-arm commented Nov 5, 2024

Yet we see a value in the change proposed here too hence a good reason for merging it.

I've clearly misunderstood the context of the rebase. What is the value in merging this change if the original issue has already been resolved? I guess this fix is simpler and thus might help compile time but then that wasn't the original intent.

There are two ways of looking at this situation:

One, I can imagine many ways the nnan flag could not be retrieved if it isn't ensured (the way my patch is doing it), Dave's patch solves the problem more elaborate way we know it works in specific situations, yet how can we be sure there aren't other places in which nnan should be easily visible if a transformation should succeed?

The other is that there's a general movement away from having the function attributes (in favor of using only the instruction flags) and thus by adding another use we make this transition harder.

My PR will catch all cases and has very low complexity. Seems I need @nikic opinion on the subject, whether we should land it or abandon it.

@nikic
Copy link
Contributor

nikic commented Nov 5, 2024

I do think this change still makes sense, especially from a consistency point of view. If SROA sets one of the value-based FMF flags (nsz) then it stands to reason that it should also set the other two (nnan and ninf). Unless there is some reason why nsz would be more problematic than nnan/ninf in this context (it does have substantially different semantics).

@davemgreen
Copy link
Collaborator

This patch should help with the performance in other ways than the motivating case from #113686 (the creating of fmin/fmax is one of them). Those could probably be fixed in other ways, and my understanding was that there was a long term goal to move away form the function level FMF attributes. But this should help performance in a fairly generic way.

@arsenm
Copy link
Contributor

arsenm commented Nov 5, 2024

I do think this change still makes sense, especially from a consistency point of view. If SROA sets one of the value-based FMF flags (nsz) then it stands to reason that it should also set the other two (nnan and ninf). Unless there is some reason why nsz would be more problematic than nnan/ninf in this context (it does have substantially different semantics).

nsz is more problematic and difficult to recover than nnan/ninf. Violations of no-nans no-infs can be treated as poison. nsz is fuzzier and just expands the set of permissible values, and is thus attached to specific operations and not recoverable from the inputs. You can't simply add some form of assertion that a value cannot be -0, in the way you could for a nan.

It's not problematic to preserve nnan in the same way here, but it's also less important.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jcranmer-intel
Copy link
Contributor

arsenm sums it up quite well, I think.

Personally, I dislike the FMF being on select/phi only somewhat less than I dislike the function-level attributes, and I'd rather avoid needing to use them if there's a better way forward. Because nnan/ninf induce poison, you can get a lot of potential by poison propagation. But nsz doesn't induce poison, and that means that where you have patterns like fabs formation, you need the flag on the select to know that you can change the result of a value which is otherwise unchanged.

…par with the nsz flag

Following the change introduced by the PR llvm#83381, this patch extends
it with the same treatment of the nnan and ninf fast-math flags.

This is to address the performance drop caused by PR#83200 which
prevented vital InstCombine transformation due to the lack of the
relevant fast-math flag.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes
are being created. Proposed change allows propagation of the nnan and
ninf flags down to these Phi nodes.
@pawosm-arm pawosm-arm changed the title [Transforms][Utils][PromoteMem2Reg] Propagate nnan flag on par with the nsz flag [Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category floating-point Floating-point math llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants