[Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag #114271

pawosm-arm · 2024-10-30T17:16:30Z

Following the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR #83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.

llvmbot · 2024-10-30T17:17:07Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Paul Osmialowski (pawosm-arm)

Changes

Following the change introduced by the PR #83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.

Full diff: https://github.com/llvm/llvm-project/pull/114271.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp (+9)
(modified) llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll (+78)

diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 656bb1ebd1161e..8a42bdddb08119 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -394,6 +394,9 @@ struct PromoteMem2Reg {
   /// Whether the function has the no-signed-zeros-fp-math attribute set.
   bool NoSignedZeros = false;
 
+  /// Whether the function has the no-nans-fp-math attribute set.
+  bool NoNaNs = false;
+
 public:
   PromoteMem2Reg(ArrayRef<AllocaInst *> Allocas, DominatorTree &DT,
                  AssumptionCache *AC)
@@ -752,6 +755,7 @@ void PromoteMem2Reg::run() {
   ForwardIDFCalculator IDF(DT);
 
   NoSignedZeros = F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool();
+  NoNaNs = F.getFnAttribute("no-nans-fp-math").getValueAsBool();
 
   for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {
     AllocaInst *AI = Allocas[AllocaNum];
@@ -1140,6 +1144,11 @@ void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
         if (isa<FPMathOperator>(APN) && NoSignedZeros)
           APN->setHasNoSignedZeros(true);
 
+        // This allows select instruction folding relevant to floating point
+        // reductions whose operand is a PHI.
+        if (isa<FPMathOperator>(APN) && NoNaNs)
+          APN->setHasNoNaNs(true);
+
         // The currently active variable for this block is now the PHI.
         IncomingVals[AllocaNo] = APN;
         AllocaATInfo[AllocaNo].updateForNewPhi(APN, DIB);
diff --git a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
index 2cc26363daf9c5..4eda5108b7aba4 100644
--- a/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
+++ b/llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll
@@ -77,3 +77,81 @@ return:                          ; preds = %entry,%if.then
   %retval = load double, ptr %x.addr
   ret double %retval
 }
+
+define double @phi_with_nnan(double %x) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define double @phi_with_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi nnan double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret double [[X_ADDR_0]]
+entry:
+  %x.addr = alloca double
+  %cmp = fcmp olt double %x, 0.0
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  %fneg = fneg double %x
+  store double %fneg, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load double, ptr %x.addr
+  ret double %retval
+}
+
+define <2 x double> @vector_phi_with_nnan(<2 x double> %x, i1 %cmp, <2 x double> %a, <2 x double> %b) "no-nans-fp-math"="true" {
+; CHECK-LABEL: define <2 x double> @vector_phi_with_nnan(
+; CHECK-SAME: <2 x double> [[X:%.*]], i1 [[CMP:%.*]], <2 x double> [[A:%.*]], <2 x double> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi nnan <2 x double> [ [[B]], [[IF_THEN]] ], [ [[A]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret <2 x double> [[X_ADDR_0]]
+entry:
+  %x.addr = alloca <2 x double>
+  store <2 x double> %a, ptr %x.addr
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  store <2 x double> %b, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load <2 x double>, ptr %x.addr
+  ret <2 x double> %retval
+}
+
+define double @phi_without_nnan(double %x) "no-nans-fp-math"="false" {
+; CHECK-LABEL: define double @phi_without_nnan(
+; CHECK-SAME: double [[X:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp olt double [[X]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[FNEG:%.*]] = fneg double [[X]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[X_ADDR_0:%.*]] = phi double [ [[FNEG]], [[IF_THEN]] ], [ undef, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret double [[X_ADDR_0]]
+entry:
+  %x.addr = alloca double
+  %cmp = fcmp olt double %x, 0.0
+  br i1 %cmp, label %if.then, label %return
+
+if.then:                         ; preds = %entry
+  %fneg = fneg double %x
+  store double %fneg, ptr %x.addr
+  br label %return
+
+return:                          ; preds = %entry,%if.then
+  %retval = load double, ptr %x.addr
+  ret double %retval
+}

arsenm · 2024-10-31T00:08:10Z

Can you add a phase ordering test that shows what case this helps? The nsz case was more difficult to recover; I think nnan/ninf are simpler.

pawosm-arm · 2024-10-31T11:22:29Z

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

paulwalker-arm

The change looks reasonable to me and consistent with an existing solution to a similar problem. Worse case, the new code should be removed when we're at a point that instruction fast math flags are sufficiently preserved/propagated.

That said, @pawosm-arm please wait until middle of next week before landing the PR just in case anybody strongly disagrees with this path.

nikic · 2024-11-05T10:23:38Z

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

Do I understand correctly that this PR doesn't change the PhaseOrdering test because the motivating issue was already addressed by 0f91944?

nikic

We should probably also handle no-infs-fp-math to save the next person the trouble.

pawosm-arm · 2024-11-05T12:03:31Z

I've rebased my PR so this branch could also contain phase-ordering test for vectorizing predicated selects introduced by commit 577c7dd

Do I understand correctly that this PR doesn't change the PhaseOrdering test because the motivating issue was already addressed by 0f91944?

Yes, the original problem that I've faced could be addressed various ways, and 0f91944 indeed fixes it too. Yet we see a value in the change proposed here too hence a good reason for merging it.

pawosm-arm · 2024-11-05T12:06:13Z

We should probably also handle no-infs-fp-math to save the next person the trouble.

I'd leave it to a separate PR, as this one has a specific story behind the proposed change, anything extra would go beyond it.

paulwalker-arm · 2024-11-05T12:18:46Z

Yet we see a value in the change proposed here too hence a good reason for merging it.

I've clearly misunderstood the context of the rebase. What is the value in merging this change if the original issue has already been resolved? I guess this fix is simpler and thus might help compile time but then that wasn't the original intent.

pawosm-arm · 2024-11-05T12:48:21Z

Yet we see a value in the change proposed here too hence a good reason for merging it.

I've clearly misunderstood the context of the rebase. What is the value in merging this change if the original issue has already been resolved? I guess this fix is simpler and thus might help compile time but then that wasn't the original intent.

There are two ways of looking at this situation:

One, I can imagine many ways the nnan flag could not be retrieved if it isn't ensured (the way my patch is doing it), Dave's patch solves the problem more elaborate way we know it works in specific situations, yet how can we be sure there aren't other places in which nnan should be easily visible if a transformation should succeed?

The other is that there's a general movement away from having the function attributes (in favor of using only the instruction flags) and thus by adding another use we make this transition harder.

My PR will catch all cases and has very low complexity. Seems I need @nikic opinion on the subject, whether we should land it or abandon it.

nikic · 2024-11-05T12:51:33Z

I do think this change still makes sense, especially from a consistency point of view. If SROA sets one of the value-based FMF flags (nsz) then it stands to reason that it should also set the other two (nnan and ninf). Unless there is some reason why nsz would be more problematic than nnan/ninf in this context (it does have substantially different semantics).

davemgreen · 2024-11-05T13:20:39Z

This patch should help with the performance in other ways than the motivating case from #113686 (the creating of fmin/fmax is one of them). Those could probably be fixed in other ways, and my understanding was that there was a long term goal to move away form the function level FMF attributes. But this should help performance in a fairly generic way.

arsenm · 2024-11-05T16:04:24Z

I do think this change still makes sense, especially from a consistency point of view. If SROA sets one of the value-based FMF flags (nsz) then it stands to reason that it should also set the other two (nnan and ninf). Unless there is some reason why nsz would be more problematic than nnan/ninf in this context (it does have substantially different semantics).

nsz is more problematic and difficult to recover than nnan/ninf. Violations of no-nans no-infs can be treated as poison. nsz is fuzzier and just expands the set of permissible values, and is thus attached to specific operations and not recoverable from the inputs. You can't simply add some form of assertion that a value cannot be -0, in the way you could for a nan.

It's not problematic to preserve nnan in the same way here, but it's also less important.

nikic

LGTM

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

jcranmer-intel · 2024-11-05T19:33:54Z

arsenm sums it up quite well, I think.

Personally, I dislike the FMF being on select/phi only somewhat less than I dislike the function-level attributes, and I'd rather avoid needing to use them if there's a better way forward. Because nnan/ninf induce poison, you can get a lot of potential by poison propagation. But nsz doesn't induce poison, and that means that where you have patterns like fabs formation, you need the flag on the select to know that you can change the result of a value which is otherwise unchanged.

…par with the nsz flag Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan and ninf fast-math flags. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of the relevant fast-math flag. The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan and ninf flags down to these Phi nodes.

pawosm-arm requested review from nikic and paulwalker-arm October 30, 2024 17:16

llvmbot added the llvm:transforms label Oct 30, 2024

pawosm-arm requested a review from yashssh October 30, 2024 17:17

dtcxzyw requested a review from arsenm October 30, 2024 23:58

arsenm added the floating-point Floating-point math label Oct 31, 2024

pawosm-arm force-pushed the nnan branch from 234b4c3 to 218c46d Compare October 31, 2024 11:19

pawosm-arm force-pushed the nnan branch from 218c46d to 7645aaf Compare October 31, 2024 12:26

llvmbot added the clang Clang issues not falling into any other category label Oct 31, 2024

arsenm requested review from andykaylor and jcranmer-intel October 31, 2024 22:20

pawosm-arm requested a review from davemgreen November 1, 2024 10:27

paulwalker-arm approved these changes Nov 1, 2024

View reviewed changes

nikic reviewed Nov 5, 2024

View reviewed changes

pawosm-arm force-pushed the nnan branch from 7645aaf to a426e17 Compare November 5, 2024 15:49

nikic approved these changes Nov 5, 2024

View reviewed changes

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp Outdated Show resolved Hide resolved

pawosm-arm force-pushed the nnan branch from a426e17 to 45a02d4 Compare November 5, 2024 21:24

pawosm-arm requested review from paulwalker-arm and nikic November 6, 2024 18:14

pawosm-arm changed the title ~~[Transforms][Utils][PromoteMem2Reg] Propagate nnan flag on par with the nsz flag~~ [Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag Nov 7, 2024

paulwalker-arm approved these changes Nov 7, 2024

View reviewed changes

[Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag #114271

Are you sure you want to change the base?

[Transforms][Utils][PromoteMem2Reg] Propagate nnan and ninf flags on par with the nsz flag #114271

Uh oh!

Conversation

pawosm-arm commented Oct 30, 2024 • edited by kiranchandramohan Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Oct 31, 2024

Uh oh!

pawosm-arm commented Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paulwalker-arm left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic commented Nov 5, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

pawosm-arm commented Nov 5, 2024

Uh oh!

pawosm-arm commented Nov 5, 2024

Uh oh!

paulwalker-arm commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pawosm-arm commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic commented Nov 5, 2024

Uh oh!

davemgreen commented Nov 5, 2024

Uh oh!

arsenm commented Nov 5, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jcranmer-intel commented Nov 5, 2024

Uh oh!

Uh oh!

pawosm-arm commented Oct 30, 2024 •

edited by kiranchandramohan

Loading

llvmbot commented Oct 30, 2024 •

edited

Loading

pawosm-arm commented Oct 31, 2024 •

edited

Loading

paulwalker-arm left a comment •

edited

Loading

paulwalker-arm commented Nov 5, 2024 •

edited

Loading

pawosm-arm commented Nov 5, 2024 •

edited

Loading