Skip to content

[AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. #120104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

vg0204
Copy link
Contributor

@vg0204 vg0204 commented Dec 16, 2024

This combine pattern perform the below transformation.

fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.

It is a follow-up PR to implement the above combine for globalIsel, as the corresponding DAG combine has been done for SelectionDAG Isel (#111109)

@llvmbot
Copy link
Member

llvmbot commented Dec 16, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Vikash Gupta (vg0204)

Changes

This combine pattern perform the below transformation.

fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.

It is a follow-up PR to implement the above combine for globalIsel, as the corresponding DAG combine has been done for SelectionDAG Isel (#111109)


Full diff: https://github.com/llvm/llvm-project/pull/120104.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCombine.td (+10)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp (+72)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h (+4)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index 985fa8f1deff94..c1eea0ad9b7073 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -124,6 +124,16 @@ def sign_extension_in_reg : GICombineRule<
          [{ return matchCombineSignExtendInReg(*${sign_inreg}, ${matchinfo}); }]),
   (apply [{ applyCombineSignExtendInReg(*${sign_inreg}, ${matchinfo}); }])>;
 
+// Do the following combines :
+// fmul x, select(y, A, B)   -> ldexp (x, select i32 (y, a, b))
+// fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b))
+def combine_fmul_with_select_to_ldexp : GICombineRule<
+  (defs root:$root, build_fn_matchinfo:$matchinfo),
+  (match  (G_FMUL $dst, $x, $select):$root,
+          (G_SELECT $select, $y, $A, $B):$sel,
+          [{ return Helper.matchCombineFmulWithSelectToLdexp(*${root}, *${sel}, ${matchinfo}); }]),
+  (apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;
+
 
 let Predicates = [Has16BitInsts, NotHasMed3_16] in {
 // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
index e5a376ab7357c1..d582ee892a481e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
@@ -445,3 +445,75 @@ void AMDGPUCombinerHelper::applyExpandPromotedF16FMed3(MachineInstr &MI,
   Builder.buildFMinNumIEEE(MI.getOperand(0), B1, C1);
   MI.eraseFromParent();
 }
+
+bool AMDGPUCombinerHelper::matchCombineFmulWithSelectToLdexp(
+    MachineInstr &MI, MachineInstr &Sel,
+    std::function<void(MachineIRBuilder &)> &MatchInfo) {
+  assert(MI.getOpcode() == TargetOpcode::G_FMUL);
+  assert(Sel.getOpcode() == TargetOpcode::G_SELECT);
+
+  Register Dst = MI.getOperand(0).getReg();
+  LLT DestTy = MRI.getType(Dst);
+  LLT ScalarDestTy = DestTy.getScalarType();
+
+  if ((ScalarDestTy == LLT::float64() || ScalarDestTy == LLT::float32() ||
+       ScalarDestTy == LLT::float16()) &&
+      (MRI.hasOneNonDBGUse(Sel.getOperand(0).getReg()))) {
+    Register SelectCond = Sel.getOperand(1).getReg();
+    Register SelectTrue = Sel.getOperand(2).getReg();
+    Register SelectFalse = Sel.getOperand(3).getReg();
+
+    const auto SelectTrueCst =
+        DestTy.isVector()
+            ? getFConstantSplat(SelectTrue, MRI, /* allowUndef */ true)
+            : getFConstantVRegValWithLookThrough(SelectTrue, MRI);
+    if (!SelectTrueCst)
+      return false;
+    const auto SelectFalseCst =
+        DestTy.isVector()
+            ? getFConstantSplat(SelectFalse, MRI, /* allowUndef */ true)
+            : getFConstantVRegValWithLookThrough(SelectFalse, MRI);
+    if (!SelectFalseCst)
+      return false;
+
+    if (SelectTrueCst->Value.isNegative() != SelectFalseCst->Value.isNegative())
+      return false;
+
+    // For f32, only non-inline constants should be transformed.
+    const SIInstrInfo *TII =
+        (MI.getMF()->getSubtarget<GCNSubtarget>()).getInstrInfo();
+    if (ScalarDestTy == LLT::float32() &&
+        TII->isInlineConstant(SelectTrueCst->Value) &&
+        TII->isInlineConstant(SelectFalseCst->Value))
+      return false;
+
+    int SelectTrueVal = SelectTrueCst->Value.getExactLog2Abs();
+    if (SelectTrueVal == INT_MIN)
+      return false;
+    int SelectFalseVal = SelectFalseCst->Value.getExactLog2Abs();
+    if (SelectFalseVal == INT_MIN)
+      return false;
+
+    MatchInfo = [=, &MI](MachineIRBuilder &Builder) {
+      LLT IntDestTy = DestTy.changeElementType(LLT::scalar(32));
+      auto NewSel =
+          Builder.buildSelect(IntDestTy, SelectCond,
+                              Builder.buildConstant(IntDestTy, SelectTrueVal),
+                              Builder.buildConstant(IntDestTy, SelectFalseVal));
+
+      if (SelectTrueCst->Value.isNegative()) {
+        auto NegX = Builder.buildFNeg(
+            DestTy, MI.getOperand(1).getReg(),
+            MRI.getVRegDef(MI.getOperand(1).getReg())->getFlags());
+        Builder.buildFLdexp(Dst, NegX, NewSel, MI.getFlags());
+      } else {
+        Builder.buildFLdexp(Dst, MI.getOperand(1).getReg(), NewSel,
+                            MI.getFlags());
+      }
+    };
+
+    return true;
+  }
+
+  return false;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
index 6510abe9d23218..df03a9435b3849 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
@@ -30,6 +30,10 @@ class AMDGPUCombinerHelper : public CombinerHelper {
                                    Register Src1, Register Src2);
   void applyExpandPromotedF16FMed3(MachineInstr &MI, Register Src0,
                                    Register Src1, Register Src2);
+
+  bool matchCombineFmulWithSelectToLdexp(
+      MachineInstr &MI, MachineInstr &Sel,
+      std::function<void(MachineIRBuilder &)> &MatchInfo);
 };
 
 } // namespace llvm

@vg0204
Copy link
Contributor Author

vg0204 commented Dec 16, 2024

TO DO :

  1. Identify the most ideal place to invoke this combine (pre or post legalization or BOTH).
  2. Update the test cases following it.

@vg0204 vg0204 requested review from arsenm and kmitropoulou and removed request for arsenm December 16, 2024 16:20
@vg0204 vg0204 self-assigned this Dec 16, 2024
@vg0204 vg0204 requested a review from jayfoad December 16, 2024 16:22
@vg0204 vg0204 changed the title [AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. [WIP] [AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. Dec 16, 2024
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tests. This can go in the pre and post legalize combiners

@vg0204
Copy link
Contributor Author

vg0204 commented Dec 18, 2024

TO DO :

  1. Identify the most ideal place to invoke this combine (pre or post legalization or BOTH).
  2. Update the test cases following it.

Addressed this!!!!

@vg0204 vg0204 requested a review from arsenm December 20, 2024 06:33
def combine_fmul_with_select_to_ldexp : GICombineRule<
// fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b))
// fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b))
def combine_fmul_with_select_to_fldexp : GICombineRule<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should have complex patterns and constant xforms like selection patterns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you kindly elaborate, as dealing with pattern matching for first time in Isel!

Comment on lines +3410 to +3411
; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
; GFX9-GISEL-NEXT: v_add_u32_e32 v0, 6, v0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globalisel is missing the fold to push binary operands through select?

Copy link
Contributor Author

@vg0204 vg0204 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobalIsel has matchFoldBinOpIntoSelect in CombinerHelper which does so maybe, implemented by you only if i am right!

if (!isConstantOrConstantVector(*SelectLHS, MRI,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually as I digged deeper, found that actually that binOp(add) is introduced when select is folded as per the pattern :

select Cond, C1, C1-1 --> add (zext Cond), C1-1 : select Cond, 7, 6 --> add (zext Cond), 6

So, that zext(cond) is really introducing v_cndmask_b32_e64 v0, 0, 1, vcc, not Select after InstrnSelection.

// select Cond, C1, C1-1 --> add (zext Cond), C1-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm , its this issue #121145

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the pattern mentioned in #121145 only occurs when there is a zext of i1? But it would work if we expanded the zext as a select? I think we should just be more aggressively turning extend of boolean into select so we don't need to special case it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This better explains this issue specifically for global Isel scenario : #121145 (comment)

@vg0204 vg0204 requested review from cdevadas and aemerson December 26, 2024 06:15
vg0204 added a commit that referenced this pull request Dec 26, 2024
@vg0204 vg0204 force-pushed the vg0204/combine-fmul-with-select-gloabalIsel branch from d005d4f to d93bded Compare December 27, 2024 08:37
@vg0204
Copy link
Contributor Author

vg0204 commented Dec 30, 2024

Ping!

@vg0204 vg0204 changed the title [WIP] [AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. [AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. Dec 31, 2024
@vg0204 vg0204 requested a review from arsenm January 3, 2025 07:40
@vg0204 vg0204 force-pushed the vg0204/combine-fmul-with-select-gloabalIsel branch from 3f4a8d5 to f3fe4f8 Compare January 6, 2025 06:31
vg0204 added 7 commits January 6, 2025 12:11
This combine pattern perform the below transformation.

fmul x, select(y, A, B)      -> ldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B)    -> ldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.

It is a follow-up PR to implement the above combine for globalIsel,
as it has been done for SelectionDAG Isel (PR-111109)
…to-ldexp to include globalIsel in pipeline as well.
@vg0204 vg0204 force-pushed the vg0204/combine-fmul-with-select-gloabalIsel branch from f3fe4f8 to d024dca Compare January 6, 2025 12:12
@vg0204 vg0204 merged commit fd6f8b3 into llvm:main Jan 6, 2025
5 of 7 checks passed
paulhuggett pushed a commit to paulhuggett/llvm-project that referenced this pull request Jan 7, 2025
llvm#120104)

This combine pattern perform the below transformation.

fmul x, select(y, A, B)      -> fldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B)   -> fldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.

It is a follow-up PR to implement the above combine for globalIsel, as
the corresponding DAG combine has been done for SelectionDAG Isel
(llvm#111109)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants