-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. #120104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. #120104
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Vikash Gupta (vg0204) ChangesThis combine pattern perform the below transformation. fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b)) where, A=2^a & B=2^b ; a and b are integers. It is a follow-up PR to implement the above combine for globalIsel, as the corresponding DAG combine has been done for SelectionDAG Isel (#111109) Full diff: https://github.com/llvm/llvm-project/pull/120104.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index 985fa8f1deff94..c1eea0ad9b7073 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -124,6 +124,16 @@ def sign_extension_in_reg : GICombineRule<
[{ return matchCombineSignExtendInReg(*${sign_inreg}, ${matchinfo}); }]),
(apply [{ applyCombineSignExtendInReg(*${sign_inreg}, ${matchinfo}); }])>;
+// Do the following combines :
+// fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b))
+// fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b))
+def combine_fmul_with_select_to_ldexp : GICombineRule<
+ (defs root:$root, build_fn_matchinfo:$matchinfo),
+ (match (G_FMUL $dst, $x, $select):$root,
+ (G_SELECT $select, $y, $A, $B):$sel,
+ [{ return Helper.matchCombineFmulWithSelectToLdexp(*${root}, *${sel}, ${matchinfo}); }]),
+ (apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;
+
let Predicates = [Has16BitInsts, NotHasMed3_16] in {
// For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
index e5a376ab7357c1..d582ee892a481e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.cpp
@@ -445,3 +445,75 @@ void AMDGPUCombinerHelper::applyExpandPromotedF16FMed3(MachineInstr &MI,
Builder.buildFMinNumIEEE(MI.getOperand(0), B1, C1);
MI.eraseFromParent();
}
+
+bool AMDGPUCombinerHelper::matchCombineFmulWithSelectToLdexp(
+ MachineInstr &MI, MachineInstr &Sel,
+ std::function<void(MachineIRBuilder &)> &MatchInfo) {
+ assert(MI.getOpcode() == TargetOpcode::G_FMUL);
+ assert(Sel.getOpcode() == TargetOpcode::G_SELECT);
+
+ Register Dst = MI.getOperand(0).getReg();
+ LLT DestTy = MRI.getType(Dst);
+ LLT ScalarDestTy = DestTy.getScalarType();
+
+ if ((ScalarDestTy == LLT::float64() || ScalarDestTy == LLT::float32() ||
+ ScalarDestTy == LLT::float16()) &&
+ (MRI.hasOneNonDBGUse(Sel.getOperand(0).getReg()))) {
+ Register SelectCond = Sel.getOperand(1).getReg();
+ Register SelectTrue = Sel.getOperand(2).getReg();
+ Register SelectFalse = Sel.getOperand(3).getReg();
+
+ const auto SelectTrueCst =
+ DestTy.isVector()
+ ? getFConstantSplat(SelectTrue, MRI, /* allowUndef */ true)
+ : getFConstantVRegValWithLookThrough(SelectTrue, MRI);
+ if (!SelectTrueCst)
+ return false;
+ const auto SelectFalseCst =
+ DestTy.isVector()
+ ? getFConstantSplat(SelectFalse, MRI, /* allowUndef */ true)
+ : getFConstantVRegValWithLookThrough(SelectFalse, MRI);
+ if (!SelectFalseCst)
+ return false;
+
+ if (SelectTrueCst->Value.isNegative() != SelectFalseCst->Value.isNegative())
+ return false;
+
+ // For f32, only non-inline constants should be transformed.
+ const SIInstrInfo *TII =
+ (MI.getMF()->getSubtarget<GCNSubtarget>()).getInstrInfo();
+ if (ScalarDestTy == LLT::float32() &&
+ TII->isInlineConstant(SelectTrueCst->Value) &&
+ TII->isInlineConstant(SelectFalseCst->Value))
+ return false;
+
+ int SelectTrueVal = SelectTrueCst->Value.getExactLog2Abs();
+ if (SelectTrueVal == INT_MIN)
+ return false;
+ int SelectFalseVal = SelectFalseCst->Value.getExactLog2Abs();
+ if (SelectFalseVal == INT_MIN)
+ return false;
+
+ MatchInfo = [=, &MI](MachineIRBuilder &Builder) {
+ LLT IntDestTy = DestTy.changeElementType(LLT::scalar(32));
+ auto NewSel =
+ Builder.buildSelect(IntDestTy, SelectCond,
+ Builder.buildConstant(IntDestTy, SelectTrueVal),
+ Builder.buildConstant(IntDestTy, SelectFalseVal));
+
+ if (SelectTrueCst->Value.isNegative()) {
+ auto NegX = Builder.buildFNeg(
+ DestTy, MI.getOperand(1).getReg(),
+ MRI.getVRegDef(MI.getOperand(1).getReg())->getFlags());
+ Builder.buildFLdexp(Dst, NegX, NewSel, MI.getFlags());
+ } else {
+ Builder.buildFLdexp(Dst, MI.getOperand(1).getReg(), NewSel,
+ MI.getFlags());
+ }
+ };
+
+ return true;
+ }
+
+ return false;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
index 6510abe9d23218..df03a9435b3849 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombinerHelper.h
@@ -30,6 +30,10 @@ class AMDGPUCombinerHelper : public CombinerHelper {
Register Src1, Register Src2);
void applyExpandPromotedF16FMed3(MachineInstr &MI, Register Src0,
Register Src1, Register Src2);
+
+ bool matchCombineFmulWithSelectToLdexp(
+ MachineInstr &MI, MachineInstr &Sel,
+ std::function<void(MachineIRBuilder &)> &MatchInfo);
};
} // namespace llvm
|
TO DO :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing tests. This can go in the pre and post legalize combiners
Addressed this!!!! |
def combine_fmul_with_select_to_ldexp : GICombineRule< | ||
// fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b)) | ||
// fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b)) | ||
def combine_fmul_with_select_to_fldexp : GICombineRule< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have complex patterns and constant xforms like selection patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you kindly elaborate, as dealing with pattern matching for first time in Isel!
; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc | ||
; GFX9-GISEL-NEXT: v_add_u32_e32 v0, 6, v0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globalisel is missing the fold to push binary operands through select?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalIsel has matchFoldBinOpIntoSelect in CombinerHelper which does so maybe, implemented by you only if i am right!
if (!isConstantOrConstantVector(*SelectLHS, MRI, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually as I digged deeper, found that actually that binOp(add) is introduced when select is folded as per the pattern :
select Cond, C1, C1-1 --> add (zext Cond), C1-1
: select Cond, 7, 6 --> add (zext Cond), 6
So, that zext(cond) is really introducing v_cndmask_b32_e64 v0, 0, 1, vcc
, not Select after InstrnSelection.
// select Cond, C1, C1-1 --> add (zext Cond), C1-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the pattern mentioned in #121145 only occurs when there is a zext of i1? But it would work if we expanded the zext as a select? I think we should just be more aggressively turning extend of boolean into select so we don't need to special case it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This better explains this issue specifically for global Isel scenario : #121145 (comment)
d005d4f
to
d93bded
Compare
Ping! |
3f4a8d5
to
f3fe4f8
Compare
This combine pattern perform the below transformation. fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b)) fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b)) where, A=2^a & B=2^b ; a and b are integers. It is a follow-up PR to implement the above combine for globalIsel, as it has been done for SelectionDAG Isel (PR-111109)
…PostLegalCombiner.
…to-ldexp to include globalIsel in pipeline as well.
…om Pre/Post legalizeCombiner.
f3fe4f8
to
d024dca
Compare
llvm#120104) This combine pattern perform the below transformation. fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b)) fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b)) where, A=2^a & B=2^b ; a and b are integers. It is a follow-up PR to implement the above combine for globalIsel, as the corresponding DAG combine has been done for SelectionDAG Isel (llvm#111109)
This combine pattern perform the below transformation.
fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b))
where, A=2^a & B=2^b ; a and b are integers.
It is a follow-up PR to implement the above combine for globalIsel, as the corresponding DAG combine has been done for SelectionDAG Isel (#111109)