
[CodeGen] [AMDGPU] Attempt DAGCombine for fmul with select to ldexp #111109


Merged — 4 commits merged into llvm:main on Dec 9, 2024

Conversation

@vg0204 (Contributor) commented Oct 4, 2024

The materialization cost of a 32-bit non-inline constant for an fmul is relatively high; where possible, it is cheaper to combine the fmul into an ldexp instruction for specific scenarios (for types such as f64, f32, and f16). That is what is handled here:

The DAG combine applies to any pair of select values that are exact powers of 2:

```
fmul x, select(y, A, B)    -> ldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B)  -> ldexp ((fneg x), select i32 (y, a, b))

where A = 2^a and B = 2^b; a and b are integers.
```

This DAG combine is handled separately in performFMulCombine (newly defined in SIISelLowering), which targets an fmul and fuses it with its select-type operand into an ldexp.

Fixes #104900.
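The identity behind this combine is easy to sanity-check outside the compiler. Below is a minimal standalone C++ sketch (names are illustrative, not the LLVM implementation; std::ldexp stands in for the ldexp instruction):

```cpp
#include <cassert>
#include <cmath>

// Reference form: fmul x, select(y, A, B), where A = 2^a and B = 2^b.
double fmulSelect(double x, bool y, double A, double B) {
  return x * (y ? A : B);
}

// Combined form: ldexp(x, select i32 (y, a, b)). std::ldexp scales x by
// 2^n via an exponent adjustment instead of materializing an FP constant.
double ldexpSelect(double x, bool y, int a, int b) {
  return std::ldexp(x, y ? a : b);
}

// Negated-constant form: fmul x, select(y, -A, -B)
// -> ldexp((fneg x), select i32 (y, a, b)).
double ldexpSelectNeg(double x, bool y, int a, int b) {
  return std::ldexp(-x, y ? a : b);
}
```

Because A and B are exact powers of two, the scaling is exact, so both forms produce identical results away from the exponent range limits.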

@llvmbot (Member) commented Oct 4, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Vikash Gupta (vg0204)

Changes

For f32/f16 this combine yields no improvement, but for f64 this specific fmul-with-select pattern is more costly to materialize than an ldexp, so the following DAG combine performs the rewrite:

fmul x, select(y, 2.0, 1.0) -> ldexp x, zext(i1 y)
fmul x, select(y, 0.5, 1.0) -> ldexp x, sext(i1 y)

Fixes #104900.
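The extension choice encodes the select directly as the exponent: zext(i1 y) yields 0 or 1 (multiply by 1.0 or 2.0), while sext(i1 y) yields 0 or -1 (multiply by 1.0 or 0.5). A small standalone check of the identity (hypothetical helper names; std::ldexp stands in for the ldexp instruction):

```cpp
#include <cassert>
#include <cmath>

// fmul x, (select y, 2.0, 1.0) -> ldexp(x, zext(i1 y)); zext gives 0 or 1.
double mulViaZext(double x, bool y) { return std::ldexp(x, y ? 1 : 0); }

// fmul x, (select y, 0.5, 1.0) -> ldexp(x, sext(i1 y)); sext gives 0 or -1.
double mulViaSext(double x, bool y) { return std::ldexp(x, y ? -1 : 0); }
```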


Full diff: https://github.com/llvm/llvm-project/pull/111109.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+55)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (+1)
  • (added) llvm/test/CodeGen/AMDGPU/combine-fmul-sel.ll (+280)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5e4cf705cc9e47..ec707fde24e20c 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -899,6 +899,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                        ISD::FADD,
                        ISD::FSUB,
                        ISD::FDIV,
+                       ISD::FMUL,
                        ISD::FMINNUM,
                        ISD::FMAXNUM,
                        ISD::FMINNUM_IEEE,
@@ -14476,6 +14477,58 @@ SDValue SITargetLowering::performFDivCombine(SDNode *N,
   return SDValue();
 }
 
+SDValue SITargetLowering::performFMulCombine(SDNode *N,
+                                             DAGCombinerInfo &DCI) const {
+  SelectionDAG &DAG = DCI.DAG;
+  EVT VT = N->getValueType(0);
+
+  SDLoc SL(N);
+  SDValue LHS = N->getOperand(0);
+  SDValue RHS = N->getOperand(1);
+
+  // ldexp(x, zext(i1 y)) -> fmul x, (select y, 2.0, 1.0)
+  // ldexp(x, sext(i1 y)) -> fmul x, (select y, 0.5, 1.0)
+  //
+  // The ldexp folding above works fine for
+  // f16/f32, but for f64 it creates an f64 select, which
+  // is costly to materialize compared to an f64 ldexp,
+  // so here we undo the transform for f64 as follows:
+  //
+  // fmul x, (select y, 2.0, 1.0) -> ldexp(x, zext(i1 y))
+  // fmul x, (select y, 0.5, 1.0) -> ldexp(x, sext(i1 y))
+  // TODO: Handle vectors of f64.
+  if (VT == MVT::f64) {
+    if (RHS.hasOneUse() && RHS.getOpcode() == ISD::SELECT) {
+      const ConstantFPSDNode *TrueNode =
+          dyn_cast<ConstantFPSDNode>(RHS.getOperand(1));
+      const ConstantFPSDNode *FalseNode =
+          dyn_cast<ConstantFPSDNode>(RHS.getOperand(2));
+
+      if (!TrueNode || !FalseNode)
+        return SDValue();
+
+      const double TrueVal = TrueNode->getValueAPF().convertToDouble();
+      const double FalseVal = FalseNode->getValueAPF().convertToDouble();
+      unsigned ExtOp;
+
+      if (FalseVal == 1.0) {
+        if (TrueVal == 2.0)
+          ExtOp = ISD::ZERO_EXTEND;
+        else if (TrueVal == 0.5)
+          ExtOp = ISD::SIGN_EXTEND;
+        else
+          return SDValue();
+
+        SDValue ExtNode =
+            DAG.getNode(ExtOp, SL, MVT::i32, RHS.getOperand(0));
+        return DAG.getNode(ISD::FLDEXP, SL, MVT::f64, LHS, ExtNode);
+      }
+    }
+  }
+
+  return SDValue();
+}
+
 SDValue SITargetLowering::performFMACombine(SDNode *N,
                                             DAGCombinerInfo &DCI) const {
   SelectionDAG &DAG = DCI.DAG;
@@ -14765,6 +14818,8 @@ SDValue SITargetLowering::PerformDAGCombine(SDNode *N,
     return performFSubCombine(N, DCI);
   case ISD::FDIV:
     return performFDivCombine(N, DCI);
+  case ISD::FMUL:
+    return performFMulCombine(N, DCI);
   case ISD::SETCC:
     return performSetCCCombine(N, DCI);
   case ISD::FMAXNUM:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index 6c3edf37945e24..1ead2e4fc916bb 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -218,6 +218,7 @@ class SITargetLowering final : public AMDGPUTargetLowering {
   SDValue performFAddCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performFSubCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performFDivCombine(SDNode *N, DAGCombinerInfo &DCI) const;
+  SDValue performFMulCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performFMACombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performSetCCCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performCvtF32UByteNCombine(SDNode *N, DAGCombinerInfo &DCI) const;
diff --git a/llvm/test/CodeGen/AMDGPU/combine-fmul-sel.ll b/llvm/test/CodeGen/AMDGPU/combine-fmul-sel.ll
new file mode 100644
index 00000000000000..0748aa0a0abec0
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/combine-fmul-sel.ll
@@ -0,0 +1,280 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+;RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefix=GFX9 %s
+;RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -verify-machineinstrs | FileCheck -check-prefix=GFX1030 %s
+;RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs | FileCheck -check-prefix=GFX1100 %s
+
+define float @fmul_select_f32_test1(float %x, i1 %bool) {
+; GFX9-LABEL: fmul_select_f32_test1:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v1
+; GFX9-NEXT:    v_cndmask_b32_e64 v1, 1.0, 2.0, vcc
+; GFX9-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_f32_test1:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
+; GFX1030-NEXT:    v_cndmask_b32_e64 v1, 1.0, 2.0, vcc_lo
+; GFX1030-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_f32_test1:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
+; GFX1100-NEXT:    v_cndmask_b32_e64 v1, 1.0, 2.0, vcc_lo
+; GFX1100-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select i1 %bool, float 2.000000e+00, float 1.000000e+00
+  %ldexp = fmul float %x, %1
+  ret float %ldexp
+}
+
+define float @fmul_select_f32_test2(float %x, i1 %bool) {
+; GFX9-LABEL: fmul_select_f32_test2:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v1
+; GFX9-NEXT:    v_cndmask_b32_e64 v1, 1.0, 0.5, vcc
+; GFX9-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_f32_test2:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
+; GFX1030-NEXT:    v_cndmask_b32_e64 v1, 1.0, 0.5, vcc_lo
+; GFX1030-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_f32_test2:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v1, 1, v1
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
+; GFX1100-NEXT:    v_cndmask_b32_e64 v1, 1.0, 0.5, vcc_lo
+; GFX1100-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select i1 %bool, float 0.500000e+00, float 1.000000e+00
+  %ldexp = fmul float %x, %1
+  ret float %ldexp
+}
+
+define <2 x float> @fmul_select_v2f32_test1(<2 x float> %x, <2 x i1> %bool) {
+; GFX9-LABEL: fmul_select_v2f32_test1:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v3, 1, v3
+; GFX9-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v3
+; GFX9-NEXT:    v_cndmask_b32_e64 v3, 1.0, 2.0, vcc
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v2
+; GFX9-NEXT:    v_cndmask_b32_e64 v2, 1.0, 2.0, vcc
+; GFX9-NEXT:    v_mul_f32_e32 v0, v0, v2
+; GFX9-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_v2f32_test1:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1030-NEXT:    v_and_b32_e32 v3, 1, v3
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v2
+; GFX1030-NEXT:    v_cndmask_b32_e64 v2, 1.0, 2.0, vcc_lo
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v3
+; GFX1030-NEXT:    v_mul_f32_e32 v0, v0, v2
+; GFX1030-NEXT:    v_cndmask_b32_e64 v3, 1.0, 2.0, vcc_lo
+; GFX1030-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_v2f32_test1:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v2
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 1.0, 2.0, vcc_lo
+; GFX1100-NEXT:    v_dual_mul_f32 v0, v0, v2 :: v_dual_and_b32 v3, 1, v3
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v3
+; GFX1100-NEXT:    v_cndmask_b32_e64 v3, 1.0, 2.0, vcc_lo
+; GFX1100-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select <2 x i1> %bool, <2 x float> <float 2.000000e+00, float 2.000000e+00>, <2 x float> <float 1.000000e+00, float 1.000000e+00>
+  %ldexp = fmul <2 x float> %x, %1
+  ret <2 x float> %ldexp
+}
+
+define <2 x float> @fmul_select_v2f32_test2(<2 x float> %x, <2 x i1> %bool) {
+; GFX9-LABEL: fmul_select_v2f32_test2:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v3, 1, v3
+; GFX9-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v3
+; GFX9-NEXT:    v_cndmask_b32_e64 v3, 1.0, 0.5, vcc
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v2
+; GFX9-NEXT:    v_cndmask_b32_e64 v2, 1.0, 0.5, vcc
+; GFX9-NEXT:    v_mul_f32_e32 v0, v0, v2
+; GFX9-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_v2f32_test2:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1030-NEXT:    v_and_b32_e32 v3, 1, v3
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v2
+; GFX1030-NEXT:    v_cndmask_b32_e64 v2, 1.0, 0.5, vcc_lo
+; GFX1030-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v3
+; GFX1030-NEXT:    v_mul_f32_e32 v0, v0, v2
+; GFX1030-NEXT:    v_cndmask_b32_e64 v3, 1.0, 0.5, vcc_lo
+; GFX1030-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_v2f32_test2:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v2
+; GFX1100-NEXT:    v_cndmask_b32_e64 v2, 1.0, 0.5, vcc_lo
+; GFX1100-NEXT:    v_dual_mul_f32 v0, v0, v2 :: v_dual_and_b32 v3, 1, v3
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v3
+; GFX1100-NEXT:    v_cndmask_b32_e64 v3, 1.0, 0.5, vcc_lo
+; GFX1100-NEXT:    v_mul_f32_e32 v1, v1, v3
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select <2 x i1> %bool, <2 x float> <float 0.500000e+00, float 0.500000e+00>, <2 x float> <float 1.000000e+00, float 1.000000e+00>
+  %ldexp = fmul <2 x float> %x, %1
+  ret <2 x float> %ldexp
+}
+
+define double @fmul_select_f64_test1(double %x, i1 %bool) {
+; GFX9-LABEL: fmul_select_f64_test1:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX9-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_f64_test1:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1030-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_f64_test1:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v2, 1, v2
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1100-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select i1 %bool, double 2.000000e+00, double 1.000000e+00
+  %ldexp = fmul double %x, %1
+  ret double %ldexp
+}
+
+define double @fmul_select_f64_test2(double %x, i1 %bool) {
+; GFX9-LABEL: fmul_select_f64_test2:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_bfe_i32 v2, v2, 0, 1
+; GFX9-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_f64_test2:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_bfe_i32 v2, v2, 0, 1
+; GFX1030-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_f64_test2:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_bfe_i32 v2, v2, 0, 1
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1100-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v2
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select i1 %bool, double 0.500000e+00, double 1.000000e+00
+  %ldexp = fmul double %x, %1
+  ret double %ldexp
+}
+
+define <2 x double> @fmul_select_v2f64_test1(<2 x double> %x, <2 x i1> %bool) {
+; GFX9-LABEL: fmul_select_v2f64_test1:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_and_b32_e32 v4, 1, v4
+; GFX9-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX9-NEXT:    v_and_b32_e32 v4, 1, v5
+; GFX9-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v4
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_v2f64_test1:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_and_b32_e32 v4, 1, v4
+; GFX1030-NEXT:    v_and_b32_e32 v5, 1, v5
+; GFX1030-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX1030-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v5
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_v2f64_test1:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_and_b32_e32 v4, 1, v4
+; GFX1100-NEXT:    v_and_b32_e32 v5, 1, v5
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX1100-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX1100-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v5
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select <2 x i1> %bool, <2 x double> <double 2.000000e+00, double 2.000000e+00>, <2 x double> <double 1.000000e+00, double 1.000000e+00>
+  %ldexp = fmul <2 x double> %x, %1
+  ret <2 x double> %ldexp
+}
+
+define <2 x double> @fmul_select_v2f64_test2(<2 x double> %x, <2 x i1> %bool) {
+; GFX9-LABEL: fmul_select_v2f64_test2:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_bfe_i32 v4, v4, 0, 1
+; GFX9-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX9-NEXT:    v_bfe_i32 v4, v5, 0, 1
+; GFX9-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v4
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1030-LABEL: fmul_select_v2f64_test2:
+; GFX1030:       ; %bb.0:
+; GFX1030-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1030-NEXT:    v_bfe_i32 v4, v4, 0, 1
+; GFX1030-NEXT:    v_bfe_i32 v5, v5, 0, 1
+; GFX1030-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX1030-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v5
+; GFX1030-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1100-LABEL: fmul_select_v2f64_test2:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_bfe_i32 v4, v4, 0, 1
+; GFX1100-NEXT:    v_bfe_i32 v5, v5, 0, 1
+; GFX1100-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX1100-NEXT:    v_ldexp_f64 v[0:1], v[0:1], v4
+; GFX1100-NEXT:    v_ldexp_f64 v[2:3], v[2:3], v5
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
+  %1 = select <2 x i1> %bool, <2 x double> <double 0.500000e+00, double 0.500000e+00>, <2 x double> <double 1.000000e+00, double 1.000000e+00>
+  %ldexp = fmul <2 x double> %x, %1
+  ret <2 x double> %ldexp
+}

@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from 86a3023 to 4b0885d Compare October 4, 2024 07:34
@vg0204 vg0204 self-assigned this Oct 4, 2024
@vg0204 vg0204 requested a review from arsenm October 4, 2024 07:36
isConstOrConstSplatFP(RHS.getOperand(2));
bool isNeg;

if (!TrueNode || !FalseNode)

Early exit after first one fails before checking the second. Also check RHS first


Didn't do the early exit

else
return SDValue();

SDValue ExtNode = DAG.getNode(ExtOp, SL, MVT::i32, RHS.getOperand(0));

You can generalize this to select of integer, and handle any pair of powers of 2


if (TrueNode->isNegative() != FalseNode->isNegative())
return SDValue();
bool isNeg = TrueNode->isNegative();

Don't need isNeg variable

Comment on lines 14515 to 14526
if (FalseNode->isExactlyValue(1.0) || FalseNode->isExactlyValue(-1.0)) {
if (TrueNode->isExactlyValue(2.0) || TrueNode->isExactlyValue(-2.0))
ExtOp = ISD::ZERO_EXTEND;
else if (TrueNode->isExactlyValue(0.5) ||
TrueNode->isExactlyValue(-0.5))
ExtOp = ISD::SIGN_EXTEND;
else
return SDValue();
if (TrueNode->isExactlyValue(2.0) || TrueNode->isExactlyValue(-2.0)) {
SDValue ZExtNode =
DAG.getNode(ISD::ZERO_EXTEND, SL, i32VT, RHS.getOperand(0));
return DAG.getNode(ISD::FLDEXP, SL, VT, LHS, ZExtNode);
} else if (TrueNode->isExactlyValue(0.5) ||
TrueNode->isExactlyValue(-0.5)) {
SDValue SExtNode =
DAG.getNode(ISD::SIGN_EXTEND, SL, i32VT, RHS.getOperand(0));
return DAG.getNode(ISD::FLDEXP, SL, VT, LHS, SExtNode);
}
}

You don't need the special case for the extensions up here. We select the extensions into a select anyway

@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from 6906e39 to de6e306 Compare October 15, 2024 07:07
isConstOrConstSplatFP(RHS.getOperand(2));
bool isNeg;

if (!TrueNode || !FalseNode)

Didn't do the early exit

Comment on lines 14503 to 14504
// Note : It takes care of generic scenario which covers undoing
// of special case(zext/sext) as mentioned.

Suggested change
// Note : It takes care of generic scenario which covers undoing
// of special case(zext/sext) as mentioned.

Comment on lines 14493 to 14498
// The above mentioned ldexp folding works fine for
// bf16/f32, but as for f64 it creates f64 select which
// is costly to materialize as compared to f64 ldexp
// so here we undo the transform for f64 datatype.
// Also in case of f16, its cheaper to materialize inline
// 32 bit-constant (via ldexp use) rather than using fmul.

Comment is verbose. Just state that it's cheaper to use i32 inline constants than to materialize f16 or f64 values

// fmul x, (select y, -A, -B) -> ldexp( (fneg x), (select i32 y, a, b) )
// Note : It takes care of generic scenario which covers undoing
// of special case(zext/sext) as mentioned.
if (VT.getScalarType() == MVT::f64 || VT.getScalarType() == MVT::f16) {

For the non-inline constant cases, we should do this for f32 too.

Comment on lines 14490 to 14491
// ldexp(x, zext(i1 y)) -> fmul x, (select y, 2.0, 1.0)
// ldexp(x, sext(i1 y)) -> fmul x, (select y, 0.5, 1.0)

Explain where these transformations are done. Is it another DAG combine somewhere? If so, what stops your DAG combine from fighting with the other one?

@jayfoad (Contributor) commented Oct 15, 2024

Thus, it solves the issue #104900.

If you write this as just "Fixes #104900" then github will recognize it and close the issue for you when you merge this patch: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue

DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;
EVT VT = N->getValueType(0);
EVT intVT = VT.changeElementType(MVT::i32);

Repeating myself, variable names should start with a capital letter.

Comment on lines 14514 to 14516
if (VT.getScalarType() == MVT::f32)
if (TII->isInlineConstant(TrueNode->getValueAPF()) &&
TII->isInlineConstant(FalseNode->getValueAPF()))

Can merge into one if. Also sink this below the isNegative early exit below

@arsenm (Contributor) commented Oct 16, 2024

Missing test updates?

@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from e2fd76a to 4b9d3a0 Compare October 21, 2024 06:37
@vg0204 (Contributor, Author) commented Oct 21, 2024

Missing test updates?

Addressed!

Comment on lines 14503 to 14629
const ConstantFPSDNode *TrueNode =
isConstOrConstSplatFP(RHS.getOperand(1));
const ConstantFPSDNode *FalseNode =
isConstOrConstSplatFP(RHS.getOperand(2));

bool AreNodesFP = TrueNode && FalseNode;
if (!AreNodesFP)
return SDValue();

Suggested change
const ConstantFPSDNode *TrueNode =
isConstOrConstSplatFP(RHS.getOperand(1));
const ConstantFPSDNode *FalseNode =
isConstOrConstSplatFP(RHS.getOperand(2));
bool AreNodesFP = TrueNode && FalseNode;
if (!AreNodesFP)
return SDValue();
const ConstantFPSDNode *FalseNode =
isConstOrConstSplatFP(RHS.getOperand(2));
if (!FalseNode)
return SDValue();
const ConstantFPSDNode *TrueNode =
isConstOrConstSplatFP(RHS.getOperand(1));
if (!TrueNode)
return SDValue();


ping here

Comment on lines 14526 to 14527
int FalseNodeExpVal = FalseNode->getValueAPF().getExactLog2Abs();
if (TrueNodeExpVal != INT_MIN && FalseNodeExpVal != INT_MIN) {

Early exit after one fails
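getExactLog2Abs() returns INT_MIN when |value| is not an exact power of two, which is the sentinel these checks test against. A rough standalone equivalent using std::frexp (illustrative only, not the APFloat implementation):

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Returns log2(|v|) when |v| is exactly a power of two, else INT_MIN
// (the same sentinel convention as APFloat::getExactLog2Abs).
int exactLog2Abs(double v) {
  if (v == 0.0 || std::isinf(v) || std::isnan(v))
    return INT_MIN;
  int exp;
  // std::frexp decomposes |v| as m * 2^exp with m in [0.5, 1);
  // |v| is a power of two exactly when the mantissa m is 0.5.
  double m = std::frexp(std::fabs(v), &exp);
  return m == 0.5 ? exp - 1 : INT_MIN;
}
```

With this shape, the early-exit style asked for above is natural: compute one operand's exponent, bail out on INT_MIN, then compute the other.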

;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GFX1030 %s
;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 < %s | FileCheck -check-prefix=GFX1100 %s

define float @fmul_select_f32_test1(float %x, i32 %bool.arg1, i32 %bool.arg2) {

Can you rebase this to be on top of the test add

@vg0204 (Contributor, Author) commented Oct 28, 2024

Kind Ping!

@arsenm (Contributor) left a comment

Would be easier to see diff on top of the baseline tests

Comment on lines 14503 to 14629

ping here

Comment on lines 14500 to 14501
if (VT.getScalarType() == MVT::f64 || VT.getScalarType() == MVT::f32 ||
VT.getScalarType() == MVT::f16) {

Factor out repeated getScalarType call

@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from c27b98e to 832eb0f Compare November 4, 2024 08:18
@@ -0,0 +1,2616 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck -check-prefix=GFX7 %s

Should rebase on top of the test base. Patch lgtm with nit but I can't easily check the effect in the output here

arsenm pushed a commit that referenced this pull request Nov 25, 2024
This adds f32/f64/f16/bf16 test cases for the pattern below:

`fmul x, select(y, A, B)`
with just one use of the select instruction above.

It acts as pre-commit tests for DAG-combining the above pattern into a cheaper
ldexp in the case of non-inline 32-bit constants. (#111109)
@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch 2 times, most recently from 75176be to b250aba Compare November 26, 2024 07:38
For the f32/f16, this combine does no improvements, but for f64 this
specific case of fmul with select is more costly to materialize as
compared to ldexp, so the following dag combine does the magic.

fmul x, select(y, 2.0, 1.0) -> ldexp x, zext(i1 y)
fmul x, select(y, 0.5, 1.0) -> ldexp x, sext(i1 y)
@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from b250aba to b1fdd89 Compare November 26, 2024 07:39
@vg0204 (Contributor, Author) commented Nov 26, 2024

I updated the new test cases as well as the old ones that needed changes, but for some reason one test case,

llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

gives an invalid bitcast error ("from i16 to i32") when I try to update it with the update_llc_test_checks.py script. Bizarrely, when I run the test's command sequence directly in a terminal it works fine, generating the assembly as expected with the DAG combine applied. I am perplexed!

@arsenm (Contributor) commented Nov 27, 2024

I updated the new test cases as well as the old ones that needed changes, but for some reason one test case, llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll, gives an invalid bitcast error ("from i16 to i32") when updated with the update_llc_test_checks.py script, even though running the test's command sequence directly in a terminal works fine.

I don't know how you could hit this bitcast issue, but it should be fixed however it is arising.

This test is odd in that it has an llc line with an opt invocation, which update_llc_test_checks does not understand. In any case, there should be no path to hitting an invalid bitcast.

@vg0204 (Contributor, Author) commented Nov 28, 2024

@arsenm, still figuring it out; the test feeds opt's generated output to llc via a pipe.

Just to point out: when executing the commands in the lit test directly, it works fine, but it gets messed up when updating via the update_llc_test_checks script. Could you comment on that?

@vg0204 (Contributor, Author) commented Nov 28, 2024

@arsenm, as you said, there is no issue with the bitcast itself; it came from a bizarre Docker issue. On a bare-metal machine, as you suspected, the entire opt line (a preprocessing command invoked by the update_llc_test_checks.py script) is the problem: it is executed verbatim as on a terminal, so the opt binary cannot be located, since no path handling is done for it as for the llc tool. I verified this by hardcoding the correct opt path.

@vg0204 vg0204 force-pushed the vg0204/fmul-dag-combine-to-ldexp branch from 1af8afc to be435f8 Compare November 28, 2024 16:37
@vg0204 (Contributor, Author) commented Nov 28, 2024

@arsenm, as you said, there is no issue with the bitcast itself; it came from a bizarre Docker issue. On a bare-metal machine, as you suspected, the entire opt line (a preprocessing command invoked by the update_llc_test_checks.py script) is the problem: it is executed verbatim as on a terminal, so the opt binary path cannot be located, unlike for the llc tool.

I handled it by hardcoding the opt path, updating the checks via the script, and then reverting the opt path back to the original in the test. But in the future this will be a problem if someone runs the update-llc script again and has to repeat the same hack.

@vg0204 vg0204 requested review from jayfoad and arsenm November 29, 2024 09:23
@arsenm (Contributor) left a comment

LGTM. Please implement the same for globalisel in a follow up

Comment on lines +3321 to +3324
; GFX9-GISEL-NEXT: v_mov_b32_e32 v2, 0x42800000
; GFX9-GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-GISEL-NEXT: v_cndmask_b32_e32 v0, 1.0, v2, vcc
; GFX9-GISEL-NEXT: v_mul_f32_e32 v0, v1, v0

TODO: Implement this for globalisel

@vg0204 vg0204 merged commit 0b0d9a3 into llvm:main Dec 9, 2024
5 of 8 checks passed
vg0204 added a commit that referenced this pull request Jan 6, 2025
#120104)

This combine pattern performs the transformation below.

fmul x, select(y, A, B)      -> fldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B)   -> fldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.

It is a follow-up PR to implement the above combine for globalIsel, as
the corresponding DAG combine has been done for SelectionDAG Isel
(#111109)
paulhuggett pushed a commit to paulhuggett/llvm-project that referenced this pull request Jan 7, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 13, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 13, 2025

Successfully merging this pull request may close these issues.

[AMDGPU] InstCombine results in performance drop in ROCM's rocRAND library in MI100
4 participants