[TTI][RISCV]Improve costs for whole vector reg extract/insert. #80164

alexey-bataev · 2024-01-31T16:47:51Z

If we can detect that a whole-register extract/insert is requested,
consider it free.

Created using spr 1.3.5

llvmbot · 2024-01-31T16:48:18Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Alexey Bataev (alexey-bataev)

Changes

If we can detect that a whole-register extract/insert is requested, the
backend emits a VMV_V_V instruction or just a vsetvli.

Patch is 81.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80164.diff

4 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+42)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll (+87-87)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll (+21-21)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-interleave.ll (+3-3)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index fe1cdb2dfa423..465a05b6497a2 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -326,6 +326,48 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     switch (Kind) {
     default:
       break;
+    case TTI::SK_ExtractSubvector:
+      if (isa<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);
+        unsigned NumElems =
+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector extract - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index % NumElems == 0) {
+          std::pair<InstructionCost, MVT> SubLT =
+              getTypeLegalizationCost(SubTp);
+          return Index == 0
+                     ? TTI::TCC_Free
+                     : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+                                                             SubLT.second,
+                                                             CostKind);
+        }
+      }
+      break;
+    case TTI::SK_InsertSubvector:
+      if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);
+        unsigned SubTpRegs = getRegUsageForType(SubTp);
+        unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+            Tp->getElementType(), FSubTy->getNumElements() + 1));
+        unsigned NumElems =
+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector insert - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index % NumElems == 0 &&
+            (any_of(Args, UndefValue::classof) ||
+             (SubTpRegs != 0 && SubTpRegs != NextSubTpRegs &&
+              TpRegs / SubTpRegs > 1))) {
+          std::pair<InstructionCost, MVT> SubLT =
+              getTypeLegalizationCost(SubTp);
+          return Index == 0
+                     ? TTI::TCC_Free
+                     : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+                                                             SubLT.second,
+                                                             CostKind);
+        }
+      }
+      break;
     case TTI::SK_PermuteSingleSrc: {
       if (Mask.size() >= 2 && LT.second.isFixedLengthVector()) {
         MVT EltTp = LT.second.getVectorElementType();
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
index 76cb1955a2b37..901d66e1124d8 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
@@ -9,15 +9,15 @@
 
 define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
 ; CHECK-LABEL: 'test_vXf64'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 4, i32 5>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_23 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_67 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V512_4567 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of -1 for instruction: %V512_567u = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 poison>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
@@ -36,15 +36,15 @@ define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
 
 define void @test_vXi64(<4 x i64> %src256, <8 x i64> %src512) {
 ; CHECK-LABEL: 'test_vXi64'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 4, i32 5>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_2345 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_23 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_67 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V512_2345 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V512_4567 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
@@ -61,28 +61,28 @@ define void @test_vXi64(<4 x i64> %src256, <8 x i64> %src512) {
 
 define void @test_vXi32(<4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512) {
 ; CHECK-LABEL: 'test_vXi32'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_45 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 4, i32 5>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_67 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_0123 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_4567 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 4, i32 5>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_89 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 8, i32 9>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_89 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 8, i32 9>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 10, i32 11>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_CD = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 12, i32 13>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_CD = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 12, i32 13>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_EF = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 14, i32 15>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_4567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
@@ -112,62 +112,62 @@ define void @test_vXi32(<4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512
 
 define void @test_vXi16(<4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
 ; CHECK-LABEL: 'test_vXi16'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> u...
[truncated]
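For readers following the truncated diff, the extract rule in this first revision can be modeled outside LLVM roughly as below. This is only a sketch under stated assumptions: `regsFor` is a hypothetical stand-in for `getRegUsageForType()` (the real function goes through type legalization), the minimum VLEN is assumed to be 128 bits, and the `SubLT.first * VMV_V_V` cost is approximated by the subvector's register count. Under those assumptions it reproduces several of the updated CHECK lines above. The review comments that follow question whether the register-boundary test is valid when the exact VLEN is unknown.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for getRegUsageForType(): registers needed by a fixed
// vector of NumElts elements of EltBits bits, assuming MinVLen-bit registers.
constexpr unsigned regsFor(unsigned NumElts, unsigned EltBits,
                           unsigned MinVLen) {
  return (NumElts * EltBits + MinVLen - 1) / MinVLen;
}

// Models the SK_ExtractSubvector rule from the first revision of the patch.
// Returns std::nullopt when the rule does not apply (the generic cost is used).
// Assumption: the VMV_V_V cost scales with the subvector's register count.
inline std::optional<unsigned> extractCost(unsigned NumElts, unsigned SubElts,
                                           unsigned EltBits, unsigned Index,
                                           unsigned MinVLen = 128) {
  unsigned TpRegs = regsFor(NumElts, EltBits, MinVLen);
  unsigned EltsPerReg = (NumElts + TpRegs - 1) / TpRegs; // divideCeil
  if (Index % EltsPerReg != 0)
    return std::nullopt; // not on an assumed register boundary
  if (Index == 0)
    return 0u; // TCC_Free
  return regsFor(SubElts, EltBits, MinVLen);
}

inline void runExtractExamples() {
  // %V256_01 and %V256_23 from test_vXf64.
  assert(extractCost(4, 2, 64, 0) == 0u);
  assert(extractCost(4, 2, 64, 2) == 1u);
  // %V256_23 from test_vXi32 is not on a boundary, so its cost is unchanged.
  assert(!extractCost(8, 2, 32, 2).has_value());
}
```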

topperc · 2024-01-31T16:57:28Z

Don't we need to know the exact vlen to know where register boundaries are?

alexey-bataev · 2024-01-31T16:59:08Z

I use getRegUsageForType() to get this info.

topperc · 2024-01-31T17:03:42Z

That tells the maximum number of registers needed for the type assuming a minimum VLEN. If hardware VLEN is more than the minimum VLEN, we still use the extra registers but the elements in them are not used since they would be past VL. CodeGen has to use a slidedown unless we also know the maximum VLEN is the same as the minimum VLEN.
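The point about register boundaries moving with the hardware VLEN can be illustrated with a small standalone sketch. The helper names are hypothetical and nothing here is LLVM API; it only does the arithmetic for where a given element lands for a given VLEN.

```cpp
#include <cassert>

// Index (within the register group) of the register holding element Index,
// for a hardware VLEN given in bits. Hypothetical helper for illustration.
constexpr unsigned regOfElement(unsigned Index, unsigned EltBits,
                                unsigned VLen) {
  return (Index * EltBits) / VLen;
}

// Element offset of Index inside that register; 0 means a register boundary.
constexpr unsigned offsetInReg(unsigned Index, unsigned EltBits,
                               unsigned VLen) {
  return Index % (VLen / EltBits);
}

inline void runBoundaryExamples() {
  // Element 2 of an <8 x i64>:
  // with VLEN=128 it starts the second register of the group...
  assert(regOfElement(2, 64, 128) == 1 && offsetInReg(2, 64, 128) == 0);
  // ...but with VLEN=256 it sits in the middle of the first register,
  // so extracting from it needs a slide rather than a register pick.
  assert(regOfElement(2, 64, 256) == 0 && offsetInReg(2, 64, 256) == 2);
}
```

Only index 0 is a boundary for every VLEN, which is why the index-0 case can be costed without knowing the exact VLEN.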

alexey-bataev · 2024-01-31T17:08:18Z

Do we have anything in TTI that returns correct VLEN?

topperc · 2024-01-31T17:53:21Z

You can check that ST->getRealMaxVLen() == ST->getRealMinVLen()

alexey-bataev · 2024-01-31T17:57:21Z

Will add

preames · 2024-02-01T19:12:36Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector extract - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&


I think this check would be more clearly expressed as an and of the following clauses
a) ST->getRealMaxVLen() == ST->getRealMinVLen()
b) NumElems * ElementSizeInBits == VLEN
c) Index % NumElems == 0

Note that this only supports m1 full extracts. But starting there and extending it to m2, and m4 later seems entirely reasonable.
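The three clauses above can be written as a single predicate. The following is a standalone sketch, not the patch's code: the function name is hypothetical, and `NumElems` is taken to mean the same per-register element count as in the quoted diff.

```cpp
#include <cassert>

// Sketch of the suggested condition for an m1 whole-register extract.
// a) the exact VLEN is known (min == max)
// b) NumElems elements of EltBits bits fill exactly one vector register
// c) Index lands on a multiple of NumElems
constexpr bool isWholeM1Extract(unsigned MinVLen, unsigned MaxVLen,
                                unsigned NumElems, unsigned EltBits,
                                unsigned Index) {
  return MinVLen == MaxVLen &&             // (a)
         NumElems * EltBits == MinVLen &&  // (b)
         NumElems != 0 && Index % NumElems == 0; // (c)
}

inline void runPredicateExamples() {
  // <2 x i64> at index 2 with VLEN known to be exactly 128.
  assert(isWholeM1Extract(128, 128, 2, 64, 2));
  // Same request, but VLEN only known to be in [128, 256].
  assert(!isWholeM1Extract(128, 256, 2, 64, 2));
}
```

Clause (b) is what restricts this form to m1: an m2-sized request has `NumElems * EltBits == 2 * VLEN` and is rejected.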

preames · 2024-02-01T19:13:14Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+              getTypeLegalizationCost(SubTp);
+          return Index == 0
+                     ? TTI::TCC_Free
+                     : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,


For a full VREG case, you never need the VMV_V_V. You only need the VMV_V_V if NumElems < VLMAX.

Extending this to sub-register extract with exact VLEN known would be reasonable, but let's do that in a separate patch.

preames · 2024-02-01T19:15:08Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+      break;
+    case TTI::SK_InsertSubvector:
+      if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);


Same basic style comments as above.

lukel97 · 2024-02-12T02:22:33Z

llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll

+; RTBASE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
+; RTBASE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
+; RTBASE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1>
+; RTBASE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 2, i32 3>


Just making a note, this is an LMUL 1 vslidedown:

vsetivli zero, 2, e32, m1, ta, ma
vslidedown.vi v8, v8, 2

So I think the cost should just be one here. This looks like it's coming from the scalable vector cost path. Is the type being passed in a <8 x i32> instead of a <2 x i32>? Something that could be looked at in a later patch.

preames

I'd encourage you to split off a change to handle only the Index == 0 case. It should be simple, but that's the value. :)

preames · 2024-02-14T15:57:52Z

I posted this here: #81751

alexey-bataev · 2024-02-14T16:55:13Z

The latest version of the patch actually handles only index 0

lukel97 · 2024-02-15T02:12:20Z

#81751 handles both fixed and scalable vectors from the looks of things. I wonder if it's possible to have this patch handle the whole reg extract/insert case for scalable vectors too, if we move the logic into the scalable part below the fixed vector switch? Any scalable vector extract/insert should be free if both the vector and subvector are >= LMUL 1.

alexey-bataev · 2024-02-15T14:27:20Z

I'm not sure that insert subvector should be free. It can be free if either the second vector is undef or the whole vector is being inserted. LMUL >= 1 is not enough for the second case; we also need to check that the whole vector is inserted, not, say, half of it.

lukel97 · 2024-02-15T15:23:20Z

LMUL >=1 not enough for the second case, also need to check that the whole vector is insert, not, say, half of it.

But for llvm.vector.insert there is the constraint that all the subvec elements must be within bounds of the vector, and for scalable vectors the index at which it is inserted is scaled by vscale.

So if the subvector is LMUL >= 1, it shouldn't be possible for only half of it to be inserted, since it won't be truncated and the index will be a multiple of an LMUL1 register boundary.

This is separate from the exact VLEN fixed vector case in the original version of this PR though, we can leave it for a future patch.
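The alignment argument for scalable vectors can be checked with a little arithmetic. This is a sketch under assumptions: it takes one vscale block to be 64 bits (which, to my understanding, matches the backend's `RVVBitsPerBlock`), it relies on the LangRef requirement that the index be a multiple of the subvector's known minimum length, and the helper names are hypothetical.

```cpp
#include <cassert>

// Assumption: one vscale "block" is 64 bits, so LMUL1 == 64 bits per vscale.
constexpr unsigned BitsPerBlock = 64;

// A <vscale x MinElts x iEltBits> type is at least LMUL 1 if its known
// minimum size covers a full block.
constexpr bool isAtLeastLMUL1(unsigned MinElts, unsigned EltBits) {
  return MinElts * EltBits >= BitsPerBlock;
}

// True if inserting a subvector with SubMinElts known-minimum elements at
// Index is a legal index and starts on an LMUL1 register boundary.
constexpr bool insertIsRegisterAligned(unsigned SubMinElts, unsigned EltBits,
                                       unsigned Index) {
  return Index % SubMinElts == 0 &&                // legal index
         (Index * EltBits) % BitsPerBlock == 0;    // LMUL1 boundary
}

inline void runScalableExamples() {
  // <vscale x 2 x i32> is LMUL 1: every legal index is register aligned.
  assert(isAtLeastLMUL1(2, 32));
  assert(insertIsRegisterAligned(2, 32, 2));
  // <vscale x 1 x i32> is fractional: index 1 falls inside a register.
  assert(!isAtLeastLMUL1(1, 32));
  assert(!insertIsRegisterAligned(1, 32, 1));
}
```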

lukel97 · 2024-02-16T18:40:09Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+        unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+            Tp->getElementType(), FSubTy->getNumElements() + 1));
+        // Whole vector insert - just the vector itself.
+        if (Index == 0 && SubTpRegs != 0 && SubTpRegs != NextSubTpRegs &&


Just want to check, is SubTpRegs != NextSubTpRegs to check that SubTp isn't a fractional LMUL?
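One way to read the `SubTpRegs != NextSubTpRegs` test is "adding one more element would spill into another register", i.e. the subvector exactly fills its registers under the minimum VLEN. The sketch below models that reading; it is not confirmed by the author in this thread, and `regsFor` is a hypothetical simplification of `getRegUsageForType()`.

```cpp
#include <cassert>

// Hypothetical simplification of getRegUsageForType() for fixed vectors.
constexpr unsigned regsFor(unsigned NumElts, unsigned EltBits,
                           unsigned MinVLen) {
  return (NumElts * EltBits + MinVLen - 1) / MinVLen;
}

// Mirrors `SubTpRegs != 0 && SubTpRegs != NextSubTpRegs`: true when one extra
// element would need an additional register.
constexpr bool fillsItsRegisters(unsigned SubElts, unsigned EltBits,
                                 unsigned MinVLen) {
  unsigned Regs = regsFor(SubElts, EltBits, MinVLen);
  return Regs != 0 && Regs != regsFor(SubElts + 1, EltBits, MinVLen);
}

inline void runFillExamples() {
  // <4 x i32> fills one 128-bit register; <5 x i32> needs two.
  assert(fillsItsRegisters(4, 32, 128));
  // <2 x i32> is fractional: <3 x i32> still fits in one register.
  assert(!fillsItsRegisters(2, 32, 128));
}
```

Under this reading the test rejects fractional-LMUL subvectors, but also partially filled larger groups such as a `<6 x i32>`.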

lukel97 · 2024-02-19T05:42:35Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -442,6 +454,9 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
    return LT.first *
           getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind);
  case TTI::SK_InsertSubvector:
+    if (Index == 0 && any_of(Args, UndefValue::classof))


Do we need to check that Args isn't empty?

Added, thanks!
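For reference, `std::any_of` over an empty range returns false, and as far as I know `llvm::any_of` is a thin range wrapper around it. If that holds, the explicit `!Args.empty()` does not change the result and mainly documents intent. A standalone sketch, using a hypothetical `anyUndef` helper over plain flags instead of `Value *` operands:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for any_of(Args, UndefValue::classof): each flag says whether the
// corresponding operand is undef. Hypothetical helper for illustration.
inline bool anyUndef(const std::vector<bool> &IsUndef) {
  return std::any_of(IsUndef.begin(), IsUndef.end(),
                     [](bool Flag) { return Flag; });
}

inline void runAnyOfExamples() {
  assert(!anyUndef({}));           // empty operand list: no match
  assert(anyUndef({false, true})); // one undef operand: match
}
```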

lukel97

Does the PR title need to be updated to reflect that this now handles insert_subvectors only?

lukel97 · 2024-02-21T03:13:27Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -326,6 +326,18 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
    switch (Kind) {
    default:
      break;
+    case TTI::SK_InsertSubvector: {
+      auto *FSubTy = dyn_cast<FixedVectorType>(SubTp);


Should we use cast instead of dyn_cast here to get the assertion?

…ct vlen (#82405)

If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering uses this knowledge to replace the vslidedown.vi with a sub-register extract. Our costs can reflect that as well.

This is another piece split off #80164

Co-authored-by: Luke Lau

preames · 2024-02-21T16:03:00Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ -442,6 +454,9 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
    return LT.first *
           getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind);
  case TTI::SK_InsertSubvector:
+    if (Index == 0 && !Args.empty() && any_of(Args, UndefValue::classof))


I tried to split off this piece - or more accurately something vaguely related - and stumbled into something interesting.

The InsertSubvector w/Index=0 is unreachable from everywhere except SLP. TTI::getInstructionCost contains a check for the identity shuffle and always returns 0. improveShuffleKindFromMask will recognize the insert into passthru case as a select (correctly), and thus it doesn't hit this case either. Put together, this means that the index=0 case never makes it from the backend, and thus we have no test coverage via cost model tests.

SLP hits a slightly different codepath here and directly calls getShuffleCost with a possible identity mask. It still can't hit the select case, but it can hit the insert into poison case. SLP appears to have a bunch of guards for this already in various cases.

I'm not really a fan of having untestable logic here. Anyone have any ideas how we can rework this API to ensure SLP can't reach a case which is untestable via costmodel tests?
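For readers unfamiliar with the term, the "identity shuffle" mentioned above can be illustrated with a small predicate. This is a simplified sketch and not LLVM's implementation: `isIdentityMask` is a stand-in written for this example, and it assumes the common convention that `-1` marks an undefined lane.

```cpp
#include <vector>

// Simplified illustration (not LLVM's implementation): a single-source
// shuffle mask is an identity if every lane is either undefined (-1) or
// selects the element at its own position.
bool isIdentityMask(const std::vector<int> &Mask) {
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I)
    if (Mask[I] != -1 && Mask[I] != I)
      return false;
  return true;
}
```

Under this model a mask such as `{0, 1, -1, -1}` counts as an identity, which is the kind of shape that gets costed as free before a target hook is ever consulted.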

Don't the tests for llvm.vector.insert intrinsics check this?

alexey-bataev · 2024-02-28T18:48:08Z

Ping!

topperc · 2024-02-28T18:58:48Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+      unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+          Tp->getElementType(), FSubTy->getNumElements() + 1));
+      // Whole vector insert - just the vector itself.
+      if (Index == 0 && SubTpRegs != 0 && SubTpRegs != NextSubTpRegs &&


I don't think this works. getRegUsageForType is returning the maximum number of registers needed given a minimum VLEN. If the runtime VLEN is larger the used number of registers could be less.

The backend must always use a vslideup or vmv.v.v for fixed vector insert unless we know both the maximum and minimum VLEN are the same. I think you have to check ST.getRealVLen().
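The dependence on VLEN described above can be sketched with a toy model. This assumes a fixed vector simply occupies ceil(bits / VLEN) registers; `regsForFixedVector` is a hypothetical helper made up for illustration, not `getRegUsageForType`, and it ignores fractional LMUL and the LMUL limit.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of vector registers that a
// fixed vector of NumElts elements, each EltBits wide, occupies at a given
// VLEN. Simplified model: ignores fractional LMUL and the LMUL <= 8 limit.
unsigned regsForFixedVector(unsigned NumElts, unsigned EltBits,
                            unsigned VLen) {
  assert(VLen != 0 && "VLEN must be non-zero");
  unsigned Bits = NumElts * EltBits;
  return (Bits + VLen - 1) / VLen; // ceiling division
}
```

In this model an `<8 x i32>` vector needs 2 registers at VLEN=128 but only 1 at VLEN=256, so a count derived from the minimum VLEN is only an upper bound on what the hardware actually uses.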

alexey-bataev · 2024-03-07T15:00:55Z

Ping!

lukel97

I think the PR title still needs the reg extract part removed

lukel97 · 2024-03-08T10:55:07Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    const unsigned MinVLen = ST->getRealMinVLen();
+    const unsigned MaxVLen = ST->getRealMaxVLen();


You can use ST->getRealVLen() which was added recently
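As a rough sketch of what that suggestion amounts to: the exact VLEN is known only when the minimum and maximum coincide. The `VLenBounds` struct and `exactVLen` function below are hypothetical stand-ins written for this example; the real `getRealVLen()` signature should be checked in `RISCVSubtarget.h`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the subtarget's VLEN bounds; this is not the
// real RISCVSubtarget class.
struct VLenBounds {
  unsigned Min;
  unsigned Max;
};

// Returns the exact VLEN when the minimum and maximum agree, otherwise
// std::nullopt.
std::optional<unsigned> exactVLen(const VLenBounds &B) {
  assert(B.Min <= B.Max && "minimum VLEN must not exceed maximum");
  if (B.Min != B.Max)
    return std::nullopt;
  return B.Min;
}
```

A caller can then guard the cheaper cost path with a single check on the optional instead of comparing the two bounds by hand.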

lukel97 · 2024-03-08T10:58:22Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+      unsigned TpRegs = getRegUsageForType(Tp);
+      unsigned SubTpRegs = getRegUsageForType(SubTp);
+      unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+          Tp->getElementType(), FSubTy->getNumElements() + 1));
+      if (SubTpRegs != 0 && SubTpRegs != NextSubTpRegs && TpRegs >= SubTpRegs)


Is it possible for TpRegs < SubTpRegs?

…t vlen

If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering will use this knowledge to replace the vslideup.vi with a sub-register insert when the subvec passthru is undef.

One case where the subvec passthru is known undef is when the subvec completely fills the subregister, and that's the easiest case to recognize during costing.

Note: This is cost modeling a lowering which hasn't landed yet, see llvm#84107. This change will not land until after that one does.

This is another piece split off llvm#80164
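The "register boundary" and "completely fills the subregister" conditions can be written down as simple arithmetic once VLEN is known exactly. The following is a minimal sketch under that assumption; `insertCoversWholeRegs` is a hypothetical name for illustration and is not the code from either patch.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): true when inserting SubNumElts
// elements at element position Index starts on a vector register boundary
// and covers only whole registers, given an exactly known VLEN.
bool insertCoversWholeRegs(unsigned Index, unsigned SubNumElts,
                           unsigned EltBits, unsigned VLen) {
  assert(EltBits != 0 && VLen % EltBits == 0 &&
         "VLEN must be a multiple of the element width");
  assert(SubNumElts != 0 && "subvector must not be empty");
  unsigned EltsPerReg = VLen / EltBits;
  return Index % EltsPerReg == 0 && SubNumElts % EltsPerReg == 0;
}
```

With VLEN=128 and i32 elements there are 4 elements per register, so inserting 4 elements at index 4 qualifies, while inserting 2 elements at index 4 or 4 elements at index 2 does not.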

preames

I have posted an alternative patch for the remaining insertsubvector case here: #85240

When writing this, I discovered that this patch models a lowering which is not implemented. There is a patch on review, but it hasn't landed yet.

[𝘀𝗽𝗿] initial version

cfd0dcf

alexey-bataev requested review from preames and lukel97 January 31, 2024 16:48

llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding labels Jan 31, 2024

Rebase, address comments

1689f41

preames reviewed Feb 1, 2024

alexey-bataev added 2 commits February 1, 2024 21:04

Address comments

e909200

Rebase

2dd7afa

lukel97 reviewed Feb 5, 2024

Rebase, Address comments

382d5b3

lukel97 reviewed Feb 12, 2024

preames reviewed Feb 12, 2024

Rebase, Address comments

de3be41

Rebase, add scalable vectors

51da92f

alexey-bataev changed the title ~~[TTI][RISCV]Improve costs for fixed vector whole reg extract/insert.~~ [TTI][RISCV]Improve costs for whole vector reg extract/insert. Feb 15, 2024

lukel97 reviewed Feb 16, 2024

Rebase, address comments

0ac7359

lukel97 reviewed Feb 19, 2024

Rebase, address comments

7afc02a

lukel97 reviewed Feb 20, 2024

Rebase, address comments

93c0d1c

preames mentioned this pull request Feb 20, 2024

[RISCV][TTI] Cost a subvector extract at a register boundary with exact vlen #82405

Merged

lukel97 reviewed Feb 21, 2024

Rebase, address comments

f17263c

preames reviewed Feb 21, 2024

alexey-bataev added 2 commits February 22, 2024 21:37

Rebase

07713b8

Rebase

4647802

topperc reviewed Feb 28, 2024

Rebase, Address comments

a0d744b

lukel97 reviewed Mar 8, 2024

preames mentioned this pull request Mar 14, 2024

[RISCV][TTI] Cost a subvector insert at a register boundary with exact vlen #85240

Closed

preames requested changes Mar 14, 2024

alexey-bataev closed this Mar 19, 2024
