-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[TTI][RISCV]Improve costs for whole vector reg extract/insert. #80164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TTI][RISCV]Improve costs for whole vector reg extract/insert. #80164
Conversation
Created using spr 1.3.5
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-analysis Author: Alexey Bataev (alexey-bataev) ChangesIf we can detect, that whole register extract/insert is requested, it Patch is 81.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80164.diff 4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index fe1cdb2dfa423..465a05b6497a2 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -326,6 +326,48 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
switch (Kind) {
default:
break;
+ case TTI::SK_ExtractSubvector:
+ if (isa<FixedVectorType>(SubTp)) {
+ unsigned TpRegs = getRegUsageForType(Tp);
+ unsigned NumElems =
+ divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+ // Whole vector extract - just the vector itself + (possible) vsetvli.
+ // TODO: consider adding the cost for vsetvli.
+ if (Index % NumElems == 0) {
+ std::pair<InstructionCost, MVT> SubLT =
+ getTypeLegalizationCost(SubTp);
+ return Index == 0
+ ? TTI::TCC_Free
+ : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+ SubLT.second,
+ CostKind);
+ }
+ }
+ break;
+ case TTI::SK_InsertSubvector:
+ if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+ unsigned TpRegs = getRegUsageForType(Tp);
+ unsigned SubTpRegs = getRegUsageForType(SubTp);
+ unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+ Tp->getElementType(), FSubTy->getNumElements() + 1));
+ unsigned NumElems =
+ divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+ // Whole vector insert - just the vector itself + (possible) vsetvli.
+ // TODO: consider adding the cost for vsetvli.
+ if (Index % NumElems == 0 &&
+ (any_of(Args, UndefValue::classof) ||
+ (SubTpRegs != 0 && SubTpRegs != NextSubTpRegs &&
+ TpRegs / SubTpRegs > 1))) {
+ std::pair<InstructionCost, MVT> SubLT =
+ getTypeLegalizationCost(SubTp);
+ return Index == 0
+ ? TTI::TCC_Free
+ : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+ SubLT.second,
+ CostKind);
+ }
+ }
+ break;
case TTI::SK_PermuteSingleSrc: {
if (Mask.size() >= 2 && LT.second.isFixedLengthVector()) {
MVT EltTp = LT.second.getVectorElementType();
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
index 76cb1955a2b37..901d66e1124d8 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
@@ -9,15 +9,15 @@
define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
; CHECK-LABEL: 'test_vXf64'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 4, i32 5>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_23 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_67 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> <i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512_4567 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of -1 for instruction: %V512_567u = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 poison>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
@@ -36,15 +36,15 @@ define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
define void @test_vXi64(<4 x i64> %src256, <8 x i64> %src512) {
; CHECK-LABEL: 'test_vXi64'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 4, i32 5>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_2345 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_23 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_67 = shufflevector <8 x i64> %src512, <8 x i64> undef, <2 x i32> <i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512_2345 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512_4567 = shufflevector <8 x i64> %src512, <8 x i64> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%V256_01 = shufflevector <4 x i64> %src256, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
@@ -61,28 +61,28 @@ define void @test_vXi64(<4 x i64> %src256, <8 x i64> %src512) {
define void @test_vXi32(<4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512) {
; CHECK-LABEL: 'test_vXi32'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_45 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 4, i32 5>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_67 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_0123 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_4567 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_01 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <8 x i32> %src256, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_01 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_23 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_45 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 4, i32 5>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_45 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 4, i32 5>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_67 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_89 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 8, i32 9>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_89 = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 8, i32 9>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 10, i32 11>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_CD = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 12, i32 13>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_CD = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 12, i32 13>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_EF = shufflevector <16 x i32> %src512, <16 x i32> undef, <2 x i32> <i32 14, i32 15>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_0123 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_4567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_0123 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_4567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
@@ -112,62 +112,62 @@ define void @test_vXi32(<4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512
define void @test_vXi16(<4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; CHECK-LABEL: 'test_vXi16'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> u...
[truncated]
|
Don't we need to know the exact vlen to know where register boundaries are? |
I use getRegUsageForType() to get this info. |
That tells the maximum number of registers needed for the type assuming a minimum VLEN. If hardware VLEN is more than the minimum VLEN, we still use the extra registers but the elements in them are not used since they would be past VL. CodeGen has to use a slidedown unless we also know the maximum VLEN is the same as the minimum VLEN. |
Do we have anything in TTI that returns correct VLEN? |
You can check that |
Will add |
Created using spr 1.3.5
divideCeil(Tp->getElementCount().getFixedValue(), TpRegs); | ||
// Whole vector extract - just the vector itself + (possible) vsetvli. | ||
// TODO: consider adding the cost for vsetvli. | ||
if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check would be more clearly expressed as an and of the following clauses
a) ST->getRealMaxVLen() == ST->getRealMinVLen()
b) NumElems * ElementSizeInBits == VLEN
c) Index % NumElems == 0
Note that this only supports m1 full extracts. But starting there and extending it to m2, and m4 later seems entirely reasonable.
getTypeLegalizationCost(SubTp); | ||
return Index == 0 | ||
? TTI::TCC_Free | ||
: SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a full VREG case, you never need the VMV_V_V. You only need the VMV_V_V if NumElems < VLMAX.
Extending this to sub-register extract with exact VLEN known would be reasonable, but let's do that in a separate patch.
break; | ||
case TTI::SK_InsertSubvector: | ||
if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) { | ||
unsigned TpRegs = getRegUsageForType(Tp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same basic style comments as above.
Created using spr 1.3.5
Created using spr 1.3.5
; RTBASE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 0, i32 1> | ||
; RTBASE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <4 x i32> %src128, <4 x i32> undef, <2 x i32> <i32 2, i32 3> | ||
; RTBASE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 0, i32 1> | ||
; RTBASE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256_23 = shufflevector <8 x i32> %src256, <8 x i32> undef, <2 x i32> <i32 2, i32 3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making a note, this is an LMUL 1 vslidedown:
vsetivli zero, 2, e32, m1, ta, ma
vslidedown.vi v8, v8, 2
So I think the cost should just be one here. This looks like it's coming from the scalable vector cost path. Is the type being passed in a <8 x i32> instead of a <2 x i32>? Something that could be looked at in a later patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd encourage you to split off a change to handle only the Index == 0 case. It should be simple, but that's the value. :)
Created using spr 1.3.5
I posted this here: #81751 |
The latest version of the patch actually handles only index 0 |
#81751 handles both fixed and scalable vectors from the looks of things. I wonder if it's possible to have this patch handle the whole reg extract/insert case for scalable vectors too, if we move the logic into the scalable part below the fixed vector switch? Any scalable vector extract/insert should be free if both the vector and subvector are >= LMUL 1. |
Not sure that insert subvector should be free. It can be free, if either the second vector is undef or inserting the whole vector. LMUL >=1 not enough for the second case, also need to check that the whole vector is insert, not, say, half of it. |
But for llvm.vector.insert there is the constraint that all the subvec elements must be within bounds of the vector, and for scalable vectors the index at which it is inserted is scaled by vscale. So if the subvector is LMUL >=1 it shouldn't be possible for only half of it be inserted, since it won't be truncated and the index will be a multiple of an LMUL1 register boundary. This is separate from the exact VLEN fixed vector case in the original version of this PR though, we can leave it for a future patch. |
Created using spr 1.3.5
unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get( | ||
Tp->getElementType(), FSubTy->getNumElements() + 1)); | ||
// Whole vector insert - just the vector itself. | ||
if (Index == 0 && SubTpRegs != 0 && SubTpRegs != NextSubTpRegs && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to check, is SubTpRegs != NextSubTpRegs
to check that SubTp isn't a fractional LMUL?
Created using spr 1.3.5
@@ -442,6 +454,9 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, | |||
return LT.first * | |||
getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind); | |||
case TTI::SK_InsertSubvector: | |||
if (Index == 0 && any_of(Args, UndefValue::classof)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check that Args isn't empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
Created using spr 1.3.5
Created using spr 1.3.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the PR title need updated to reflect that this handles insert_subvectors only now
@@ -326,6 +326,18 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, | |||
switch (Kind) { | |||
default: | |||
break; | |||
case TTI::SK_InsertSubvector: { | |||
auto *FSubTy = dyn_cast<FixedVectorType>(SubTp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use cast instead of dyn_cast here to get the assertion?
Created using spr 1.3.5
…ct vlen (#82405) If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering uses this knowledge to replace the vslidedown.vi with a sub-register extract. Our costs can reflect that as well. This is another piece split off #80164 --------- Co-authored-by: Luke Lau <[email protected]>
@@ -442,6 +454,9 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, | |||
return LT.first * | |||
getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind); | |||
case TTI::SK_InsertSubvector: | |||
if (Index == 0 && !Args.empty() && any_of(Args, UndefValue::classof)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to split off this piece - or more accurately something vaguely related - and stumbled into something interesting.
The InsertSubvector w/Index=0 is unreachable from everywhere except SLP. TTI::getInstructionCost contains a check for the identity shuffle and always returns 0. improveShuffleKindFromMask will recognize the insert into passthru case as a select (correctly), and thus it doesn't hit this case either. Put together, this means that the index=0 case never makes it from the backend, and thus we have no test coverage via cost model tests.
SLP hits a slightly different codepath here and directly calls getShuffleCost with a possible identity mask. It still can't hit the select case, but it can hit the insert into poison case. SLP appears to have a bunch of guards for this already in various cases.
I'm not really a fan of having untestable logic here. Anyone have any ideas how we can rework this API to ensure SLP can't reach a case which is untestable via costmodel tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not tests for llvm.vector.insert intrinsics check this?
Ping! |
unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get( | ||
Tp->getElementType(), FSubTy->getNumElements() + 1)); | ||
// Whole vector insert - just the vector itself. | ||
if (Index == 0 && SubTpRegs != 0 && SubTpRegs != NextSubTpRegs && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works. getRegUsageForType is returning the maximum number of registers needed given a minimum VLEN. If the runtime VLEN is larger the used number of registers could be less.
The backend must always use a vslideup or vmv.v.v for fixed vector insert unless we know both the maximum and minimum VLEN are the same. I think you have to check ST.getRealVLen()
.
Created using spr 1.3.5
Ping! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR title still needs the reg extract part removed
const unsigned MinVLen = ST->getRealMinVLen(); | ||
const unsigned MaxVLen = ST->getRealMaxVLen(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use ST->getRealVLen()
which was added recently
unsigned TpRegs = getRegUsageForType(Tp); | ||
unsigned SubTpRegs = getRegUsageForType(SubTp); | ||
unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get( | ||
Tp->getElementType(), FSubTy->getNumElements() + 1)); | ||
if (SubTpRegs != 0 && SubTpRegs != NextSubTpRegs && TpRegs >= SubTpRegs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for TpRegs < SubTpRegs
?
…t vlen If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering will use this knowledge to replace the vslideup.vi with a sub-register insert when the subvec passthru is undef. One case where the subvec passthru is known undef is when the subvec completely fills the subregister, and that's the easiest case to recognize during costing. Note: This is cost modeling a lowering which hasn't landed yet, see llvm#84107. This change will not land until after that one does. This is another piece split off llvm#80164
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have posted an alternative patch for the remaining insertsubvector case here: #85240
When writing this, I discovered that this patch models a lowering which is not implemented. There is a patch on review, but it hasn't landed yet.
If we can detect, that whole register extract/insert is requested,
consider it free.