[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform #120194

Conversation

@lukel97 (Contributor) commented Dec 17, 2024

This fixes a crash that shows up when building SPEC CPU 2017 with EVL tail folding on RISC-V.

A VPWidenCastRecipe doesn't always have an underlying value; in this crash, the underlying value is missing whenever the widened cast is created via truncateToMinimalBitwidths.

Fix this by just using the opcode stored in the recipe itself.

I think a similar issue exists with VPWidenIntrinsicRecipe and how it's widened, but I haven't run into any crashes with it just yet.

@llvmbot (Member) commented Dec 17, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)



Full diff: https://github.com/llvm/llvm-project/pull/120194.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+121)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9a3b82fe57c12a..066232775e46a7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1506,9 +1506,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                   })
               .Case<VPWidenCastRecipe>(
                   [&](VPWidenCastRecipe *CInst) -> VPRecipeBase * {
-                    auto *CI = dyn_cast<CastInst>(CInst->getUnderlyingInstr());
                     Intrinsic::ID VPID =
-                        VPIntrinsic::getForOpcode(CI->getOpcode());
+                        VPIntrinsic::getForOpcode(CInst->getOpcode());
                     assert(VPID != Intrinsic::not_intrinsic &&
                            "Expected vp.casts Instrinsic");
 
@@ -1516,8 +1515,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                     assert(VPIntrinsic::getMaskParamPos(VPID) &&
                            VPIntrinsic::getVectorLengthParamPos(VPID) &&
                            "Expected VP intrinsic");
-                    VPValue *Mask = Plan.getOrAddLiveIn(ConstantInt::getTrue(
-                        IntegerType::getInt1Ty(CI->getContext())));
+                    VPValue *Mask = Plan.getOrAddLiveIn(
+                        ConstantInt::getTrue(IntegerType::getInt1Ty(Ctx)));
                     Ops.push_back(Mask);
                     Ops.push_back(&EVL);
                     return new VPWidenIntrinsicRecipe(
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
index 4557e95f1e1b6a..48f9cec34d974f 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
@@ -1058,6 +1058,125 @@ loop:
 exit:
   ret void
 }
+
+define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
+; IF-EVL-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
+; IF-EVL-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
+; IF-EVL-NEXT:  [[ENTRY:.*:]]
+; IF-EVL-NEXT:    [[CMP111:%.*]] = icmp sgt i32 [[MVX]], 0
+; IF-EVL-NEXT:    br i1 [[CMP111]], label %[[FOR_BODY13_PREHEADER:.*]], label %[[FOR_COND_CLEANUP12:.*]]
+; IF-EVL:       [[FOR_BODY13_PREHEADER]]:
+; IF-EVL-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[MVX]] to i64
+; IF-EVL-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IF-EVL:       [[VECTOR_PH]]:
+; IF-EVL-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 16
+; IF-EVL-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], 1
+; IF-EVL-NEXT:    [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP2]]
+; IF-EVL-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
+; IF-EVL-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; IF-EVL-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 16
+; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i32> poison, i32 [[MVX]], i64 0
+; IF-EVL-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 16 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
+; IF-EVL-NEXT:    [[TMP5:%.*]] = trunc <vscale x 16 x i32> [[BROADCAST_SPLAT]] to <vscale x 16 x i16>
+; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 16 x ptr> poison, ptr [[DST]], i64 0
+; IF-EVL-NEXT:    [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 16 x ptr> [[BROADCAST_SPLATINSERT2]], <vscale x 16 x ptr> poison, <vscale x 16 x i32> zeroinitializer
+; IF-EVL-NEXT:    br label %[[VECTOR_BODY:.*]]
+; IF-EVL:       [[VECTOR_BODY]]:
+; IF-EVL-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[AVL:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[TMP6:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 16, i1 true)
+; IF-EVL-NEXT:    [[TMP7:%.*]] = add i64 [[EVL_BASED_IV]], 0
+; IF-EVL-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP7]]
+; IF-EVL-NEXT:    [[TMP9:%.*]] = getelementptr i8, ptr [[TMP8]], i32 0
+; IF-EVL-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP9]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP10:%.*]] = call <vscale x 16 x i16> @llvm.vp.zext.nxv16i16.nxv16i8(<vscale x 16 x i8> [[VP_OP_LOAD]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[VP_OP:%.*]] = call <vscale x 16 x i16> @llvm.vp.mul.nxv16i16(<vscale x 16 x i16> [[TMP5]], <vscale x 16 x i16> [[TMP10]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[VP_OP1:%.*]] = call <vscale x 16 x i16> @llvm.vp.lshr.nxv16i16(<vscale x 16 x i16> [[VP_OP]], <vscale x 16 x i16> trunc (<vscale x 16 x i32> splat (i32 1) to <vscale x 16 x i16>), <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP11:%.*]] = call <vscale x 16 x i8> @llvm.vp.trunc.nxv16i8.nxv16i16(<vscale x 16 x i16> [[VP_OP1]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    call void @llvm.vp.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8> [[TMP11]], <vscale x 16 x ptr> align 1 [[BROADCAST_SPLAT3]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP12:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP12]], [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP4]]
+; IF-EVL-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IF-EVL-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP47:![0-9]+]]
+; IF-EVL:       [[MIDDLE_BLOCK]]:
+; IF-EVL-NEXT:    br i1 true, label %[[FOR_COND_CLEANUP12_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; IF-EVL:       [[SCALAR_PH]]:
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY13_PREHEADER]] ]
+; IF-EVL-NEXT:    br label %[[FOR_BODY13:.*]]
+; IF-EVL:       [[FOR_COND_CLEANUP12_LOOPEXIT]]:
+; IF-EVL-NEXT:    br label %[[FOR_COND_CLEANUP12]]
+; IF-EVL:       [[FOR_COND_CLEANUP12]]:
+; IF-EVL-NEXT:    ret void
+; IF-EVL:       [[FOR_BODY13]]:
+; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY13]] ]
+; IF-EVL-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[INDVARS_IV]]
+; IF-EVL-NEXT:    [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX15]], align 1
+; IF-EVL-NEXT:    [[CONV:%.*]] = zext i8 [[TMP14]] to i32
+; IF-EVL-NEXT:    [[MUL16:%.*]] = mul i32 [[MVX]], [[CONV]]
+; IF-EVL-NEXT:    [[SHR35:%.*]] = lshr i32 [[MUL16]], 1
+; IF-EVL-NEXT:    [[CONV36:%.*]] = trunc i32 [[SHR35]] to i8
+; IF-EVL-NEXT:    store i8 [[CONV36]], ptr [[DST]], align 1
+; IF-EVL-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP12_LOOPEXIT]], label %[[FOR_BODY13]], !llvm.loop [[LOOP48:![0-9]+]]
+;
+; NO-VP-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
+; NO-VP-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
+; NO-VP-NEXT:  [[ENTRY:.*:]]
+; NO-VP-NEXT:    [[CMP111:%.*]] = icmp sgt i32 [[MVX]], 0
+; NO-VP-NEXT:    br i1 [[CMP111]], label %[[FOR_BODY13_PREHEADER:.*]], label %[[FOR_COND_CLEANUP12:.*]]
+; NO-VP:       [[FOR_BODY13_PREHEADER]]:
+; NO-VP-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[MVX]] to i64
+; NO-VP-NEXT:    br label %[[FOR_BODY13:.*]]
+; NO-VP:       [[FOR_COND_CLEANUP12_LOOPEXIT:.*]]:
+; NO-VP-NEXT:    br label %[[FOR_COND_CLEANUP12]]
+; NO-VP:       [[FOR_COND_CLEANUP12]]:
+; NO-VP-NEXT:    ret void
+; NO-VP:       [[FOR_BODY13]]:
+; NO-VP-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_BODY13_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY13]] ]
+; NO-VP-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[INDVARS_IV]]
+; NO-VP-NEXT:    [[TMP0:%.*]] = load i8, ptr [[ARRAYIDX15]], align 1
+; NO-VP-NEXT:    [[CONV:%.*]] = zext i8 [[TMP0]] to i32
+; NO-VP-NEXT:    [[MUL16:%.*]] = mul i32 [[MVX]], [[CONV]]
+; NO-VP-NEXT:    [[SHR35:%.*]] = lshr i32 [[MUL16]], 1
+; NO-VP-NEXT:    [[CONV36:%.*]] = trunc i32 [[SHR35]] to i8
+; NO-VP-NEXT:    store i8 [[CONV36]], ptr [[DST]], align 1
+; NO-VP-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; NO-VP-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; NO-VP-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP12_LOOPEXIT]], label %[[FOR_BODY13]]
+;
+entry:
+  %cmp111 = icmp sgt i32 %mvx, 0
+  br i1 %cmp111, label %for.body13.preheader, label %for.cond.cleanup12
+
+for.body13.preheader:                             ; preds = %entry
+  %wide.trip.count = zext nneg i32 %mvx to i64
+  br label %for.body13
+
+for.cond.cleanup12.loopexit:                      ; preds = %for.body13
+  br label %for.cond.cleanup12
+
+for.cond.cleanup12:                               ; preds = %for.cond.cleanup12.loopexit, %entry
+  ret void
+
+for.body13:                                       ; preds = %for.body13.preheader, %for.body13
+  %indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
+  %arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
+  %0 = load i8, ptr %arrayidx15, align 1
+  %conv = zext i8 %0 to i32
+  %mul16 = mul i32 %mvx, %conv
+  %shr35 = lshr i32 %mul16, 1
+  %conv36 = trunc i32 %shr35 to i8
+  store i8 %conv36, ptr %dst, align 1
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond.not, label %for.cond.cleanup12.loopexit, label %for.body13
+}
+
 ;.
 ; IF-EVL: [[META0]] = !{[[META1:![0-9]+]]}
 ; IF-EVL: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]]}
@@ -1106,4 +1225,6 @@ exit:
 ; IF-EVL: [[LOOP44]] = distinct !{[[LOOP44]], [[META6]]}
 ; IF-EVL: [[LOOP45]] = distinct !{[[LOOP45]], [[META6]], [[META7]]}
 ; IF-EVL: [[LOOP46]] = distinct !{[[LOOP46]], [[META6]]}
+; IF-EVL: [[LOOP47]] = distinct !{[[LOOP47]], [[META6]], [[META7]]}
+; IF-EVL: [[LOOP48]] = distinct !{[[LOOP48]], [[META7]], [[META6]]}
 ;.

Comment on lines 1518 to 1519
VPValue *Mask = Plan.getOrAddLiveIn(
ConstantInt::getTrue(IntegerType::getInt1Ty(Ctx)));
Contributor:

ConstantInt::getTrue(Ctx));

@@ -1058,6 +1058,125 @@ loop:
exit:
ret void
}

define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
Contributor:

Suggested change
define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %dst, ptr %src, i32 %mvx) {

Contributor Author:

I was under the impression that we preferred noalias to avoid the memcheck block if we weren't explicitly testing for it? E.g. I remember seeing #107225 do it as a cleanup

Comment on lines 1160 to 1161
for.cond.cleanup12.loopexit: ; preds = %for.body13
br label %for.cond.cleanup12
Contributor:

Clean up this bb if possible.


define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
; IF-EVL-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
; IF-EVL-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
Contributor:

Please move to a dedicated file with a descriptive name. The test file is already far too big

@fhahn (Contributor) left a comment:

This issue was also pointed out when reviewing #119510, but it would probably be better to fix it separately as in this PR, especially as it comes with a test case.

Does this mean we have some gaps in RISCV testing?

Comment on lines 9 to 11
%cmp111 = icmp sgt i32 %mvx, 0
br i1 %cmp111, label %for.body13.preheader, label %for.cond.cleanup12

Contributor:

is this needed?

Contributor Author:

I'd hope not, I'll stick this through another round of llvm-reduce and see if it can simplify any other parts

@@ -0,0 +1,31 @@
; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize -mtriple=riscv64 -mattr=+v -S %s
Contributor:

this should check the output, not just check that it doesn't crash

br label %for.body13

for.body13: ; preds = %for.body13.preheader, %for.body13
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
Contributor:

Suggested change
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body13 ]

%wide.trip.count = zext nneg i32 %mvx to i64
br label %for.body13

for.body13: ; preds = %for.body13.preheader, %for.body13
Contributor:

Suggested change
for.body13: ; preds = %for.body13.preheader, %for.body13
loop:

%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup12, label %for.body13

for.cond.cleanup12: ; preds = %for.body13, %entry
Contributor:

Suggested change
for.cond.cleanup12: ; preds = %for.body13, %entry
exit:

%conv36 = trunc i32 %shr35 to i8
store i8 %conv36, ptr %dst, align 1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
Contributor:

Suggested change
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
%ec = icmp eq i64 %indvars.iv.next, %wide.trip.count


for.body13: ; preds = %for.body13.preheader, %for.body13
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
%arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
Contributor:

Suggested change
%arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
%gep.src = getelementptr i8, ptr %src, i64 %indvars.iv

@@ -1506,18 +1506,17 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
})
.Case<VPWidenCastRecipe>(
[&](VPWidenCastRecipe *CInst) -> VPRecipeBase * {
Contributor:

The name here is also confusing: Inst implies an instruction, while this is a cast recipe. Better to name it CastR.


lukel97 commented Dec 17, 2024

Does this mean we have some gaps in RISCV testing?

Most likely! There's a separate assertion failure in inferScalarTypes that I'm running into after this PR. I'm bisecting it currently; I think there's some sort of non-deterministic clobbering of the cached types going on.

In any case it might be a good idea to start giving any EVL-related PRs a quick build on llvm-test-suite/SPEC, or to keep an eye out on the RISC-V EVL buildbot: https://lab.llvm.org/buildbot/#/builders/132


fhahn commented Dec 17, 2024

Could you share the IR causing the problem?


lukel97 commented Dec 17, 2024

Could you share the IR causing the problem?

define void @hpel_filter(ptr %dstv, ptr %src, i64 %wide.trip.count) {
entry:
  br label %for.body4

for.cond.cleanup3.loopexit:                       ; preds = %for.body4
  ret void

for.body4:                                        ; preds = %for.body4, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body4 ]
  %arrayidx13 = getelementptr i8, ptr %src, i64 %indvars.iv
  %0 = load i8, ptr %arrayidx13, align 1
  %conv14 = zext i8 %0 to i32
  %mul21.neg = mul i32 %conv14, 0
  %add33 = ashr i32 %conv14, 0
  %shr = or i32 %add33, 0
  %tobool.not.i = icmp ult i32 %conv14, 0
  %cond.i = select i1 %tobool.not.i, i32 %shr, i32 0
  %conv.i = trunc i32 %cond.i to i8
  store i8 %conv.i, ptr %dstv, align 1
  %conv36 = trunc i32 %mul21.neg to i16
  store i16 %conv36, ptr null, align 2
  %indvars.iv.next = add i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv, %wide.trip.count
  br i1 %exitcond.not, label %for.cond.cleanup3.loopexit, label %for.body4
}

I'm able to reproduce the assertion with opt -disable-output -passes=loop-vectorize -mtriple riscv64 -mattr=+v -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize

FWIW I think this might be non-deterministic. Building opt with asan+ubsan seems to cause the assertion to go away, but nothing gets reported


lukel97 commented Dec 17, 2024

Noting this down, I was able to bisect the inferScalarTypes assertion (not the assertion in this PR!) back to b759020

%gep.src = getelementptr i8, ptr %src, i64 %iv
%0 = load i8, ptr %gep.src, align 1
%conv = zext i8 %0 to i32
%mul16 = mul i32 0, %conv
Contributor Author:

This is strange, but running instcombine on this seems to simplify it too much and then we don't get the assertion


lukel97 commented Dec 17, 2024

Noting this down, I was able to bisect the inferScalarTypes assertion (not the assertion in this PR!) back to b759020

I've posted a fix for this in #120252

@alexey-bataev (Member) left a comment:

LG

@fhahn (Contributor) left a comment:

LGTM, thanks!

It looks like https://lab.llvm.org/buildbot/#/builders/132 isn't working properly? I only see a large number of old build requests, but no actual builds.


lukel97 commented Dec 17, 2024

LGTM, thanks!

It looks like https://lab.llvm.org/buildbot/#/builders/132 isn't working properly? I only see a large number of old build requests, but no actual builds.

Woops, looks like the active builder is on staging now: https://lab.llvm.org/staging/#/builders/16

@LiqinWeng (Contributor):

Thanks, LGTM :)

@lukel97 lukel97 merged commit 4a7f60d into llvm:main Dec 18, 2024
8 checks passed