[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform #120194

Conversation

@lukel97 (Contributor) commented Dec 17, 2024

This fixes a crash that shows up when building SPEC CPU 2017 with EVL tail folding on RISC-V.

A VPWidenCastRecipe doesn't always have an underlying value; in this crash, the underlying value is missing whenever the widened cast is created via truncateToMinimalBitwidths.

Fix this by just using the opcode stored in the recipe itself.

I think a similar issue exists with VPWidenIntrinsicRecipe and how it's widened, but I haven't run into any crashes with it just yet.

@llvmbot (Member) commented Dec 17, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)



Full diff: https://github.com/llvm/llvm-project/pull/120194.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+121)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9a3b82fe57c12a..066232775e46a7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1506,9 +1506,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                   })
               .Case<VPWidenCastRecipe>(
                   [&](VPWidenCastRecipe *CInst) -> VPRecipeBase * {
-                    auto *CI = dyn_cast<CastInst>(CInst->getUnderlyingInstr());
                     Intrinsic::ID VPID =
-                        VPIntrinsic::getForOpcode(CI->getOpcode());
+                        VPIntrinsic::getForOpcode(CInst->getOpcode());
                     assert(VPID != Intrinsic::not_intrinsic &&
                            "Expected vp.casts Instrinsic");
 
@@ -1516,8 +1515,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                     assert(VPIntrinsic::getMaskParamPos(VPID) &&
                            VPIntrinsic::getVectorLengthParamPos(VPID) &&
                            "Expected VP intrinsic");
-                    VPValue *Mask = Plan.getOrAddLiveIn(ConstantInt::getTrue(
-                        IntegerType::getInt1Ty(CI->getContext())));
+                    VPValue *Mask = Plan.getOrAddLiveIn(
+                        ConstantInt::getTrue(IntegerType::getInt1Ty(Ctx)));
                     Ops.push_back(Mask);
                     Ops.push_back(&EVL);
                     return new VPWidenIntrinsicRecipe(
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
index 4557e95f1e1b6a..48f9cec34d974f 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
@@ -1058,6 +1058,125 @@ loop:
 exit:
   ret void
 }
+
+define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
+; IF-EVL-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
+; IF-EVL-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
+; IF-EVL-NEXT:  [[ENTRY:.*:]]
+; IF-EVL-NEXT:    [[CMP111:%.*]] = icmp sgt i32 [[MVX]], 0
+; IF-EVL-NEXT:    br i1 [[CMP111]], label %[[FOR_BODY13_PREHEADER:.*]], label %[[FOR_COND_CLEANUP12:.*]]
+; IF-EVL:       [[FOR_BODY13_PREHEADER]]:
+; IF-EVL-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[MVX]] to i64
+; IF-EVL-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IF-EVL:       [[VECTOR_PH]]:
+; IF-EVL-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 16
+; IF-EVL-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], 1
+; IF-EVL-NEXT:    [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP2]]
+; IF-EVL-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
+; IF-EVL-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; IF-EVL-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 16
+; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i32> poison, i32 [[MVX]], i64 0
+; IF-EVL-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 16 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
+; IF-EVL-NEXT:    [[TMP5:%.*]] = trunc <vscale x 16 x i32> [[BROADCAST_SPLAT]] to <vscale x 16 x i16>
+; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 16 x ptr> poison, ptr [[DST]], i64 0
+; IF-EVL-NEXT:    [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 16 x ptr> [[BROADCAST_SPLATINSERT2]], <vscale x 16 x ptr> poison, <vscale x 16 x i32> zeroinitializer
+; IF-EVL-NEXT:    br label %[[VECTOR_BODY:.*]]
+; IF-EVL:       [[VECTOR_BODY]]:
+; IF-EVL-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[AVL:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[TMP6:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 16, i1 true)
+; IF-EVL-NEXT:    [[TMP7:%.*]] = add i64 [[EVL_BASED_IV]], 0
+; IF-EVL-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP7]]
+; IF-EVL-NEXT:    [[TMP9:%.*]] = getelementptr i8, ptr [[TMP8]], i32 0
+; IF-EVL-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP9]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP10:%.*]] = call <vscale x 16 x i16> @llvm.vp.zext.nxv16i16.nxv16i8(<vscale x 16 x i8> [[VP_OP_LOAD]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[VP_OP:%.*]] = call <vscale x 16 x i16> @llvm.vp.mul.nxv16i16(<vscale x 16 x i16> [[TMP5]], <vscale x 16 x i16> [[TMP10]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[VP_OP1:%.*]] = call <vscale x 16 x i16> @llvm.vp.lshr.nxv16i16(<vscale x 16 x i16> [[VP_OP]], <vscale x 16 x i16> trunc (<vscale x 16 x i32> splat (i32 1) to <vscale x 16 x i16>), <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP11:%.*]] = call <vscale x 16 x i8> @llvm.vp.trunc.nxv16i8.nxv16i16(<vscale x 16 x i16> [[VP_OP1]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    call void @llvm.vp.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8> [[TMP11]], <vscale x 16 x ptr> align 1 [[BROADCAST_SPLAT3]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP12:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP12]], [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP4]]
+; IF-EVL-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IF-EVL-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP47:![0-9]+]]
+; IF-EVL:       [[MIDDLE_BLOCK]]:
+; IF-EVL-NEXT:    br i1 true, label %[[FOR_COND_CLEANUP12_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; IF-EVL:       [[SCALAR_PH]]:
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY13_PREHEADER]] ]
+; IF-EVL-NEXT:    br label %[[FOR_BODY13:.*]]
+; IF-EVL:       [[FOR_COND_CLEANUP12_LOOPEXIT]]:
+; IF-EVL-NEXT:    br label %[[FOR_COND_CLEANUP12]]
+; IF-EVL:       [[FOR_COND_CLEANUP12]]:
+; IF-EVL-NEXT:    ret void
+; IF-EVL:       [[FOR_BODY13]]:
+; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY13]] ]
+; IF-EVL-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[INDVARS_IV]]
+; IF-EVL-NEXT:    [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX15]], align 1
+; IF-EVL-NEXT:    [[CONV:%.*]] = zext i8 [[TMP14]] to i32
+; IF-EVL-NEXT:    [[MUL16:%.*]] = mul i32 [[MVX]], [[CONV]]
+; IF-EVL-NEXT:    [[SHR35:%.*]] = lshr i32 [[MUL16]], 1
+; IF-EVL-NEXT:    [[CONV36:%.*]] = trunc i32 [[SHR35]] to i8
+; IF-EVL-NEXT:    store i8 [[CONV36]], ptr [[DST]], align 1
+; IF-EVL-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP12_LOOPEXIT]], label %[[FOR_BODY13]], !llvm.loop [[LOOP48:![0-9]+]]
+;
+; NO-VP-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
+; NO-VP-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
+; NO-VP-NEXT:  [[ENTRY:.*:]]
+; NO-VP-NEXT:    [[CMP111:%.*]] = icmp sgt i32 [[MVX]], 0
+; NO-VP-NEXT:    br i1 [[CMP111]], label %[[FOR_BODY13_PREHEADER:.*]], label %[[FOR_COND_CLEANUP12:.*]]
+; NO-VP:       [[FOR_BODY13_PREHEADER]]:
+; NO-VP-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[MVX]] to i64
+; NO-VP-NEXT:    br label %[[FOR_BODY13:.*]]
+; NO-VP:       [[FOR_COND_CLEANUP12_LOOPEXIT:.*]]:
+; NO-VP-NEXT:    br label %[[FOR_COND_CLEANUP12]]
+; NO-VP:       [[FOR_COND_CLEANUP12]]:
+; NO-VP-NEXT:    ret void
+; NO-VP:       [[FOR_BODY13]]:
+; NO-VP-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_BODY13_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY13]] ]
+; NO-VP-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[INDVARS_IV]]
+; NO-VP-NEXT:    [[TMP0:%.*]] = load i8, ptr [[ARRAYIDX15]], align 1
+; NO-VP-NEXT:    [[CONV:%.*]] = zext i8 [[TMP0]] to i32
+; NO-VP-NEXT:    [[MUL16:%.*]] = mul i32 [[MVX]], [[CONV]]
+; NO-VP-NEXT:    [[SHR35:%.*]] = lshr i32 [[MUL16]], 1
+; NO-VP-NEXT:    [[CONV36:%.*]] = trunc i32 [[SHR35]] to i8
+; NO-VP-NEXT:    store i8 [[CONV36]], ptr [[DST]], align 1
+; NO-VP-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; NO-VP-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; NO-VP-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP12_LOOPEXIT]], label %[[FOR_BODY13]]
+;
+entry:
+  %cmp111 = icmp sgt i32 %mvx, 0
+  br i1 %cmp111, label %for.body13.preheader, label %for.cond.cleanup12
+
+for.body13.preheader:                             ; preds = %entry
+  %wide.trip.count = zext nneg i32 %mvx to i64
+  br label %for.body13
+
+for.cond.cleanup12.loopexit:                      ; preds = %for.body13
+  br label %for.cond.cleanup12
+
+for.cond.cleanup12:                               ; preds = %for.cond.cleanup12.loopexit, %entry
+  ret void
+
+for.body13:                                       ; preds = %for.body13.preheader, %for.body13
+  %indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
+  %arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
+  %0 = load i8, ptr %arrayidx15, align 1
+  %conv = zext i8 %0 to i32
+  %mul16 = mul i32 %mvx, %conv
+  %shr35 = lshr i32 %mul16, 1
+  %conv36 = trunc i32 %shr35 to i8
+  store i8 %conv36, ptr %dst, align 1
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond.not, label %for.cond.cleanup12.loopexit, label %for.body13
+}
+
 ;.
 ; IF-EVL: [[META0]] = !{[[META1:![0-9]+]]}
 ; IF-EVL: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]]}
@@ -1106,4 +1225,6 @@ exit:
 ; IF-EVL: [[LOOP44]] = distinct !{[[LOOP44]], [[META6]]}
 ; IF-EVL: [[LOOP45]] = distinct !{[[LOOP45]], [[META6]], [[META7]]}
 ; IF-EVL: [[LOOP46]] = distinct !{[[LOOP46]], [[META6]]}
+; IF-EVL: [[LOOP47]] = distinct !{[[LOOP47]], [[META6]], [[META7]]}
+; IF-EVL: [[LOOP48]] = distinct !{[[LOOP48]], [[META7]], [[META6]]}
 ;.

Comment on lines 1518 to 1519
VPValue *Mask = Plan.getOrAddLiveIn(
ConstantInt::getTrue(IntegerType::getInt1Ty(Ctx)));
Contributor:

ConstantInt::getTrue(Ctx));

@@ -1058,6 +1058,125 @@ loop:
exit:
ret void
}

define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
Contributor:

Suggested change
define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %dst, ptr %src, i32 %mvx) {

Contributor Author:

I was under the impression that we preferred noalias to avoid the memcheck block if we weren't explicitly testing for it? E.g. I remember seeing #107225 do it as a cleanup

Comment on lines 1160 to 1161
for.cond.cleanup12.loopexit: ; preds = %for.body13
br label %for.cond.cleanup12
Contributor:

Clean up this bb if possible.


define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr noalias %dst, ptr noalias %src, i32 %mvx) {
; IF-EVL-LABEL: define void @truncate_to_minimal_bitwidths_widen_cast_recipe(
; IF-EVL-SAME: ptr noalias [[DST:%.*]], ptr noalias [[SRC:%.*]], i32 [[MVX:%.*]]) #[[ATTR0]] {
Contributor:

Please move to a dedicated file with a descriptive name. The test file is already far too big

@fhahn (Contributor) left a comment:

This issue was also pointed out when reviewing #119510, but it would probably be better to fix it separately as in this PR, especially as it comes with a test case.

Does this mean we have some gaps in RISCV testing?

Comment on lines 9 to 11
%cmp111 = icmp sgt i32 %mvx, 0
br i1 %cmp111, label %for.body13.preheader, label %for.cond.cleanup12

Contributor:

is this needed?

Contributor Author:

I'd hope not, I'll stick this through another round of llvm-reduce and see if it can simplify any other parts

@@ -0,0 +1,31 @@
; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize -mtriple=riscv64 -mattr=+v -S %s
Contributor:

this should check the output, not just check that it doesn't crash

br label %for.body13

for.body13: ; preds = %for.body13.preheader, %for.body13
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
Contributor:

Suggested change
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body13 ]

%wide.trip.count = zext nneg i32 %mvx to i64
br label %for.body13

for.body13: ; preds = %for.body13.preheader, %for.body13
Contributor:

Suggested change
for.body13: ; preds = %for.body13.preheader, %for.body13
loop:

%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup12, label %for.body13

for.cond.cleanup12: ; preds = %for.body13, %entry
Contributor:

Suggested change
for.cond.cleanup12: ; preds = %for.body13, %entry
exit:

%conv36 = trunc i32 %shr35 to i8
store i8 %conv36, ptr %dst, align 1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
Contributor:

Suggested change
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
%ec = icmp eq i64 %indvars.iv.next, %wide.trip.count


for.body13: ; preds = %for.body13.preheader, %for.body13
%indvars.iv = phi i64 [ 0, %for.body13.preheader ], [ %indvars.iv.next, %for.body13 ]
%arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
Contributor:

Suggested change
%arrayidx15 = getelementptr i8, ptr %src, i64 %indvars.iv
%gep.src = getelementptr i8, ptr %src, i64 %indvars.iv

@@ -1506,18 +1506,17 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
})
.Case<VPWidenCastRecipe>(
[&](VPWidenCastRecipe *CInst) -> VPRecipeBase * {
Contributor:

The name here is also confusing: Inst implies an instruction, while this is a cast recipe. Better to name it CastR.


lukel97 commented Dec 17, 2024

Does this mean we have some gaps in RISCV testing?

Most likely! There's a separate assertion failure in inferScalarTypes that I'm running into after this PR. I'm bisecting it currently; I think there's some sort of non-deterministic clobbering of the cached types going on.

In any case it might be a good idea to start giving any EVL-related PRs a quick build on llvm-test-suite/SPEC, or to keep an eye out on the RISC-V EVL buildbot: https://lab.llvm.org/buildbot/#/builders/132


fhahn commented Dec 17, 2024

Could you share the IR causing the problem?


lukel97 commented Dec 17, 2024

Could you share the IR causing the problem?

define void @hpel_filter(ptr %dstv, ptr %src, i64 %wide.trip.count) {
entry:
  br label %for.body4

for.cond.cleanup3.loopexit:                       ; preds = %for.body4
  ret void

for.body4:                                        ; preds = %for.body4, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body4 ]
  %arrayidx13 = getelementptr i8, ptr %src, i64 %indvars.iv
  %0 = load i8, ptr %arrayidx13, align 1
  %conv14 = zext i8 %0 to i32
  %mul21.neg = mul i32 %conv14, 0
  %add33 = ashr i32 %conv14, 0
  %shr = or i32 %add33, 0
  %tobool.not.i = icmp ult i32 %conv14, 0
  %cond.i = select i1 %tobool.not.i, i32 %shr, i32 0
  %conv.i = trunc i32 %cond.i to i8
  store i8 %conv.i, ptr %dstv, align 1
  %conv36 = trunc i32 %mul21.neg to i16
  store i16 %conv36, ptr null, align 2
  %indvars.iv.next = add i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv, %wide.trip.count
  br i1 %exitcond.not, label %for.cond.cleanup3.loopexit, label %for.body4
}

I'm able to reproduce the assertion with opt -disable-output -passes=loop-vectorize -mtriple riscv64 -mattr=+v -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize

FWIW I think this might be non-deterministic. Building opt with asan+ubsan seems to cause the assertion to go away, but nothing gets reported


lukel97 commented Dec 17, 2024

Noting this down, I was able to bisect the inferScalarTypes assertion (not the assertion in this PR!) back to b759020

%gep.src = getelementptr i8, ptr %src, i64 %iv
%0 = load i8, ptr %gep.src, align 1
%conv = zext i8 %0 to i32
%mul16 = mul i32 0, %conv
Contributor Author:

This is strange, but running instcombine on this seems to simplify it too much and then we don't get the assertion


lukel97 commented Dec 17, 2024

Noting this down, I was able to bisect the inferScalarTypes assertion (not the assertion in this PR!) back to b759020

I've posted a fix for this in #120252

@alexey-bataev (Member) left a comment:

LG

@fhahn (Contributor) left a comment:

LGTM, thanks!

It looks like https://lab.llvm.org/buildbot/#/builders/132 isn't working properly? I only see a large number of old build requests, but no actual builds.


lukel97 commented Dec 17, 2024

LGTM, thanks!

It looks like https://lab.llvm.org/buildbot/#/builders/132 isn't working properly? I only see a large number of old build requests, but no actual builds.

Woops, looks like the active builder is on staging now: https://lab.llvm.org/staging/#/builders/16

@LiqinWeng (Contributor):

Thanks, LGTM :)

@lukel97 lukel97 merged commit 4a7f60d into llvm:main Dec 18, 2024
8 checks passed