Skip to content

[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 26, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2297,6 +2297,7 @@ emitTransformedIndex(IRBuilderBase &B, Value *Index, Value *StartValue,
? B.CreateSExtOrTrunc(Index, StepTy)
: B.CreateCast(Instruction::SIToFP, Index, StepTy);
if (CastedIndex != Index) {
assert(!isa<SExtInst>(CastedIndex));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: error message missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from testing, removed!

CastedIndex->setName(CastedIndex->getName() + ".cast");
Index = CastedIndex;
}
Expand Down
27 changes: 27 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -859,6 +859,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenIntOrFpInductionSC:
case VPRecipeBase::VPWidenPointerInductionSC:
case VPRecipeBase::VPReductionPHISC:
case VPRecipeBase::VPScalarCastSC:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (independent of this patch): would be good to keep in order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do separately.

return true;
case VPRecipeBase::VPInterleaveSC:
case VPRecipeBase::VPBranchOnMaskSC:
Expand Down Expand Up @@ -1338,6 +1339,32 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
Type *getResultType() const { return ResultTy; }
};

/// VPScalarCastRecipe is a recipe to create scalar cast instructions.
class VPScalarCastRecipe : public VPSingleDefRecipe {
Instruction::CastOps Opcode;

/// Result type for the cast.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment adds little knowledge if any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

Type *ResultTy;

Value *generate(VPTransformState &State, unsigned Part);

public:
VPScalarCastRecipe(Instruction::CastOps Opcode, VPValue *Op, Type *ResultTy)
: VPSingleDefRecipe(VPDef::VPScalarCastSC, {Op}), Opcode(Opcode),
ResultTy(ResultTy) {}

~VPScalarCastRecipe() override = default;

VP_CLASSOF_IMPL(VPDef::VPScalarCastSC)

void execute(VPTransformState &State) override;

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;
#endif
};

/// A recipe for widening Call instructions.
class VPWidenCallRecipe : public VPSingleDefRecipe {
/// ID of the vector intrinsic to call when widening the call. If set the
Expand Down
6 changes: 5 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
return V->getUnderlyingValue()->getType();
})
.Case<VPWidenCastRecipe>(
[](const VPWidenCastRecipe *R) { return R->getResultType(); });
[](const VPWidenCastRecipe *R) { return R->getResultType(); })
.Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding Expand SCEV case to inferScalarType is independent of this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed to infer the type of the step in the new code in VPlanTransforms.cpp (if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {)).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that VPDerivedIVRecipe simply mirrors the type of its first start-value operand, should it join VPCanonicalIVPHIRecipe et al.'s or VPPredInstPHIRecipe et al.'s cases above and remove its getScalarType() method? Possibly as a follow-up.

Should VPWidenIntOrFpInductionRecipe's trunc also utilize VPScalarCastRecipe, and retire its getScalarType(), possibly as a follow-up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPDerivedIVRecipe getScalarType would be good to remove, will do as follow-up.

VPWidenIntOrFpInductionRecipe's trunc drives truncating the start value and step, effectively creating a narrow phi. I'll check to see if this can also be modeled with the new recipe.

return R->getSCEV()->getType();
});

assert(ResultTy && "could not infer type for the given VPValue");
CachedTypes[V] = ResultTy;
return ResultTy;
Expand Down
50 changes: 43 additions & 7 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,7 @@ bool VPRecipeBase::mayHaveSideEffects() const {
switch (getVPDefID()) {
case VPDerivedIVSC:
case VPPredInstPHISC:
case VPScalarCastSC:
return false;
case VPInstructionSC:
switch (cast<VPInstruction>(this)->getOpcode()) {
Expand Down Expand Up @@ -1117,13 +1118,7 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {

// Ensure step has the same type as that of scalar IV.
Type *BaseIVTy = BaseIV->getType()->getScalarType();
if (BaseIVTy != Step->getType()) {
// TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
// avoid separate truncate here.
assert(Step->getType()->isIntegerTy() &&
"Truncation requires an integer step");
Step = State.Builder.CreateTrunc(Step, BaseIVTy);
}
assert(BaseIVTy == Step->getType() && "Types of BaseIV and Step must match!");

// We build scalar steps for both integer and floating-point induction
// variables. Here, we determine the kind of arithmetic we will perform.
Expand Down Expand Up @@ -1469,6 +1464,47 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif

Value *VPScalarCastRecipe ::generate(VPTransformState &State, unsigned Part) {
assert(vputils::onlyFirstLaneUsed(this) &&
"Codegen only implemented for first lane.");
switch (Opcode) {
case Instruction::SExt:
case Instruction::ZExt:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Trunc is currently being used, still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, could remove if preferred, but State.Builder.CreateCast generically supports any cast

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to either remove or make a note that SExt and ZExt are currently unused/dead cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note, thanks!

case Instruction::Trunc: {
Value *Op = State.get(getOperand(0), VPIteration(Part, 0));
return State.Builder.CreateCast(Instruction::CastOps(Opcode), Op, ResultTy);
}
default:
llvm_unreachable("opcode not implemented yet");
}
}

void VPScalarCastRecipe ::execute(VPTransformState &State) {
bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
bool LoopInvariant = all_of(operands(), [](VPValue *Op) {

this checks more than uniformity across UFs.

And if it is invariance we're checking, we can alternatively check if the recipe itself resides outside vector regions (e.g., has been LICM'd), rather than all of its operands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version extends the patch to use it also to truncate VPDerivedIVRecipe, if needed. For that, it also checks if the operands are VPDerivedIV/VPCanonicalIVPHIs, which should be considered uniform across all VFs & UFs, unless I am missing something.

return Op->isDefinedOutsideVectorRegions();
});
for (unsigned Part = 0; Part != State.UF; ++Part) {
Value *Res;
// Only generate a single instance, if the recipe is uniform across all UFs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a single instance is going to be generated, in the preheader? Recipe is uniform across all UFs by (current) definition (and even more so - invariant). Can assert UniformAcrossUFs holds, along with onlyFirstLaneUsed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation here is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently untested / dead code.

The value returned by an EVL recipe should arguably be of the same type as that of its first operand, even if the underlying intrinsic it currently wraps insists on always returning an i32? I.e., that case should arguably be fixed by performing a trunc/sext/zext as needed when providing the adequate EVL value given a current Phi/remaining-trip-count values, to facilitate adding/subtracting them together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated version of the patch now also uses it for truncating derived IV/canonical IV if needed. Those are still uniform across VFs & UFs, but not invariant.

I suppose EVL at the moment can also be considered uniform across VFs and UFs.

Both the if/else sides should be executed by tests.

if (Part > 0 && UniformAcrossUFs)
Res = State.get(this, VPIteration(0, 0));
else
Res = generate(State, Part);
State.set(this, Res, VPIteration(Part, 0));
}
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPScalarCastRecipe ::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
O << Indent << "SCALAR-CAST ";
printAsOperand(O, SlotTracker);
O << " = " << Instruction::getOpcodeName(Opcode) << " ";
printOperands(O, SlotTracker);
O << " to " << *ResultTy;
}
#endif

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");

Expand Down
9 changes: 9 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

VPTypeAnalysis TypeInfo(SE.getContext());
if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
Step = new VPScalarCastRecipe(Instruction::Trunc, Step,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this recipe only generate Trunc? No need to hold an Opcode, can be renamed VPScalarTruncRecipe. Assert that the type sizes correspond to Trunc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here the only use-case is Trunc, but the implementation of the recipe is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that the type sizes correspond to Trunc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

TypeInfo.inferScalarType(BaseIV));
auto *VecPreheader =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VecPreheader =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
auto *VecPreheader =
cast<VPBasicBlock>(HeaderVPBB->getSingleHierarchicalPredecessor());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

VecPreheader->appendRecipe(Step->getDefiningRecipe());
}

VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
HeaderVPBB->insert(Steps, IP);
return Steps;
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlanValue.h
Original file line number Diff line number Diff line change
Expand Up @@ -350,6 +350,7 @@ class VPDef {
VPInterleaveSC,
VPReductionSC,
VPReplicateSC,
VPScalarCastSC,
VPScalarIVStepsSC,
VPVectorPointerSC,
VPWidenCallSC,
Expand Down
4 changes: 3 additions & 1 deletion llvm/test/Transforms/LoopVectorize/cast-induction.ll
Original file line number Diff line number Diff line change
Expand Up @@ -83,12 +83,14 @@ define void @cast_variable_step(i64 %step) {
; VF4: middle.block:
;
; IC2-LABEL: @cast_variable_step(
; IC2: [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
; IC2: br label %vector.body

; IC2-LABEL: vector.body:
; IC2-NEXT: [[INDEX:%.+]] = phi i64 [ 0, %vector.ph ]
; IC2-NEXT: [[MUL:%.+]] = mul i64 %index, %step
; IC2-NEXT: [[OFFSET_IDX:%.+]] = add i64 10, [[MUL]]
; IC2-NEXT: [[TRUNC_OFF:%.+]] = trunc i64 [[OFFSET_IDX]] to i32
; IC2-NEXT: [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
; IC2-NEXT: [[STEP0:%.+]] = mul i32 0, [[TRUNC_STEP]]
; IC2-NEXT: [[T0:%.+]] = add i32 [[TRUNC_OFF]], [[STEP0]]
; IC2-NEXT: [[STEP1:%.+]] = mul i32 1, [[TRUNC_STEP]]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,14 +184,15 @@ exit:
; DBG-NEXT: No successors
; DBG-EMPTY:
; DBG-NEXT: vector.ph:
; DBG-NEXT: SCALAR-CAST vp<[[CAST:%.+]]> = trunc ir<1> to i32
; DBG-NEXT: Successor(s): vector loop
; DBG-EMPTY:
; DBG-NEXT: <x1> vector loop: {
; DBG-NEXT: vector.body:
; DBG-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
; DBG-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%for> = phi ir<0>, vp<[[SCALAR_STEPS:.+]]>
; DBG-NEXT: vp<[[DERIVED_IV:%.+]]> = DERIVED-IV ir<0> + vp<[[CAN_IV]]> * ir<1> (truncated to i32)
; DBG-NEXT: vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, ir<1>
; DBG-NEXT: vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, vp<[[CAST]]>
; DBG-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%for>, vp<[[SCALAR_STEPS]]>
; DBG-NEXT: CLONE store vp<[[SPLICE]]>, ir<%dst>
; DBG-NEXT: EMIT vp<[[IV_INC:%.+]]> = add nuw vp<[[CAN_IV]]>, vp<[[VFxUF]]>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ define void @test(i16 %x, i64 %y, ptr %ptr) {
; CHECK-NEXT: [[V3:%.*]] = add i8 [[V2]], 1
; CHECK-NEXT: [[CMP15:%.*]] = icmp slt i8 [[V3]], 5
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], [[INC]]
; CHECK-NEXT: br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK-NEXT: br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: loop.exit:
; CHECK-NEXT: [[DIV_1:%.*]] = udiv i64 [[Y]], [[ADD]]
; CHECK-NEXT: [[V1:%.*]] = add i64 [[DIV_1]], 1
Expand Down