[aie][aie2p] Global Isel : trunc vector when size(src)/size(dst) = 4 #496

newling · 2025-06-20T06:11:16Z

Add instruction selection tablegen pattern for

 %1 = trunc <32 x i32> %0 to <32 x i8>

It is based on existing patterns for

 %1 = trunc <32 x i32> %0 to <32 x i16>

and

 %1 = trunc <32 x i16> %0 to <32 x i8>

The final asm with this new lowering for trunc <32 x i32> %0 to <32 x i8> is

     vlda     x0, [p0, #0];          vldb     x1, [p0, #64]
     [...]
     ret     lr;             vshuffle        x2, x0, x1, r0
     mova    r1, #0;         vshuffle        x0, x0, x1, r0  //  Delay Slot 5
     vshuffle        x0, x0, x2, r1                  //  Delay Slot 4
     [...]
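For readers unfamiliar with the semantics being lowered, here is a minimal reference model in plain C++. It is only a sketch of what `trunc <32 x i32> to <32 x i8>` computes per lane, and of why chaining the two existing steps (i32 -> i16, then i16 -> i8) gives the same result. The function names and `makeInput` are hypothetical and are not part of the PR; the model says nothing about how vshuffle arranges lanes in hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference model of `trunc <N x i32> to <N x i8>`:
// each lane keeps only its low 8 bits.
template <std::size_t N>
std::array<std::uint8_t, N>
truncI32ToI8(const std::array<std::uint32_t, N> &src) {
  std::array<std::uint8_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = static_cast<std::uint8_t>(src[i] & 0xFFu);
  return dst;
}

// The same truncation done in two steps, mirroring the existing
// i32 -> i16 and i16 -> i8 patterns applied back to back.
template <std::size_t N>
std::array<std::uint8_t, N>
truncI32ToI8TwoStep(const std::array<std::uint32_t, N> &src) {
  std::array<std::uint8_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    const auto mid = static_cast<std::uint16_t>(src[i] & 0xFFFFu);
    dst[i] = static_cast<std::uint8_t>(mid & 0xFFu);
  }
  return dst;
}

// Hypothetical helper: a deterministic 32-lane input whose upper bytes
// are non-zero, so a truncation that kept the wrong bits would be caught.
inline std::array<std::uint32_t, 32> makeInput() {
  std::array<std::uint32_t, 32> v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = 0x01020300u * static_cast<std::uint32_t>(i + 1) +
           static_cast<std::uint32_t>(i);
  return v;
}
```

A model like this could serve as the expected-value side of a numerical test, with the device output compared lane by lane against `truncI32ToI8`.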

Questions for reviewers:

1. I couldn't find a way to do this in tablegen with just 2 shuffles, and I'm not sure it is possible. The first 2 vshuffles compute the same thing (question: is CSE possible after instruction selection?). Maybe if I wrote the pattern in C++ I could do it with 2 shuffles and avoid the 'rematerialization', but would the final code be more efficient?
2. The tests before this PR cover both accregbank and vecregbank. Do my tests need to cover both as well?
3. Do I need end-to-end tests, such as a full llc run or some kind of numerical test?

konstantinschwarz

The approach looks fine to me.

Ideally we would be able to do this "legalization" in the Legalizer, so that we don't have to build this complexity into every AIE instruction selector.
However, multiple G_TRUNC steps are combined by the artifact combiner during legalization, creating an infinite loop in the legalizer :/

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td

niwinanto · 2025-06-24T07:53:17Z

> The approach looks fine to me.
>
> Ideally we would be able to do this "legalization" in the Legalizer, then we don't have to include this complexity into every AIE instruction selector. However multiple G_TRUNC steps are combined in the artifact combiner during legalization, creating an infinite loop in the legalizer :/

I agree with @konstantinschwarz: moving this complex ISel pattern to the legalizer makes much more sense. However, did you try vshuffle with mode 36? It might let you truncate vi32 to vi8 directly.

martien-de-jong · 2025-06-24T09:13:21Z

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp

+
+        const auto SrcElmBits = SrcTy.getElementType().getSizeInBits();
+        if (SrcElmBits != 64 && SrcElmBits != 32 && SrcElmBits != 16)
+          return false;


Perhaps assert, or add the assumptions on DstElemBits as a comment. I assume we know it is <= SrcElemBits because we have a G_TRUNC and all truncations are legal.

newling (Jun 26, 2025): I've updated this logic with additional comments and asserts.
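The suggestion above can be sketched as follows. This is not the code from the PR: `isSupportedTruncSrcElt` and the `DstElmBits` parameter are hypothetical stand-ins, and the sketch avoids LLVM headers so that it compiles on its own. It only illustrates stating the G_TRUNC assumption as an assert next to the element-size filter.

```cpp
#include <cassert>

// Hypothetical stand-in for the element-size check in the legalizer rule.
// Assumption: the instruction is a G_TRUNC, so the destination element is
// narrower than the source element. The assert documents that assumption
// instead of leaving it implicit.
bool isSupportedTruncSrcElt(unsigned SrcElmBits, unsigned DstElmBits) {
  assert(DstElmBits < SrcElmBits &&
         "G_TRUNC is expected to narrow the element type");
  (void)DstElmBits; // Avoid an unused-parameter warning when NDEBUG is set.
  // Only these source element widths are handled, as in the quoted diff.
  return SrcElmBits == 64 || SrcElmBits == 32 || SrcElmBits == 16;
}
```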

newling · 2025-06-24T16:55:26Z

My initial approach was to legalize (i32->i8) --> (i32->i16->i8) but, as @konstantinschwarz predicted, this gets undone by a combiner. It may be possible to adjust the list of combines used here, but that might prevent a good optimization.

@niwinanto your mode 36 suggestion looks very promising, I'm investigating now!

newling · 2025-06-24T20:33:14Z

@niwinanto the problem I am facing with shuffle's mode 36 truncation is that it can do v16i32 -> v16i8, but the result (v16i8) is 128 bits and I can't find an equivalent of EXTRACT_SUBREG [...] sub_256_lo for 128 bits. Is that because the underlying HW registers are 256 bits?

niwinanto · 2025-06-25T06:58:16Z

> @niwinanto the problem I am facing using shuffle's mode 36 truncation is that it can do v16i32 -> v16i8 but the result there (v16i8) is 128-bits and I can't find an equivalent of EXTRACT_SUBREG [...] sub_256_lo for 128-bits. Is that because the underlying HW registers are 256-bits?

@newling Yes, you are right. The basic HW vector registers are 256 bits wide, so we cannot extract a subregister smaller than 256 bits. However, we have 128-bit q registers, and there is a vector move operation that moves a 256-bit register into a 128-bit register, truncating the upper 128 bits. Please take a look at VMOV_alu_mv_mv_w_to_q; this instruction can help here.
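The data flow this discussion converges on (a 512 -> 128 bit truncation per half, with the 1024 -> 256 bit case split in two, as the later commit title describes) can be modelled in plain C++. This is a sketch under assumptions: the type aliases and function names are hypothetical, `truncHalf` only models the arithmetic result of a v16i32 -> v16i8 truncation, and nothing here reflects the actual lane routing of vshuffle mode 36 or VMOV_alu_mv_mv_w_to_q.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16I32 = std::array<std::uint32_t, 16>; // 512 bits
using V16I8 = std::array<std::uint8_t, 16>;   // 128 bits
using V32I32 = std::array<std::uint32_t, 32>; // 1024 bits
using V32I8 = std::array<std::uint8_t, 32>;   // 256 bits

// Models the 512 -> 128 bit step: each i32 lane keeps its low 8 bits.
V16I8 truncHalf(const V16I32 &src) {
  V16I8 dst{};
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[i] = static_cast<std::uint8_t>(src[i] & 0xFFu);
  return dst;
}

// Models the 1024 -> 256 bit case: split the source into two 512-bit
// halves, truncate each half, then concatenate the two 128-bit results.
V32I8 truncBySplitting(const V32I32 &src) {
  V16I32 lo{}, hi{};
  for (std::size_t i = 0; i < 16; ++i) {
    lo[i] = src[i];
    hi[i] = src[i + 16];
  }
  const V16I8 loT = truncHalf(lo);
  const V16I8 hiT = truncHalf(hi);
  V32I8 dst{};
  for (std::size_t i = 0; i < 16; ++i) {
    dst[i] = loT[i];
    dst[i + 16] = hiT[i];
  }
  return dst;
}
```

Splitting keeps each intermediate value within a size that the discussion says the hardware can produce directly, at the cost of one extra concatenation step.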

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/legalize-trunc.mir

niwinanto · 2025-06-26T07:40:13Z

@newling the change looks good to me. Just one piece of feedback: please include more tests for the legalizer. Also, it would be nice if you could rephrase the commit message and squash the clang-format commit into its respective commit before the merge.

newling · 2025-06-26T16:26:42Z

> Also, would be nice if you can rephrase the commit message and squash the clang format commit to respective commit before the merge.

Will do. I'm still unsure what the recommended workflow is here. I've only worked on projects where all commits get squashed into a single commit when the PR is merged into main, as outlined here. I guess in this project you prefer large 'feature' PRs where each commit is a single step towards the feature, as opposed to a sequence of small PRs? Anyway, I'm happy to adapt to whatever workflow you all use :)

niwinanto

Thanks for addressing the feedback. LGTM!

andcarminati · 2025-06-27T12:45:33Z

> Also, would be nice if you can rephrase the commit message and squash the clang format commit to respective commit before the merge.
>
> Will do. I'm still unsure what the recommended workflow is here, I've only worked on projects where all commits get squashed into a single commit when the PR is merged into main, as outlined here/ I guess in this project you prefer large 'feature' PRs where each commit is a single step towards the feature. As opposed to a sequence of small PRs? Anyway, I'm happy to adapt whatever you all use as a workflow :)

I think this PR is in nice shape now. Only a small suggestion: prepend [AIE2p] to each commit message.

newling · 2025-07-07T23:23:43Z

Ping

konstantinschwarz

LGTM. It would be nice to add LLVM IR -> assembly tests for all types that we expect to work, similar to https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/test/CodeGen/AIE/extractelement.ll

newling · 2025-07-14T18:14:29Z

TODO(newling) don't land before I've verified numerical correctness of this transform.

newling requested review from abhinay-anubola, abnikant, andcarminati, F-Stuckmann, gbossu, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, niwinanto, SagarMaheshwari99 and stephenneuendorffer as code owners June 20, 2025 06:11

newling changed the title ~~[aie][aie2p] Global Isel : trunc where dst is 1/4 bitsize of src~~ [aie][aie2p] Global Isel : trunc vector when size(src)/size(dst) = 4 Jun 20, 2025

konstantinschwarz reviewed Jun 23, 2025

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td (outdated)

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp (outdated)

konstantinschwarz reviewed Jun 23, 2025

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td (outdated)

martien-de-jong reviewed Jun 24, 2025

niwinanto reviewed Jun 26, 2025

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/legalize-trunc.mir (outdated)

newling force-pushed the trunc_v32i32_v32i8 branch 2 times, most recently from 3676e33 to 9324566 Compare June 26, 2025 17:06

niwinanto previously approved these changes Jun 26, 2025

newling dismissed niwinanto’s stale review via b5a23e9 June 26, 2025 17:14

newling force-pushed the trunc_v32i32_v32i8 branch 2 times, most recently from b5a23e9 to 14f4523 Compare June 27, 2025 01:18

newling force-pushed the trunc_v32i32_v32i8 branch 2 times, most recently from 92354f8 to a815f2b Compare June 27, 2025 18:09

newling added 3 commits June 27, 2025 11:10

[AIE2p] Use tablegen pattern for trunc where dst is 1/4 bitsize

085f685

[AIE2p] Use shuffle mode 36 (trunc 512->128) and legalize trunc 1024->256 by splitting in two

a815f2b

Merge branch 'aie-public' into trunc_v32i32_v32i8

e716f30

konstantinschwarz approved these changes Jul 14, 2025
