-
Notifications
You must be signed in to change notification settings - Fork 27
[aie][aie2p] Global Isel : trunc vector when size(src)/size(dst) = 4 #496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aie-public
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks fine to me.
Ideally we would be able to do this "legalization" in the Legalizer, then we don't have to include this complexity into every AIE instruction selector.
However multiple G_TRUNC steps are combined in the artifact combiner during legalization, creating an infinite loop in the legalizer :/
I agree with @konstantinschwarz Moving this complex ISel pattern to legalizer makes much more sense. However, did you try vshuffle with mode 36, this might help you to trunc vi32 to vi8 directly. |
|
||
const auto SrcElmBits = SrcTy.getElementType().getSizeInBits(); | ||
if (SrcElmBits != 64 && SrcElmBits != 32 && SrcElmBits != 16) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps assert, or add the assumptions on DstElemBits as a comment. I assume we know it is <= SrcElemBits because we have a G_TRUNC and all truncations are legal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this logic with additional comments and asserts
My initial approach was via legalizing (i32->i8) --> (i32->i16->i8) but as predicted @konstantinschwarz this does get undone by a combiner. Maybe it's possible to adjust the list of combines used here but that might prevent a good optimization. @niwinanto your mode 36 suggestion looks very promising, I'm investigating now! |
@niwinanto the problem I am facing using shuffle's mode 36 truncation is that it can do |
@newling Yes, you are right. Basic HW register for vector registers are 256-bit wide, so we cannot extract a subregister smaller than than 256-bit. However, we have 128-bit q registers and there is a vector move operation which moves from 256 bit register into 128 bit register with truncation of upper 128 bits. Please take a look at |
@newling change looks good to me. Just one feed back to include more tests in the legalizer. Also, would be nice if you can rephrase the commit message and squash the clang format commit to respective commit before the merge. |
Will do. I'm still unsure what the recommended workflow is here, I've only worked on projects where all commits get squashed into a single commit when the PR is merged into main, as outlined here/ I guess in this project you prefer large 'feature' PRs where each commit is a single step towards the feature. As opposed to a sequence of small PRs? Anyway, I'm happy to adapt whatever you all use as a workflow :) |
3676e33
to
9324566
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback. LGTM!
b5a23e9
to
14f4523
Compare
I think this PR is in a nice shape now, only a small suggestion: append [AIE2p] in front of each commit message. |
92354f8
to
a815f2b
Compare
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It would be nice to add LLVM IR -> assembly tests for all types that we expect to work, similar to https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/test/CodeGen/AIE/extractelement.ll
TODO(newling) don't land before I've verified numerical correctness of this transform. |
Add instruction selection tablegen pattern for
It is based on existing patterns for
and
The final asm with this new lowering for
trunc <32 x i32> %0 to <32 x i8>
isQuestions for reviewers:
I couldn't find a way in tablegen to do this with just 2 shuffles, and I'm not sure if it's possible with tablegen. The first 2
vshuffles
compute the same thing (question: is CSE after instruction selection possible?). Maybe if I wrote the pattern in C++ I could do it with 2 shuffles and avoid the 'rematerialization', but will the final code be more efficient?I see in the tests before this PR, tests of
accregbank
andvecregbank
. Do I need to have the same in my tests?Do I need end-to-end tests? Full llc run, some kind of numerical test?