-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. #102776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d41faf6
757e119
af1a416
0cb89f6
11162b0
13f83ac
fe68593
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,9 @@ class SPIRVABIInfo : public CommonSPIRABIInfo { | |
void computeInfo(CGFunctionInfo &FI) const override; | ||
|
||
private: | ||
ABIArgInfo classifyReturnType(QualType RetTy) const; | ||
ABIArgInfo classifyKernelArgumentType(QualType Ty) const; | ||
ABIArgInfo classifyArgumentType(QualType Ty) const; | ||
}; | ||
} // end anonymous namespace | ||
namespace { | ||
|
@@ -64,6 +66,27 @@ void CommonSPIRABIInfo::setCCs() { | |
RuntimeCC = llvm::CallingConv::SPIR_FUNC; | ||
} | ||
|
||
ABIArgInfo SPIRVABIInfo::classifyReturnType(QualType RetTy) const { | ||
if (getTarget().getTriple().getVendor() != llvm::Triple::AMD) | ||
return DefaultABIInfo::classifyReturnType(RetTy); | ||
if (!isAggregateTypeForABI(RetTy) || getRecordArgABI(RetTy, getCXXABI())) | ||
return DefaultABIInfo::classifyReturnType(RetTy); | ||
|
||
if (const RecordType *RT = RetTy->getAs<RecordType>()) { | ||
const RecordDecl *RD = RT->getDecl(); | ||
if (RD->hasFlexibleArrayMember()) | ||
return DefaultABIInfo::classifyReturnType(RetTy); | ||
} | ||
|
||
// TODO: The AMDGPU ABI is non-trivial to represent in SPIR-V; in order to | ||
// avoid encoding various architecture specific bits here we return everything | ||
// as direct to retain type info for things like aggregates, for later perusal | ||
// when translating back to LLVM/lowering in the BE. This is also why we | ||
// disable flattening as the outcomes can mismatch between SPIR-V and AMDGPU. | ||
// This will be revisited / optimised in the future. | ||
return ABIArgInfo::getDirect(CGT.ConvertType(RetTy), 0u, nullptr, false); | ||
} | ||
|
||
ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const { | ||
if (getContext().getLangOpts().CUDAIsDevice) { | ||
// Coerce pointer arguments with default address space to CrossWorkGroup | ||
|
@@ -78,18 +101,51 @@ ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const { | |
return ABIArgInfo::getDirect(LTy, 0, nullptr, false); | ||
} | ||
|
||
// Force copying aggregate type in kernel arguments by value when | ||
// compiling CUDA targeting SPIR-V. This is required for the object | ||
// copied to be valid on the device. | ||
// This behavior follows the CUDA spec | ||
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing, | ||
// and matches the NVPTX implementation. | ||
if (isAggregateTypeForABI(Ty)) | ||
if (isAggregateTypeForABI(Ty)) { | ||
if (getTarget().getTriple().getVendor() == llvm::Triple::AMD) | ||
// TODO: The AMDGPU kernel ABI passes aggregates byref, which is not | ||
// currently expressible in SPIR-V; SPIR-V passes aggregates byval, | ||
// which the AMDGPU kernel ABI does not allow. Passing aggregates as | ||
// direct works around this impedance mismatch, as it retains type info | ||
// and can be correctly handled, post reverse-translation, by the AMDGPU | ||
// BE, which has to support this CC for legacy OpenCL purposes. It can | ||
// be brittle and does lead to performance degradation in certain | ||
// pathological cases. This will be revisited / optimised in the future, | ||
// once a way to deal with the byref/byval impedance mismatch is | ||
Comment on lines
+109
to
+114
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main blocker to only using byref on AMDGPU is the OpenCL call-kernel-as-function case. The ticket I filed for this eons ago actually had some recent signs of life There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's fine, when it amounts to something we'll (me) just fix it here. |
||
// identified. | ||
return ABIArgInfo::getDirect(LTy, 0, nullptr, false); | ||
// Force copying aggregate type in kernel arguments by value when | ||
// compiling CUDA targeting SPIR-V. This is required for the object | ||
// copied to be valid on the device. | ||
// This behavior follows the CUDA spec | ||
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing, | ||
// and matches the NVPTX implementation. | ||
return getNaturalAlignIndirect(Ty, /* byval */ true); | ||
} | ||
} | ||
return classifyArgumentType(Ty); | ||
} | ||
|
||
ABIArgInfo SPIRVABIInfo::classifyArgumentType(QualType Ty) const { | ||
if (getTarget().getTriple().getVendor() != llvm::Triple::AMD) | ||
return DefaultABIInfo::classifyArgumentType(Ty); | ||
if (!isAggregateTypeForABI(Ty)) | ||
return DefaultABIInfo::classifyArgumentType(Ty); | ||
|
||
// Records with non-trivial destructors/copy-constructors should not be | ||
// passed by value. | ||
if (auto RAA = getRecordArgABI(Ty, getCXXABI())) | ||
return getNaturalAlignIndirect(Ty, RAA == CGCXXABI::RAA_DirectInMemory); | ||
|
||
if (const RecordType *RT = Ty->getAs<RecordType>()) { | ||
const RecordDecl *RD = RT->getDecl(); | ||
if (RD->hasFlexibleArrayMember()) | ||
return DefaultABIInfo::classifyArgumentType(Ty); | ||
} | ||
|
||
return ABIArgInfo::getDirect(CGT.ConvertType(Ty), 0u, nullptr, false); | ||
} | ||
|
||
void SPIRVABIInfo::computeInfo(CGFunctionInfo &FI) const { | ||
// The logic is same as in DefaultABIInfo with an exception on the kernel | ||
// arguments handling. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the difficulty representing it. SPIR/SPIRV just made the a mistake by using byval for anything (where they just took the default clang happens to emit). There's no real reason SPIR-V needs to continue using byval in kernels for its IR, and can switch to byref for kernels? What is the underlying SPIR-V representation like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you misunderstand the core problem, which is orthogonal to LLVM: SPIR-V the IR specification does not encode byref, only byval. Whatever you might feel about what is right or wrong is somewhat unhelpful here, unless you are going to add an extension to that end? Trying to backdoor this is just a lot of pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To wit, it's necessary pain, in the end, but we've not yet figured out the right way to do it, since there are a few moving pieces involved. Perhaps the solution will end up to add
byref
to the SPIR-V spec; in the meanwhile, this is a sufficient, if temporary and slightly icky, solution.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byval doesn't really make sense on a kernel as there is no real caller, but depending on how SPIRV defined its byval attribute, maybe you can codegen LLVM byref into SPIRV byval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good suggestion, it's one of the things I'm looking at, but it's a bit messy to retain the "this is actually
byref
" (it is doable, but was quite intrusive with tools like the translator), and I admit to not fully debugging / validating it. I just realised that the "add an extension to SPIR-V" bit came off pretty passive-aggressive, I apologise - I think that might just be TheRightWayTM to deal with this,byref
is common enough to justify it.