Skip to content

[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. #102776

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 21, 2024
70 changes: 63 additions & 7 deletions clang/lib/CodeGen/Targets/SPIR.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,9 @@ class SPIRVABIInfo : public CommonSPIRABIInfo {
void computeInfo(CGFunctionInfo &FI) const override;

private:
ABIArgInfo classifyReturnType(QualType RetTy) const;
ABIArgInfo classifyKernelArgumentType(QualType Ty) const;
ABIArgInfo classifyArgumentType(QualType Ty) const;
};
} // end anonymous namespace
namespace {
Expand Down Expand Up @@ -64,6 +66,27 @@ void CommonSPIRABIInfo::setCCs() {
RuntimeCC = llvm::CallingConv::SPIR_FUNC;
}

ABIArgInfo SPIRVABIInfo::classifyReturnType(QualType RetTy) const {
if (getTarget().getTriple().getVendor() != llvm::Triple::AMD)
return DefaultABIInfo::classifyReturnType(RetTy);
if (!isAggregateTypeForABI(RetTy) || getRecordArgABI(RetTy, getCXXABI()))
return DefaultABIInfo::classifyReturnType(RetTy);

if (const RecordType *RT = RetTy->getAs<RecordType>()) {
const RecordDecl *RD = RT->getDecl();
if (RD->hasFlexibleArrayMember())
return DefaultABIInfo::classifyReturnType(RetTy);
}

// TODO: The AMDGPU ABI is non-trivial to represent in SPIR-V; in order to
// avoid encoding various architecture specific bits here we return everything
// as direct to retain type info for things like aggregates, for later perusal
// when translating back to LLVM/lowering in the BE. This is also why we
// disable flattening as the outcomes can mismatch between SPIR-V and AMDGPU.
// This will be revisited / optimised in the future.
Comment on lines +81 to +86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the difficulty representing it. SPIR/SPIRV just made the a mistake by using byval for anything (where they just took the default clang happens to emit). There's no real reason SPIR-V needs to continue using byval in kernels for its IR, and can switch to byref for kernels? What is the underlying SPIR-V representation like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you misunderstand the core problem, which is orthogonal to LLVM: SPIR-V the IR specification does not encode byref, only byval. Whatever you might feel about what is right or wrong is somewhat unhelpful here, unless you are going to add an extension to that end? Trying to backdoor this is just a lot of pain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To wit, it's necessary pain, in the end, but we've not yet figured out the right way to do it, since there are a few moving pieces involved. Perhaps the solution will end up to add byref to the SPIR-V spec; in the meanwhile, this is a sufficient, if temporary and slightly icky, solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byval doesn't really make sense on a kernel as there is no real caller, but depending on how SPIRV defined its byval attribute, maybe you can codegen LLVM byref into SPIRV byval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byval doesn't really make sense on a kernel as there is no real caller, but depending on how SPIRV defined its byval attribute, maybe you can codegen LLVM byref into SPIRV byval?

This is a good suggestion, it's one of the things I'm looking at, but it's a bit messy to retain the "this is actually byref" (it is doable, but was quite intrusive with tools like the translator), and I admit to not fully debugging / validating it. I just realised that the "add an extension to SPIR-V" bit came off pretty passive-aggressive, I apologise - I think that might just be TheRightWayTM to deal with this, byref is common enough to justify it.

return ABIArgInfo::getDirect(CGT.ConvertType(RetTy), 0u, nullptr, false);
}

ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
if (getContext().getLangOpts().CUDAIsDevice) {
// Coerce pointer arguments with default address space to CrossWorkGroup
Expand All @@ -78,18 +101,51 @@ ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
}

// Force copying aggregate type in kernel arguments by value when
// compiling CUDA targeting SPIR-V. This is required for the object
// copied to be valid on the device.
// This behavior follows the CUDA spec
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
// and matches the NVPTX implementation.
if (isAggregateTypeForABI(Ty))
if (isAggregateTypeForABI(Ty)) {
if (getTarget().getTriple().getVendor() == llvm::Triple::AMD)
// TODO: The AMDGPU kernel ABI passes aggregates byref, which is not
// currently expressible in SPIR-V; SPIR-V passes aggregates byval,
// which the AMDGPU kernel ABI does not allow. Passing aggregates as
// direct works around this impedance mismatch, as it retains type info
// and can be correctly handled, post reverse-translation, by the AMDGPU
// BE, which has to support this CC for legacy OpenCL purposes. It can
// be brittle and does lead to performance degradation in certain
// pathological cases. This will be revisited / optimised in the future,
// once a way to deal with the byref/byval impedance mismatch is
Comment on lines +109 to +114
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main blocker to only using byref on AMDGPU is the OpenCL call-kernel-as-function case. The ticket I filed for this eons ago actually had some recent signs of life

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, when it amounts to something we'll (me) just fix it here.

// identified.
return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
// Force copying aggregate type in kernel arguments by value when
// compiling CUDA targeting SPIR-V. This is required for the object
// copied to be valid on the device.
// This behavior follows the CUDA spec
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
// and matches the NVPTX implementation.
return getNaturalAlignIndirect(Ty, /* byval */ true);
}
}
return classifyArgumentType(Ty);
}

ABIArgInfo SPIRVABIInfo::classifyArgumentType(QualType Ty) const {
if (getTarget().getTriple().getVendor() != llvm::Triple::AMD)
return DefaultABIInfo::classifyArgumentType(Ty);
if (!isAggregateTypeForABI(Ty))
return DefaultABIInfo::classifyArgumentType(Ty);

// Records with non-trivial destructors/copy-constructors should not be
// passed by value.
if (auto RAA = getRecordArgABI(Ty, getCXXABI()))
return getNaturalAlignIndirect(Ty, RAA == CGCXXABI::RAA_DirectInMemory);

if (const RecordType *RT = Ty->getAs<RecordType>()) {
const RecordDecl *RD = RT->getDecl();
if (RD->hasFlexibleArrayMember())
return DefaultABIInfo::classifyArgumentType(Ty);
}

return ABIArgInfo::getDirect(CGT.ConvertType(Ty), 0u, nullptr, false);
}

void SPIRVABIInfo::computeInfo(CGFunctionInfo &FI) const {
// The logic is same as in DefaultABIInfo with an exception on the kernel
// arguments handling.
Expand Down
Loading
Loading