-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[AMDGPU] Defaults for missing dimensions in SYCL required wg size #68872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) { | |||
return MadeChange; | ||||
} | ||||
|
||||
// SYCL allows required work-group size attribute to be partially specified | ||||
// (not all three dimensions), provide a default value (1) for the missing | ||||
// dimensions. | ||||
static void updateSYCLreqdWorkGroupMD(Function &F) { | ||||
auto *Node = F.getMetadata("reqd_work_group_size"); | ||||
if (!Node || Node->getNumOperands() == 3) | ||||
return; | ||||
|
||||
auto &Context = F.getContext(); | ||||
SmallVector<uint64_t, 3> RWGS; | ||||
for (auto &Op : Node->operands()) | ||||
RWGS.push_back(mdconst::extract<ConstantInt>(Op)->getZExtValue()); | ||||
while (RWGS.size() != 3) | ||||
RWGS.push_back(1); | ||||
|
||||
llvm::Metadata *RWGSArgs[] = { | ||||
llvm::ConstantAsMetadata::get(llvm::ConstantInt::get( | ||||
llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[0]))), | ||||
llvm::ConstantAsMetadata::get(llvm::ConstantInt::get( | ||||
llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[1]))), | ||||
llvm::ConstantAsMetadata::get(llvm::ConstantInt::get( | ||||
llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[2])))}; | ||||
F.setMetadata("reqd_work_group_size", llvm::MDNode::get(Context, RWGSArgs)); | ||||
} | ||||
|
||||
// TODO: Move makeLIDRangeMetadata usage into here. Seem to not get | ||||
// TargetPassConfig for subtarget. | ||||
bool AMDGPULowerKernelAttributes::runOnModule(Module &M) { | ||||
for (auto &F : M) | ||||
if (F.hasFnAttribute("sycl-module-id")) | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is unnecessary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an error condition in OpenCL, which is check by the verifier:
However, it is acceptable in SYCL. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's an error condition? That verifier is just checking the final output parses? The backend shouldn't need language specific checks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it just tripping over the assert 3 element condition? But you fixed that by adjusting the metadata to match? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I should have been more clear.
No, OpenCL mandates all 3 dimensions are set, the verifier, rightly so, expects 3 elements and errors out on sub 3 elements. It's just that the same metadata now can be created through either SYCL, or OpenCL, and SYCL happens to relax the restriction.
Yes, with this patch the condition is also true for SYCL's required work group size (padded with 1 for sub 3 elements). A quick grep through ROCm-Developer-Tools, shows that this assumption (all 3 elements specified) is relied upon in a bunch of places: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @arsenm did I make a convincing argument for this patch, could we please procede? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't auto *Node = F.getMetadata("reqd_work_group_size");
if (!Node || Node->getNumOperands() == 3)
return; So we should be able to skip this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeap, it would, but I wanted to be explicit about the fact that this is a sycl specific update. Happy to get rid of it though. |
||||
updateSYCLreqdWorkGroupMD(F); | ||||
|
||||
bool MadeChange = false; | ||||
bool IsV5OrAbove = AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5; | ||||
Function *BasePtr = getBasePtrIntrinsic(M, IsV5OrAbove); | ||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# RUN: llc -march=amdgcn -mcpu=gfx90a -run-pass amdgpu-lower-kernel-attributes -o - %s | FileCheck %s | ||
|
||
# As SYCL allows for the required work group to be specified partially, we need | ||
# to patch it up to 3 dimensions. Make sure that it only happens when dealing | ||
# with SYCL kernels. | ||
|
||
--- | | ||
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8" | ||
|
||
; CHECK-LABEL: sycl_kernel_3dim | ||
; CHECK: !reqd_work_group_size [[SYCL_3DIM:![0-9]+]] | ||
define amdgpu_kernel void @sycl_kernel_3dim() #0 !reqd_work_group_size !0 { | ||
entry: | ||
ret void | ||
} | ||
|
||
; CHECK-LABEL: sycl_kernel_2dim | ||
; CHECK: !reqd_work_group_size [[SYCL_2DIM:![0-9]+]] | ||
define amdgpu_kernel void @sycl_kernel_2dim() #0 !reqd_work_group_size !1 { | ||
entry: | ||
ret void | ||
} | ||
|
||
; CHECK-LABEL: non_sycl_kernel_3dim | ||
; CHECK: !reqd_work_group_size [[NON_SYCL_3DIM:![0-9]+]] | ||
define amdgpu_kernel void @non_sycl_kernel_3dim() #1 !reqd_work_group_size !2 { | ||
entry: | ||
ret void | ||
} | ||
|
||
; CHECK-LABEL: non_sycl_kernel_2dim | ||
; CHECK: !reqd_work_group_size [[NON_SYCL_2DIM:![0-9]+]] | ||
define amdgpu_kernel void @non_sycl_kernel_2dim() #1 !reqd_work_group_size !3 { | ||
entry: | ||
ret void | ||
} | ||
|
||
attributes #0 = { "sycl-module-id"="sycl-reqd-work-group-size.cpp" "target-cpu"="gfx90a" } | ||
attributes #1 = { "target-cpu"="gfx90a" } | ||
|
||
; CHECK: [[SYCL_3DIM]] = !{i32 8, i32 16, i32 2} | ||
!0 = !{i32 8, i32 16, i32 2} | ||
; CHECK: [[SYCL_2DIM]] = !{i32 8, i32 16, i32 1} | ||
!1 = !{i32 8, i32 16} | ||
; CHECK: [[NON_SYCL_3DIM]] = !{i32 4, i32 8, i32 4} | ||
!2 = !{i32 4, i32 8, i32 4} | ||
; CHECK: [[NON_SYCL_2DIM]] = !{i32 4, i32 8} | ||
!3 = !{i32 4, i32 8} | ||
|
||
... | ||
--- | ||
name: sycl_kernel_3dim | ||
body: | | ||
bb.0.entry: | ||
S_ENDPGM 0 | ||
|
||
... | ||
--- | ||
name: sycl_kernel_2dim | ||
body: | | ||
bb.0.entry: | ||
S_ENDPGM 0 | ||
|
||
... | ||
--- | ||
name: non_sycl_kernel_3dim | ||
body: | | ||
bb.0.entry: | ||
S_ENDPGM 0 | ||
|
||
... | ||
--- | ||
name: non_sycl_kernel_2dim | ||
body: | | ||
bb.0.entry: | ||
S_ENDPGM 0 | ||
|
||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, why would this happen? I would expect all producers of this to agree on one consistent format (which we should also add to the LangRef).
Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why this happens is because a discrepancy in how
reqd_work_group_size
is handled in OpenCL and SYCL. OpenCL mandates that all 3 dimensions are specified, padded by1
if applicable (6.7.2 Optional Attribute Qualifiers of OpenCL 1.2 spec):However, SYCL allows for any of the dimensions to be specified (Table 180 of SYCL 2020 spec):
Furthermore, the SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the required work group size dimensionality, which makes padding the dimensions up to 3 very awkward.
I think this is a bit too harsh. It is a lower kernel attributes pass and it handles just that, an attribute. While I agree, that this is not "a fault" of AMD backend, I do think that sanitising the values here is correct, there are many places in the codebase where backends make special provisions for languages that are not in tree (rust for instance). And it solves the problem (admittedly, SYCL's problem) of loosing the dimensionality at the point where it is save to do so. Alternatively, maybe the verifier could be used to canonicalize SYCL required work group size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a source syntax difference, clang could still emit consistent metadata with 1 padding. The IR doesn't need to 1:1 correspond to what the source looks like
If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in principle you are right, but, we do not want to alter the metadata in clang, as it would lose the dimensionality information that is important.
Since this feature is borrowed from OpenCL, and OpenCL assumptions are held in the ROCm toolchain (see examples in the comment below) I feel it has to be rewritten to converge to OpenCL version; and this spot seems very convenient for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to clarify based on the examples below, the main reason we need to do this is because the ROCm toolchain expects 3 entries in the final metadata?
Would it make more sense/be more acceptable to update this pass to handle any number of dimensions, and to instead add the padding when generating the AMD specific binary metadata for
reqd_work_group_size
, rather than doing it at IR level?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we could use
MetadataStreamerMsgPackV3::getWorkGroupDimensions
to pad the missing dimensions and keep the metadata intact.