-
Notifications
You must be signed in to change notification settings - Fork 125
Add new launch property to support work_group_scratch_memory #1968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't squash on merge so if you could squash the commits in this PR that would be appriciated.
55f2d02
to
ae9804c
Compare
@oneapi-src/unified-runtime-cuda-write @oneapi-src/unified-runtime-level-zero-write please review, this is intended for the next release. |
ae9804c
to
44bcff4
Compare
44bcff4
to
2afce8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA LGTM
a64d649
to
bc9eef9
Compare
ping @oneapi-src/unified-runtime-level-zero-write please review this is intended for the next release. |
Actually latest testing shows some related failures in the cuda build |
as far as I can tell it is because needs the DPC++ change (function signature change). The related PR on the dpc++ side is green, same reason for the level 0 failure. the other failures are unrelated to this patch (failure to fetch xpti) |
intel/llvm#15061 introduces a new property work_group_scratch_memory which allow the user to set a given amount of local memory to be used. In order to pass this information to the adaptor, the patch adds a new launch property to urEnqueueKernelLaunchCustomExp. The patch also changes the signature of urEnqueueKernelLaunchCustomExp to add global offset in order to maintain features when using this extension. Signed-off-by: Victor Lomuller <[email protected]>
bc9eef9
to
7222f79
Compare
Hi @Naghasan, unfortunately I had to revert this PR in #2400 due to build failures in intel/llvm#15061 - I was looking at the wrong results when I hit merge. That said I'm not sure why the adapters built ok here but not in intel/llvm. I updated the tag there to point at |
@callumfare the build failure was due to the fact you haven't bumped the tag, only updated the comment ;) |
The patch partially implements `work_group_static` and update proposal. Implemented: - `work_group_static` to handle static allocation in kernel. - `get_dynamic_work_group_memory` to handle runtime allocation, but only on CUDA `work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`, allowing the class to be decorated by the attribute and uses the same mechanism during lowering to place the variable in local memory. `get_dynamic_work_group_memory` uses a new builtin function, `__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into referencing a 0 sized array GV when targeting NVPTX. The approach for SPIR will need to differ from this lowering. UR change oneapi-src/unified-runtime#1968, oneapi-src/unified-runtime#2403 --------- Signed-off-by: Lukas Sommer <[email protected]> Signed-off-by: Victor Lomuller <[email protected]> Co-authored-by: Lukas Sommer <[email protected]> Co-authored-by: Atharva Dubey <[email protected]> Co-authored-by: Marcos Maronas <[email protected]> Co-authored-by: Callum Fare <[email protected]> Co-authored-by: Martin Morrison-Grant <[email protected]>
The patch partially implements `work_group_static` and update proposal. Implemented: - `work_group_static` to handle static allocation in kernel. - `get_dynamic_work_group_memory` to handle runtime allocation, but only on CUDA `work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`, allowing the class to be decorated by the attribute and uses the same mechanism during lowering to place the variable in local memory. `get_dynamic_work_group_memory` uses a new builtin function, `__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into referencing a 0 sized array GV when targeting NVPTX. The approach for SPIR will need to differ from this lowering. UR change oneapi-src/unified-runtime#1968, oneapi-src/unified-runtime#2403 --------- Signed-off-by: Lukas Sommer <[email protected]> Signed-off-by: Victor Lomuller <[email protected]> Co-authored-by: Lukas Sommer <[email protected]> Co-authored-by: Atharva Dubey <[email protected]> Co-authored-by: Marcos Maronas <[email protected]> Co-authored-by: Callum Fare <[email protected]> Co-authored-by: Martin Morrison-Grant <[email protected]>
intel/llvm#15061 introduce a new property
work_group_scratch_size
which allow the user to set a given amount of local memory to be used.In order to pass this information to the adaptor, the patch adds a new launch property to
urEnqueueKernelLaunchCustomExp
.The patch also changes the signature of
urEnqueueKernelLaunchCustomExp
to addglobal offset
in order to maintain features when using this extension.