-
Notifications
You must be signed in to change notification settings - Fork 125
Add new launch property to support work_group_scratch_memory #2403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just to avoid the same mistake happening again, the intel/llvm PR is failing so this shouldn't be labelled ready to merge until that gets fixed |
@callumfare there is nothing to fix, your tag bump is the cause of the failure as you didn't bump the tag |
@Naghasan Sorry yep you're right, I think there may have been another failure before the bad tag bump commit but it looks like everything should be resolved now. |
2d7b879
to
80dd603
Compare
80dd603
to
7ba97fa
Compare
@EwanC / @Bensuo my changes impact the last PR you merged, can you have a look at source/adapters/cuda/kernel.hpp @frasercrmck I'd appreciate the extra review on the file :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks alright to me, with some minor suggestions
intel/llvm#15061 introduces a new property work_group_scratch_memory which allow the user to set a given amount of local memory to be used. In order to pass this information to the adaptor, the patch adds a new launch property to urEnqueueKernelLaunchCustomExp. The patch also changes the signature of urEnqueueKernelLaunchCustomExp to add global offset in order to maintain features when using this extension. Signed-off-by: Victor Lomuller <[email protected]>
7ba97fa
to
41ad797
Compare
Changes make sense me. We added quite a few UR CTS tests and DPC++ E2E tests in that PR, so if they are passing in CI we can have reasonable confidence it's working fine. |
great, thanks @frasercrmck @EwanC |
https://github.com/oneapi-src/unified-runtime/actions/runs/12143564588/job/33860974258?pr=2403 failed with an unrelated failure |
This may go away with a rerun once the workflow has completed. |
failures unrelated to the patch: either a nativecpu timeout, or level 0 error cause by as before, the SYCL e2e testing fails because it needs the dpc++ patch. |
The L0 failures are due to the runner not finding any L0 devices. |
The patch partially implements `work_group_static` and update proposal. Implemented: - `work_group_static` to handle static allocation in kernel. - `get_dynamic_work_group_memory` to handle runtime allocation, but only on CUDA `work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`, allowing the class to be decorated by the attribute and uses the same mechanism during lowering to place the variable in local memory. `get_dynamic_work_group_memory` uses a new builtin function, `__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into referencing a 0 sized array GV when targeting NVPTX. The approach for SPIR will need to differ from this lowering. UR change oneapi-src/unified-runtime#1968, oneapi-src/unified-runtime#2403 --------- Signed-off-by: Lukas Sommer <[email protected]> Signed-off-by: Victor Lomuller <[email protected]> Co-authored-by: Lukas Sommer <[email protected]> Co-authored-by: Atharva Dubey <[email protected]> Co-authored-by: Marcos Maronas <[email protected]> Co-authored-by: Callum Fare <[email protected]> Co-authored-by: Martin Morrison-Grant <[email protected]>
intel/llvm PR intel/llvm#15061 |
The patch partially implements `work_group_static` and update proposal. Implemented: - `work_group_static` to handle static allocation in kernel. - `get_dynamic_work_group_memory` to handle runtime allocation, but only on CUDA `work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`, allowing the class to be decorated by the attribute and uses the same mechanism during lowering to place the variable in local memory. `get_dynamic_work_group_memory` uses a new builtin function, `__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into referencing a 0 sized array GV when targeting NVPTX. The approach for SPIR will need to differ from this lowering. UR change oneapi-src/unified-runtime#1968, oneapi-src/unified-runtime#2403 --------- Signed-off-by: Lukas Sommer <[email protected]> Signed-off-by: Victor Lomuller <[email protected]> Co-authored-by: Lukas Sommer <[email protected]> Co-authored-by: Atharva Dubey <[email protected]> Co-authored-by: Marcos Maronas <[email protected]> Co-authored-by: Callum Fare <[email protected]> Co-authored-by: Martin Morrison-Grant <[email protected]>
reland #1968 (exact same code) required for intel/llvm#15061, #2400 reverted it due to a mistake in the tag bump in intel/llvm intel/llvm@33ea3b0