Skip to content

Add new launch property to support work_group_scratch_memory #2403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

Naghasan
Copy link
Contributor

@Naghasan Naghasan commented Dec 2, 2024

reland #1968 (exact same code) required for intel/llvm#15061, #2400 reverted it due to a mistake in the tag bump in intel/llvm intel/llvm@33ea3b0

@Naghasan Naghasan requested review from a team as code owners December 2, 2024 09:10
@callumfare callumfare added the v0.11.x Include in the v0.11.x release label Dec 2, 2024
@callumfare
Copy link
Contributor

Just to avoid the same mistake happening again, the intel/llvm PR is failing so this shouldn't be labelled ready to merge until that gets fixed
https://github.com/intel/llvm/actions/runs/12085983863/job/33704353503?pr=15061

@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 2, 2024

@callumfare there is nothing to fix, your tag bump is the cause of the failure as you didn't bump the tag

@callumfare
Copy link
Contributor

@callumfare there is nothing to fix, your tag bump is the cause of the failure as you didn't bump the tag

@Naghasan Sorry yep you're right, I think there may have been another failure before the bad tag bump commit but it looks like everything should be resolved now.

@aarongreig aarongreig added the ready to merge Added to PR's which are ready to merge label Dec 3, 2024
@kbenzie kbenzie force-pushed the work_group_static_reland branch from 2d7b879 to 80dd603 Compare December 3, 2024 12:24
@github-actions github-actions bot added loader Loader related feature/bug conformance Conformance test suite issues. specification Changes or additions to the specification experimental Experimental feature additions/changes/specification level-zero L0 adapter specific issues cuda CUDA adapter specific issues labels Dec 3, 2024
@Naghasan Naghasan force-pushed the work_group_static_reland branch from 80dd603 to 7ba97fa Compare December 3, 2024 15:29
@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 3, 2024

@EwanC / @Bensuo my changes impact the last PR you merged, can you have a look at source/adapters/cuda/kernel.hpp

@frasercrmck I'd appreciate the extra review on the file :)

Copy link
Contributor

@frasercrmck frasercrmck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks alright to me, with some minor suggestions

intel/llvm#15061 introduces a new property work_group_scratch_memory which allow the user to set a given amount of local memory to be used.

In order to pass this information to the adaptor, the patch adds a new launch property to urEnqueueKernelLaunchCustomExp.

The patch also changes the signature of urEnqueueKernelLaunchCustomExp to add global offset in order to maintain features when using this extension.

Signed-off-by: Victor Lomuller <[email protected]>
@Naghasan Naghasan force-pushed the work_group_static_reland branch from 7ba97fa to 41ad797 Compare December 3, 2024 16:05
@EwanC
Copy link
Contributor

EwanC commented Dec 3, 2024

@EwanC / @Bensuo my changes impact the last PR you merged, can you have a look at source/adapters/cuda/kernel.hpp

Changes make sense me. We added quite a few UR CTS tests and DPC++ E2E tests in that PR, so if they are passing in CI we can have reasonable confidence it's working fine.

@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 3, 2024

great, thanks @frasercrmck @EwanC

@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 3, 2024

https://github.com/oneapi-src/unified-runtime/actions/runs/12143564588/job/33860974258?pr=2403 failed with an unrelated failure 51 - enqueue-adapter_native_cpu (Timeout)

@kbenzie
Copy link
Contributor

kbenzie commented Dec 3, 2024

https://github.com/oneapi-src/unified-runtime/actions/runs/12143564588/job/33860974258?pr=2403 failed with an unrelated failure 51 - enqueue-adapter_native_cpu (Timeout)

This may go away with a rerun once the workflow has completed.

@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 3, 2024

failures unrelated to the patch: either a nativecpu timeout, or level 0 error cause by uur::Result(urPlatformGet(&a, 1, count, platform_list.data(), nullptr))

as before, the SYCL e2e testing fails because it needs the dpc++ patch.

@kbenzie
Copy link
Contributor

kbenzie commented Dec 4, 2024

The L0 failures are due to the runner not finding any L0 devices.

@martygrant martygrant merged commit 5acc824 into oneapi-src:main Dec 4, 2024
64 of 72 checks passed
martygrant added a commit to intel/llvm that referenced this pull request Dec 4, 2024
The patch partially implements `work_group_static` and update proposal.

Implemented:

- `work_group_static` to handle static allocation in kernel.
- `get_dynamic_work_group_memory` to handle runtime allocation, but only
on CUDA

`work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`,
allowing the class to be decorated by the attribute and uses the same
mechanism during lowering to place the variable in local memory.

`get_dynamic_work_group_memory` uses a new builtin function,
`__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into
referencing a 0 sized array GV when targeting NVPTX. The approach for
SPIR will need to differ from this lowering.

UR change oneapi-src/unified-runtime#1968,
oneapi-src/unified-runtime#2403

---------

Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Victor Lomuller <[email protected]>
Co-authored-by: Lukas Sommer <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Marcos Maronas <[email protected]>
Co-authored-by: Callum Fare <[email protected]>
Co-authored-by: Martin Morrison-Grant <[email protected]>
@martygrant
Copy link
Contributor

intel/llvm PR intel/llvm#15061

KornevNikita pushed a commit to intel/llvm that referenced this pull request Feb 25, 2025
The patch partially implements `work_group_static` and update proposal.

Implemented:

- `work_group_static` to handle static allocation in kernel.
- `get_dynamic_work_group_memory` to handle runtime allocation, but only
on CUDA

`work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`,
allowing the class to be decorated by the attribute and uses the same
mechanism during lowering to place the variable in local memory.

`get_dynamic_work_group_memory` uses a new builtin function,
`__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into
referencing a 0 sized array GV when targeting NVPTX. The approach for
SPIR will need to differ from this lowering.

UR change oneapi-src/unified-runtime#1968,
oneapi-src/unified-runtime#2403

---------

Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Victor Lomuller <[email protected]>
Co-authored-by: Lukas Sommer <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Marcos Maronas <[email protected]>
Co-authored-by: Callum Fare <[email protected]>
Co-authored-by: Martin Morrison-Grant <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conformance Conformance test suite issues. cuda CUDA adapter specific issues experimental Experimental feature additions/changes/specification level-zero L0 adapter specific issues loader Loader related feature/bug ready to merge Added to PR's which are ready to merge specification Changes or additions to the specification v0.11.x Include in the v0.11.x release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants