Skip to content

Add new launch property to support work_group_scratch_memory #1968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 29, 2024

Conversation

Naghasan
Copy link
Contributor

@Naghasan Naghasan commented Aug 13, 2024

intel/llvm#15061 introduce a new property work_group_scratch_size which allow the user to set a given amount of local memory to be used.

In order to pass this information to the adaptor, the patch adds a new launch property to urEnqueueKernelLaunchCustomExp.

The patch also changes the signature of urEnqueueKernelLaunchCustomExp to add global offset in order to maintain features when using this extension.

Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't squash on merge so if you could squash the commits in this PR that would be appriciated.

@Naghasan Naghasan changed the title Add new launch property to support work_group_static_memory Add new launch property to support work_group_scratch_memory Nov 13, 2024
@Naghasan Naghasan force-pushed the work_group_static branch 2 times, most recently from 55f2d02 to ae9804c Compare November 18, 2024 12:51
@kbenzie
Copy link
Contributor

kbenzie commented Nov 19, 2024

@oneapi-src/unified-runtime-cuda-write @oneapi-src/unified-runtime-level-zero-write please review, this is intended for the next release.

Copy link
Contributor

@JackAKirk JackAKirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@frasercrmck frasercrmck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA LGTM

@Naghasan Naghasan force-pushed the work_group_static branch 2 times, most recently from a64d649 to bc9eef9 Compare November 19, 2024 15:37
@Naghasan
Copy link
Contributor Author

Naghasan commented Nov 22, 2024

ping @oneapi-src/unified-runtime-level-zero-write please review this is intended for the next release.

@JackAKirk
Copy link
Contributor

LGTM

Actually latest testing shows some related failures in the cuda build

@Naghasan
Copy link
Contributor Author

Actually latest testing shows some related failures in the cuda build

as far as I can tell it is because needs the DPC++ change (function signature change). The related PR on the dpc++ side is green, same reason for the level 0 failure.

the other failures are unrelated to this patch (failure to fetch xpti)

@pbalcer pbalcer added the ready to merge Added to PR's which are ready to merge label Nov 25, 2024
intel/llvm#15061 introduces a new property work_group_scratch_memory which allow the user to set a given amount of local memory to be used.

In order to pass this information to the adaptor, the patch adds a new launch property to urEnqueueKernelLaunchCustomExp.

The patch also changes the signature of urEnqueueKernelLaunchCustomExp to add global offset in order to maintain features when using this extension.

Signed-off-by: Victor Lomuller <[email protected]>
@kbenzie kbenzie added the v0.11.x Include in the v0.11.x release label Nov 26, 2024
@callumfare callumfare merged commit 78b95e6 into oneapi-src:main Nov 29, 2024
69 of 75 checks passed
@callumfare
Copy link
Contributor

Hi @Naghasan, unfortunately I had to revert this PR in #2400 due to build failures in intel/llvm#15061 - I was looking at the wrong results when I hit merge. That said I'm not sure why the adapters built ok here but not in intel/llvm. I updated the tag there to point at main, you'll have to switch that back to your branch again when you pick this up.

@Naghasan Naghasan deleted the work_group_static branch December 2, 2024 09:03
@Naghasan
Copy link
Contributor Author

Naghasan commented Dec 2, 2024

@callumfare the build failure was due to the fact you haven't bumped the tag, only updated the comment ;)

intel/llvm@33ea3b0

martygrant added a commit to intel/llvm that referenced this pull request Dec 4, 2024
The patch partially implements `work_group_static` and update proposal.

Implemented:

- `work_group_static` to handle static allocation in kernel.
- `get_dynamic_work_group_memory` to handle runtime allocation, but only
on CUDA

`work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`,
allowing the class to be decorated by the attribute and uses the same
mechanism during lowering to place the variable in local memory.

`get_dynamic_work_group_memory` uses a new builtin function,
`__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into
referencing a 0 sized array GV when targeting NVPTX. The approach for
SPIR will need to differ from this lowering.

UR change oneapi-src/unified-runtime#1968,
oneapi-src/unified-runtime#2403

---------

Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Victor Lomuller <[email protected]>
Co-authored-by: Lukas Sommer <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Marcos Maronas <[email protected]>
Co-authored-by: Callum Fare <[email protected]>
Co-authored-by: Martin Morrison-Grant <[email protected]>
KornevNikita pushed a commit to intel/llvm that referenced this pull request Feb 25, 2025
The patch partially implements `work_group_static` and update proposal.

Implemented:

- `work_group_static` to handle static allocation in kernel.
- `get_dynamic_work_group_memory` to handle runtime allocation, but only
on CUDA

`work_group_static` is implemented by exposing `SYCLScope(WorkGroup)`,
allowing the class to be decorated by the attribute and uses the same
mechanism during lowering to place the variable in local memory.

`get_dynamic_work_group_memory` uses a new builtin function,
`__sycl_dynamicLocalMemoryPlaceholder `, which is lowered into
referencing a 0 sized array GV when targeting NVPTX. The approach for
SPIR will need to differ from this lowering.

UR change oneapi-src/unified-runtime#1968,
oneapi-src/unified-runtime#2403

---------

Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Victor Lomuller <[email protected]>
Co-authored-by: Lukas Sommer <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Marcos Maronas <[email protected]>
Co-authored-by: Callum Fare <[email protected]>
Co-authored-by: Martin Morrison-Grant <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Added to PR's which are ready to merge v0.11.x Include in the v0.11.x release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants