-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL][Doc] Extended group load/store APIs proposal #7593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Extended group load/store APIs proposal #7593
Conversation
@andreyfe1, @Pennycook, could you please take a look at the initial proposal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also wrap things at 80 columns; it makes it much easier to review.
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
work-group or sub-group and some associated scratch space. | ||
|
||
_Effects_: Loads `ItemsPerWorkItem` elements from `in_ptr` to `out` | ||
using the `gh` group helper object. `GroupMemoryHelper` specifies data placement properties and also can work with extra options such as specifying out-of-boundry value and limited work-items number to work with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using extra options in the GroupMemoryHelper
, could we make these two new properties? Then a developer could use them with the overload that doesn't require a GroupMemoryHelper
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, removed (2) and (4) and added a TODO for other properties, wanna do some prototyping for it
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
…lizaro/llvm into dev/aelizaro/block_load_store
It will be good to have complete kernels for this extension in tests directory. |
@dkhaldi, do you mean to add full tests without implementation, to illustrate how it supposes to work with sycl's data types? |
Right. |
Test cases are added to illustrate how APIs should work.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted a few minor issues during review of the latest changes, but I think the fixes are straightforward. Let me know when they're applied and I'll approve.
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc
Outdated
Show resolved
Hide resolved
…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>
namespace sycl::ext::oneapi::experimental { | ||
|
||
struct full_work_group_hint { | ||
using value_t = | ||
property_value<full_work_group_hint>; | ||
}; | ||
|
||
inline constexpr full_work_group_hint::value_t full_work_group; | ||
|
||
struct full_sub_group_hint { | ||
using value_t = | ||
property_value<full_sub_group_hint>; | ||
}; | ||
|
||
inline constexpr full_sub_group_hint::value_t full_sub_group; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should combine these into one hint. We can call it full_group
for now.
We already know whether we're using a group or a sub-group from the first argument. There are also going to be other group types in the future, and it would be convenient to be able to re-use the same property across all of those group types as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure full_group
is a good name. parallel_for
requires that global size is a multiple of a work group size, so WG is always "full" in some sense. It might still be that the last SG of each WG isn't full.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that just means that full_group
doesn't do anything when the algorithm is invoked with a sycl::group
, but improves performance when the algorithm is invoked with a sycl::sub_group
. The "group" in "full group" is intended to mean "anything satisfying the group concept" and not "sycl::group
".
I want to push for a renaming of sycl::group
to sycl::work_group
to avoid this sort of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need another property for the WG case, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Maybe I'm misunderstanding what you're expecting to happen here. I thought your earlier message was saying that work-groups were always "full"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, but the block-read is SG-level intrinsic. We need SGs to be "full" to use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're talking past each other.
Consider the below:
auto sg = it.get_sub_group();
group_load(sg, input, output_span, properties{contiguous_memory, full_group});
The interpretation here is that sg
represents a full sub-group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user writes
auto g = it.get_group();
group_load(g, input, output_span, properties{contiguous_memory, full_group});
we still can't use block loads without runtime checks. WG is full, but its last SG might not be. Also, full_group
is meaningless for WG because it always full per SYCL spec parallel_for
/nd_range
restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We couldn't use block loads in this case anyway, because the stride in this case is the size of the work-group and not the sub-group. You're right that we could generate block loads in some cases (like a work-group + blocked data placement) but I think we'd still need a runtime check because different devices carve up work-groups into sub-groups differently.
full_group
is currently meaningless for work-groups, but I think it makes more sense to have a generic full_group
than to make the property sub-group specific. That allows a developer to write generic code that uses full_group
and have it work in both cases, and it may be meaningful for future group types.
The following properties is introduced to be used | ||
as a hint that implementation can use get_max_local_range(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users might not understand why this is helpful for implementations, so I think we should be a little more explicit here. Maybe something like:
The following properties is introduced to be used | |
as a hint that implementation can use get_max_local_range(): | |
The following property can be used as a hint that | |
`get_local_range()` is equal to `get_max_local_range()`, | |
which may enable more aggressive optimizations for some | |
implementations. | |
[NOTE] | |
==== | |
Using `full_group` is necessary to generate SPIR-V block read | |
and block write instructions, because these instructions are | |
defined to use the maximum group size as the stride. | |
==== |
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
@gmlueck, @Pennycook, @aelovikov-intel, could you please, double-check the PR? (and huge thanks to you for contribution!) I think I addressed everything except one ongoing discussion about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted one minor potential issue, but apart from this and the full_group
discussion, I think we're good.
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
`Properties` argument is reserved for future revisions of this extention and is | ||
ignored now. | ||
Default value is empty `sycl::ext::oneapi::experimental::empty_properties_t` | ||
May be used in future for setting boundary values or limiting numbers of work | ||
items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above. Can we use contiguous
here?
Co-authored-by: John Pennycook <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, but I have some comments about the use of the word "hint". See below.
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc
Outdated
Show resolved
Hide resolved
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@Pennycook, can we merge it now? |
I think so! @intel/llvm-gatekeepers, please merge. |
An initial draft of extended group load/store APIs to provide capabilities to work with temporary memory buffers and load/store multiple elements per work item.