[SYCL][Doc] Extended group load/store APIs proposal #7593

aelizaro · 2022-11-30T16:39:16Z

An initial draft of extended group load/store APIs to provide capabilities to work with temporary memory buffers and load/store multiple elements per work item.

aelizaro · 2022-11-30T16:42:33Z

@andreyfe1, @Pennycook, could you please take a look at the initial proposal?

Pennycook

Please also wrap things at 80 columns; it makes it much easier to review.

Pennycook · 2022-11-30T17:20:46Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_extended.asciidoc

+work-group or sub-group and some associated scratch space.
+
+_Effects_: Loads `ItemsPerWorkItem` elements from `in_ptr` to `out`
+using the `gh` group helper object. `GroupMemoryHelper` specifies data placement properties and also can work with extra options such as specifying out-of-boundry value and limited work-items number to work with.


Instead of using extra options in the GroupMemoryHelper, could we make these two new properties? Then a developer could use them with the overload that doesn't require a GroupMemoryHelper.

OK, I removed (2) and (4) and added a TODO for other properties; I want to do some prototyping for them first.

dkhaldi · 2022-12-02T20:06:21Z

It would be good to have complete kernels for this extension in the tests directory.
Specifically, I would like to see vec, marray, and simd used as the memory storage behind the span.

aelizaro · 2022-12-06T11:00:55Z

@dkhaldi, do you mean adding full tests without an implementation, to illustrate how it is supposed to work with SYCL's data types?

dkhaldi · 2022-12-06T19:56:51Z

@dkhaldi, do you mean adding full tests without an implementation, to illustrate how it is supposed to work with SYCL's data types?

Right.

aelizaro · 2023-01-06T13:48:51Z

Test cases have been added to illustrate how the APIs should work.
For the sycl::vec case, we can either use both span approaches (as shown in the test) or add a specialization for the single-value case, which would look cleaner.
For sycl::marray it works nicely, because its internal representation guarantees contiguous storage, so we can use:

sycl::marray<InputT, items_per_thread> data;
sycl_exp::joint_load(item.get_group(), in.get_pointer(), sycl::span<InputT, items_per_thread>{ data.begin(), data.end() });

Pennycook

Spotted a few minor issues during review of the latest changes, but I think the fixes are straightforward. Let me know when they're applied and I'll approve.

Pennycook · 2024-03-22T17:07:30Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc

+namespace sycl::ext::oneapi::experimental {
+
+struct full_work_group_hint {
+  using value_t =
+      property_value<full_work_group_hint>;
+};
+
+inline constexpr full_work_group_hint::value_t full_work_group;
+
+struct full_sub_group_hint {
+  using value_t =
+      property_value<full_sub_group_hint>;
+};
+
+inline constexpr full_sub_group_hint::value_t full_sub_group;


I think we should combine these into one hint. We can call it full_group for now.

We already know whether we're using a group or a sub-group from the first argument. There are also going to be other group types in the future, and it would be convenient to be able to re-use the same property across all of those group types as well.

I'm not sure full_group is a good name. parallel_for requires the global size to be a multiple of the work-group size, so a WG is always "full" in some sense. The last SG of each WG might still not be full.

Right, but that just means that full_group doesn't do anything when the algorithm is invoked with a sycl::group, but improves performance when the algorithm is invoked with a sycl::sub_group. The "group" in "full group" is intended to mean "anything satisfying the group concept" and not "sycl::group".

I want to push for a renaming of sycl::group to sycl::work_group to avoid this sort of confusion.

Then we need another property for the WG case, I think.

Why? Maybe I'm misunderstanding what you're expecting to happen here. I thought your earlier message was saying that work-groups were always "full"?

They are, but the block read is an SG-level intrinsic. We need SGs to be "full" to use it.

I think we're talking past each other.

Consider the below:

auto sg = it.get_sub_group();
group_load(sg, input, output_span, properties{contiguous_memory, full_group});

The interpretation here is that sg represents a full sub-group.

If the user writes

auto g = it.get_group();
group_load(g, input, output_span, properties{contiguous_memory, full_group});

we still can't use block loads without runtime checks. The WG is full, but its last SG might not be. Also, full_group is meaningless for a WG, because a WG is always full per the SYCL spec's parallel_for/nd_range restrictions.
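
The "last SG might not be full" case can be illustrated with a minimal plain C++ sketch (host arithmetic only, not SYCL device code). It assumes, purely for illustration, that a work-group is split linearly into sub-groups of a fixed maximum size, so that only the last sub-group can be partial; real devices may carve up work-groups differently. The helper names `last_sub_group_size` and `all_sub_groups_full` are hypothetical and are not part of the extension.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size of the last sub-group when a work-group of
// `wg_size` work-items is split into sub-groups of at most `max_sg_size`.
// Assumes a simple linear split (an illustration, not a device query).
// Precondition: wg_size > 0 and max_sg_size > 0.
constexpr std::size_t last_sub_group_size(std::size_t wg_size,
                                          std::size_t max_sg_size) {
  return wg_size % max_sg_size == 0 ? max_sg_size : wg_size % max_sg_size;
}

// Hypothetical helper: true only if every sub-group, including the last
// one, has exactly `max_sg_size` work-items.
constexpr bool all_sub_groups_full(std::size_t wg_size,
                                   std::size_t max_sg_size) {
  return wg_size % max_sg_size == 0;
}

// A 64-item work-group with 16-wide sub-groups has no partial sub-group ...
static_assert(all_sub_groups_full(64, 16), "64 is a multiple of 16");
// ... but a 40-item work-group leaves a final sub-group of 8 work-items.
static_assert(!all_sub_groups_full(40, 16), "40 is not a multiple of 16");
```

Under this assumed model, a work-group that is "full" in the nd_range sense can still end in a partial sub-group, which is the situation that would require a runtime check before using a sub-group block load.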

We couldn't use block loads in this case anyway, because the stride in this case is the size of the work-group and not the sub-group. You're right that we could generate block loads in some cases (like a work-group + blocked data placement) but I think we'd still need a runtime check because different devices carve up work-groups into sub-groups differently.
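
The stride argument can be sketched with plain index arithmetic. The helpers below are hypothetical and only model which input element a work-item would read for its `i`-th item. "Blocked" follows the term used in this thread; "striped" is an assumed name for the layout whose stride equals the group size. This is a sketch of the indexing idea, not the extension's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Blocked placement (assumed model): each work-item owns a contiguous run
// of `items_per_work_item` elements.
constexpr std::size_t blocked_index(std::size_t item_id, std::size_t i,
                                    std::size_t items_per_work_item) {
  return item_id * items_per_work_item + i;
}

// Striped placement (assumed model): consecutive elements of one work-item
// are `group_size` elements apart, so the stride is the size of the group
// the algorithm was invoked with.
constexpr std::size_t striped_index(std::size_t item_id, std::size_t i,
                                    std::size_t group_size) {
  return item_id + i * group_size;
}
```

For example, with a 32-item work-group, `striped_index(2, 1, 32)` is 34: the stride is 32 (the work-group size), not the size of any sub-group inside it, which is why a sub-group-level block load would not match this access pattern directly.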

full_group is currently meaningless for work-groups, but I think it makes more sense to have a generic full_group than to make the property sub-group specific. That allows a developer to write generic code that uses full_group and have it work in both cases, and it may be meaningful for future group types.

Pennycook · 2024-03-22T17:15:03Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc

+The following properties  is introduced to be used
+as a hint that implementation can use get_max_local_range():


Users might not understand why this is helpful for implementations, so I think we should be a little more explicit here. Maybe something like:

Suggested change

-The following properties  is introduced to be used
-as a hint that implementation can use get_max_local_range():
+The following property can be used as a hint that
+`get_local_range()` is equal to `get_max_local_range()`,
+which may enable more aggressive optimizations for some
+implementations.
+[NOTE]
+====
+Using `full_group` is necessary to generate SPIR-V block read
+and block write instructions, because these instructions are
+defined to use the maximum group size as the stride.
+====

aelizaro · 2024-03-26T17:07:05Z

@gmlueck, @Pennycook, @aelovikov-intel, could you please double-check the PR? (And huge thanks to you for your contributions!)

I think I have addressed everything except the one ongoing discussion about the full_group hint naming: #7593 (comment)

Pennycook

I spotted one minor potential issue, but apart from this and the full_group discussion, I think we're good.

Pennycook · 2024-03-26T17:51:51Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store.asciidoc

+`Properties` argument is reserved for future revisions of this extention and is
+ignored now.
+Default value is empty `sycl::ext::oneapi::experimental::empty_properties_t`
+May be used in future for setting boundary values or limiting numbers of work
+items.


Same question as above. Can we use contiguous here?

gmlueck

I think this looks good, but I have some comments about the use of the word "hint". See below.

gmlueck

LGTM!

aelizaro · 2024-04-04T13:27:20Z

@Pennycook, can we merge it now?

Pennycook · 2024-04-04T14:48:15Z

@Pennycook, can we merge it now?

I think so! @intel/llvm-gatekeepers, please merge.

aelizaro added 3 commits November 30, 2022 08:22

[SYCL][DOC] Extended group load/store APIs

0878bda

minor typo fixes

82111cd

Add comments and spaces

b23fda8

aelizaro requested a review from a team as a code owner November 30, 2022 16:39

Pennycook requested changes Nov 30, 2022

aelizaro and others added 6 commits November 30, 2022 17:36

Update sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_…

c6f74ad

…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Update Sycl version

b8e3d87

Co-authored-by: John Pennycook <[email protected]>

typo fix

4a8cd52

Co-authored-by: John Pennycook <[email protected]>

address comments

349f259

Merge branch 'dev/aelizaro/block_load_store' of https://github.com/ae…

f994780

…lizaro/llvm into dev/aelizaro/block_load_store

fyx iterator typo

6e1b9ff

aelizaro added 2 commits January 6, 2023 05:39

fixes in doc

fb9f3c4

add testcases for group load/store

426bf40

aelizaro requested a review from a team as a code owner January 6, 2023 13:40

aelizaro requested a review from KseniyaTikhomirova January 6, 2023 13:40

[Tests] Comment tests as functionality is not yet implemented

db09297

Pennycook requested changes Jan 23, 2023

aelizaro and others added 8 commits January 24, 2023 09:59

Typo fix sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_stor…

2966951

…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Typo fix sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_stor…

a02d7e7

…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Typo fix sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_stor…

7994bec

…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Typo fix sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_stor…

ddf2c06

…e_extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Update sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_…

9ee288c

…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Update sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_…

d7e0a13

…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Update sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_…

4b9ebd5

…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

Update sycl/doc/extensions/proposed/sycl_ext_oneapi_group_load_store_…

c579854

…extended.asciidoc Co-authored-by: John Pennycook <[email protected]>

aelizaro added 2 commits March 22, 2024 09:59

remove private memory precondition and add full_wg/sg hints

d09949c

fix function name

7463119

Pennycook reviewed Mar 22, 2024

gmlueck reviewed Mar 22, 2024

gmlueck reviewed Mar 22, 2024

aelizaro and others added 5 commits March 25, 2024 15:45

memory_hint->memory_key

1f05500

Co-authored-by: Greg Lueck <[email protected]>

fix namespace

cfcd567

Co-authored-by: Greg Lueck <[email protected]>

Unify full_group_key, add trivially copyable and default constructibl…

e86e983

…e requirement

add constrains for scalar case

4d3b536

Merge branch 'intel:sycl' into dev/aelizaro/block_load_store

544336c

Pennycook reviewed Mar 26, 2024

aelizaro and others added 2 commits March 26, 2024 18:56

Update properties description for scalar version

74930ce

Co-authored-by: John Pennycook <[email protected]>

Update properties description for scalar version 2

ad64bd6

gmlueck reviewed Mar 28, 2024

aelizaro and others added 10 commits March 29, 2024 11:53

Updater wording for properties

60e54bd

Co-authored-by: Greg Lueck <[email protected]>

Update wording for properties

ba8f86f

Co-authored-by: Greg Lueck <[email protected]>

contiguous_memory_hint->contiguous_memory_key

6c502e5

Co-authored-by: Greg Lueck <[email protected]>

contiguous_memory_hint->contiguous_memory_key

f6f5dc5

Co-authored-by: Greg Lueck <[email protected]>

Update wording for properties

ff943a1

Co-authored-by: Greg Lueck <[email protected]>

Update wording for properties

fa5c151

Co-authored-by: Greg Lueck <[email protected]>

Update hint->property

717c5b8

Co-authored-by: Greg Lueck <[email protected]>

align wording

00d7259

fix typo

8404b10

fix typo 2

808d5f3

gmlueck approved these changes Mar 29, 2024

AlexeySachkov mentioned this pull request Apr 3, 2024

[SYCL][SubBlockNDRange] Extend spec to allow dealing with USM pointers #849

Closed

dm-vodopyanov merged commit e320aa4 into intel:sycl Apr 4, 2024
