[SYCL] Add a document describing the current status of reduction implementation #4831
Conversation
Add a document describing the current status of reduction implementation

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Good stuff! Some minor corrections and a style note: some of the inline code snippets (like "usm memory") are not code. Would it make sense to use bold or italic for highlighting these instead?
Co-authored-by: Steffen Larsen <[email protected]>
Are we planning to implement (and support) both the SYCL 2020 version of the reduction API and also the "ext::oneapi" version? As you note in the document, the APIs are similar, but not exactly the same.

This question is for @Pennycook . I suppose the current plan is to support both APIs.

My preference is to not support the "ext::oneapi" version.

If it is not planned to be supported anymore, then it is a good time to add a deprecation warning to ext::oneapi::reduction
As discussed in #4831, we should officially deprecate the original reduction extension in favor of the equivalent SYCL 2020 reduction functionality. This PR only makes the changes to the documentation -- adding deprecation notices to the implementation will be done in a separate PR.
LGTM
@Pennycook - do you think it makes sense to merge this PR?
There are 2 specifications of the reduction feature and both are still current:
* `sycl::ext::oneapi::reduction` is described in [this document](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/Reduction/Reduction.md). This extension was created as part of a pathfinding/prototyping work before it was added to SYCL 2020 standard.
Suggested change:
```diff
- * `sycl::ext::oneapi::reduction` is described in [this document](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/Reduction/Reduction.md). This extension was created as part of a pathfinding/prototyping work before it was added to SYCL 2020 standard.
+ * `sycl::ext::oneapi::reduction` is described in [this document](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/Reduction/Reduction.md). This extension is deprecated, and was created as part of a pathfinding/prototyping work before it was added to SYCL 2020 standard.
```
Thank you. Fixed as suggested: 55c3d75
## Reduction inside 1 work-group - the main building block for reduction

The reduction algorithm for 1 basic block depends currently on combination/availability of 2 features:
Suggested change:
```diff
- The reduction algorithm for 1 basic block depends currently on combination/availability of 2 features:
+ The reduction algorithm for 1 work-group depends currently on combination/availability of 2 features:
```
Misprint. Thank you. Fixed as suggested: 55c3d75
- `fast atomics` (i.e. operations available on the target device), such as fetch_add(), fetch_min(), etc.
- `fast reduce` operation for work-group, i.e. `reduce_over_group()`

So, if the reduction operation/type has both `fast atomics` and `fast reduce`, then the reduction on work-group with 1 reduction variable is implemented using does the following: the elements inside the work-group are reduced using `ext::oneapi::reduce` and the final result is atomically added to the final/global reduction variable.
Suggested change:
```diff
- So, if the reduction operation/type has both `fast atomics` and `fast reduce`, then the reduction on work-group with 1 reduction variable is implemented using does the following: the elements inside the work-group are reduced using `ext::oneapi::reduce` and the final result is atomically added to the final/global reduction variable.
+ So, if the reduction operation/type has both `fast atomics` and `fast reduce`, then the reduction on work-group with 1 reduction variable does the following: the elements inside the work-group are reduced using `ext::oneapi::reduce` and the final result is atomically added to the final/global reduction variable.
```
Thank you. Fixed as suggested: 55c3d75
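For illustration, the fast path described above can be modeled on the host in plain C++. This is a hedged sketch, not the actual SYCL implementation: `reduce_over_group_sim` is an invented stand-in for `reduce_over_group()`, and `std::atomic` stands in for the device's fast atomics.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <vector>

// Stand-in for `reduce_over_group()`: reduce the elements owned by
// one work-group. (Invented name; the real API runs on the device.)
int reduce_over_group_sim(const std::vector<int>& group_elems) {
    return std::accumulate(group_elems.begin(), group_elems.end(), 0);
}

// Fast path: each work-group reduces its own elements with the
// `fast reduce` operation, then publishes its partial result to the
// global reduction variable with a single `fast atomic` per group.
int fast_path_reduce(const std::vector<int>& data, std::size_t wg_size) {
    std::atomic<int> global{0};
    for (std::size_t first = 0; first < data.size(); first += wg_size) {
        std::size_t last = std::min(first + wg_size, data.size());
        std::vector<int> group(data.begin() + first, data.begin() + last);
        int partial = reduce_over_group_sim(group);            // fast reduce
        global.fetch_add(partial, std::memory_order_relaxed);  // fast atomic
    }
    return global.load();
}
```

The key property this models is that each work-group issues exactly one atomic operation on the global reduction variable, regardless of how many elements it processed.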
The most general case is when the `fast atomics` and `fast reduce` are not available.
It computes the partial sum using tree-reduction loop and stores the partial sum for the work-group to a global array of partial sums, which later must be also reduced (by using additional(s) kernel(s)). This general case algorithm also requires allocation of a local memory for the tree-reduction loop:
Suggested change:
```diff
- It computes the partial sum using tree-reduction loop and stores the partial sum for the work-group to a global array of partial sums, which later must be also reduced (by using additional(s) kernel(s)). This general case algorithm also requires allocation of a local memory for the tree-reduction loop:
+ It computes the partial sum using tree-reduction loop and stores the partial sum for the work-group to a global array of partial sums, which later must be also reduced (by using additional kernel(s)). This general case algorithm also requires allocation of a local memory for the tree-reduction loop:
```
Thank you. Fixed as suggested: 55c3d75
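To make the tree-reduction idea concrete, here is a hedged host-side C++ model of the general case (invented names, not the actual SYCL code): each work-group fills a simulated "local memory" buffer, runs a halving tree-reduction loop over it, and writes one partial sum per group to a global array. The sketch assumes `wg_size` is a power of two, as the halving loop requires.

```cpp
#include <cstddef>
#include <vector>

// Host model of the general case: one partial sum per work-group is
// produced; the returned array still needs a further reduction pass
// (the "additional kernel(s)"). `wg_size` must be a power of two.
std::vector<int> partial_sums_per_group(const std::vector<int>& data,
                                        std::size_t wg_size) {
    std::vector<int> partials;
    for (std::size_t first = 0; first < data.size(); first += wg_size) {
        // Copy the group's elements into simulated local memory,
        // padding with the identity (0 for plus) up to wg_size.
        std::vector<int> local(wg_size, 0);
        for (std::size_t i = 0; i < wg_size && first + i < data.size(); ++i)
            local[i] = data[first + i];
        // Tree-reduction loop: halve the active range each step.
        for (std::size_t stride = wg_size / 2; stride > 0; stride /= 2)
            for (std::size_t i = 0; i < stride; ++i)
                local[i] += local[i + stride];
        partials.push_back(local[0]);  // one partial sum per work-group
    }
    return partials;
}
```

On a device, each iteration of the stride loop would be separated by a work-group barrier; the serial host loop above elides that.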
The *additional kernel(s)* that reduce the partial sums look very similar to the *main* kernel; the only difference is that they do not call the user's function.

---
TODO #1 (Performance): After the reduction accepting `range` was implemented, it became possible to simply re-use that implementation for the *additional kernel* runs. Instead of running the *additional kernel* several times (if the number of partial sums is still big), it is better to run the parallel_for accepting `range` once.
I think you should qualify here which devices you're talking about, unless these statements about performance apply to all backends and devices.
This is a general comment, i.e. for all backends/devices. If there is an efficient implementation of parallel_for(range, reduction), then why not use it to reduce, let's say, 1M partial sums, instead of adding N calls of additional kernels inside the parallel_for(nd_range, reduction) implementation?
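A host-side sketch of the point being made (hypothetical names, not implementation code): each iteration of the `while` loop models one extra "additional kernel" launch of the current nd_range strategy, whereas a range-based reduction would consume all remaining partial sums in a single pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Model of the current strategy: repeatedly launch an "additional
// kernel" that shrinks the array of partial sums by a factor of
// `wg_size` until one value remains. With 1M partial sums and
// wg_size == 256 this needs 3 extra launches; a single
// parallel_for(range, reduction) pass would need only 1.
int reduce_by_repeated_kernels(std::vector<int> sums, std::size_t wg_size,
                               int* launches = nullptr) {
    int n_launches = 0;
    while (sums.size() > 1) {
        std::vector<int> next;
        for (std::size_t first = 0; first < sums.size(); first += wg_size) {
            std::size_t last = std::min(first + wg_size, sums.size());
            next.push_back(std::accumulate(sums.begin() + first,
                                           sums.begin() + last, 0));
        }
        sums = std::move(next);
        ++n_launches;  // one more "additional kernel" launch
    }
    if (launches) *launches = n_launches;
    return sums.empty() ? 0 : sums[0];
}
```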
### 3) Support `parallel_for` accepting `range` and 2 or more reduction variables.
Currently `parallel_for()` accepting `range` may handle only 1 reduction variable. It does not support 2 or more.

The temporary work-around for that is to use some container for multiple reduction variables, i.e. std::pair, std::struct or a custom struct/class containing 2 or more reduction variables, and also define a custom operator that would be passed to the `reduction` constructor.
What is `std::struct`? I'm not familiar with it, and haven't been able to find any information online.
Misprint. I was thinking about std::tuple and struct (which is written a few words later), which resulted in the misprint "std::struct".
Fixed in: 55c3d75
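As an illustration of the work-around described above (a hypothetical sketch, not code from the implementation): pack two reduction variables, here a sum and a minimum, into one struct and combine values with a custom operator of the same shape as the one that would be passed to the `reduction` constructor.

```cpp
#include <climits>
#include <vector>

// Hypothetical container for two reduction variables.
struct SumMin {
    int sum;
    int min;
};

// Custom combine operator: the analogue of what would be passed to
// the `reduction` constructor in the work-around.
SumMin combine(const SumMin& a, const SumMin& b) {
    return {a.sum + b.sum, a.min < b.min ? a.min : b.min};
}

// Serial host model of reducing with the combined variable; the
// identity is {0, INT_MAX} (0 for plus, INT_MAX for min).
SumMin reduce_sum_min(const std::vector<int>& data) {
    SumMin acc{0, INT_MAX};
    for (int x : data)
        acc = combine(acc, SumMin{x, x});
    return acc;
}
```

The operator must be associative and commutative for the reduction to be valid under any grouping of work-items; the component-wise combination above preserves that because plus and min individually are.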
I'm not opposed to things being merged in their current form, but doing another read through this morning I spotted a few typos and some points that could be clarified.
LGTM. I will approve once @Pennycook's comments have been addressed.
Signed-off-by: Vyacheslav N Klochkov <[email protected]>