[SYCL][CUDA] Add async_group_copy free function implementation #4907
Conversation
Tests are here: intel/llvm-test-suite#552
just a few nits
detail::enable_if_t<is_group_v<Group> && !detail::is_bool<dataT>::value,
                    device_event>
This is a C++14 feature; we can use it safely:
-detail::enable_if_t<is_group_v<Group> && !detail::is_bool<dataT>::value,
-                    device_event>
+std::enable_if_t<is_group_v<Group> && !detail::is_bool<dataT>::value,
+                 device_event>
done
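The SFINAE guard under discussion can be sketched in isolation. This is a minimal, illustrative example: `group`, `is_group_v`, and `copy_enabled` here are hypothetical stand-ins, not the real SYCL types; the point is only that `std::enable_if_t` (available since C++14) removes an overload from resolution when the condition is false.

```cpp
#include <type_traits>

struct group {}; // stand-in for sycl::group (illustrative only)

template <typename G>
inline constexpr bool is_group_v = std::is_same_v<G, group>;

// Enabled only when G is a group and T is not bool, mirroring the
// std::enable_if_t guard suggested in review.
template <typename G, typename T>
std::enable_if_t<is_group_v<G> && !std::is_same_v<T, bool>, bool>
copy_enabled(G, T) { return true; }

// Variadic fallback lets us observe when SFINAE removes the overload above;
// the two-parameter template is more specialized, so it wins when enabled.
template <typename... Ts>
bool copy_enabled(Ts...) { return false; }
```

Calling `copy_enabled(group{}, true)` or passing a non-group first argument falls through to the fallback, which is exactly the behaviour the real `device_event`-returning overload relies on.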
static_assert(sizeof(bool) == sizeof(uint8_t),
              "Async copy to/from bool memory is not supported.");
I guess the idea here is that `bool` takes 1 byte? Wouldn't it be better, then, to replace `uint8_t` with `char`?
Yes, I think that is the idea. I've replaced that now.
static_assert(sizeof(bool) == sizeof(uint8_t),
              "Async copy to/from bool memory is not supported.");
using VecT = detail::change_base_type_t<T, uint8_t>;
auto DestP = multi_ptr<VecT, DestS>(reinterpret_cast<VecT *>(Dest.get()));
Same as above. This `reinterpret_cast` is UB. Can we replace `uint8_t` with `char`?
done
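The aliasing point raised here can be demonstrated with a small sketch. The C++ standard permits examining any object's bytes through `char` (and `unsigned char`) pointers, while `uint8_t` is not formally guaranteed to be a character type, so casting to `char *` sidesteps the strict-aliasing concern. `copy_bools_via_char` below is an illustrative helper, not code from this PR.

```cpp
#include <cassert>

// sizeof(bool) == 1 on mainstream ABIs, which the PR's static_assert checks.
static_assert(sizeof(bool) == sizeof(char),
              "Async copy to/from bool memory is not supported.");

// Copy bool storage byte-by-byte through char pointers; char aliasing
// of any object type is well-defined, unlike aliasing through uint8_t
// on implementations where uint8_t is not a character type.
void copy_bools_via_char(const bool *src, bool *dst, int n) {
  const char *s = reinterpret_cast<const char *>(src);
  char *d = reinterpret_cast<char *>(dst);
  for (int i = 0; i < n; ++i)
    d[i] = s[i];
}
```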
// https://github.com/intel/llvm/blob/sycl/libclc/generic/libspirv/async/wait_group_events.cl
// __spirv_ControlBarrier calls __syncthreads or __nvvm_bar_warp_sync
// https://github.com/intel/llvm/blob/sycl/libclc/ptx-nvidiacl/libspirv/synchronization/barrier.cl
(Events.ext_oneapi_wait(g), ...);
Fold expressions are a C++17 feature. @romanovvlad, should we guard those with macros for extensions?
It depends. There is a test, sycl/test/basic_tests/stdcpp_compat.cpp; if it passes, there is no need to guard.
Works fine on my machine
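The construct in question is a C++17 comma-operator fold over a parameter pack, which calls a member function on every pack element in order. A minimal sketch, with `device_event` as a hypothetical stand-in that counts `wait()` calls instead of synchronizing:

```cpp
struct device_event {
  int *counter;
  void wait() const { ++*counter; } // stand-in for ext_oneapi_wait(g)
};

// Expands to (e1.wait(), (e2.wait(), (...))); an empty pack expands
// to void(), so zero-event calls compile and do nothing.
template <typename... EventTs>
void wait_for_all(EventTs... Events) {
  (Events.wait(), ...);
}
```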
libclc/ptx-nvidiacl/libspirv/async/async_work_group_strided_copy.cl (outdated review comments, resolved)
/verify with intel/llvm-test-suite#552
I've updated intel/llvm-test-suite#552 reflecting the latest commits in this PR.
I don't think we should merge this before the spec (#4950) is merged, and I don't think we should mark it as a final/implemented/supported extension until we get clarification from Khronos on whether group algorithms act as synchronization points (https://gitlab.khronos.org/sycl/Specification/-/issues/576). If we want to merge this soon, it may be worth considering putting it in the experimental namespace.
If the implementation is moved to "experimental", then the API spec in #4903 should also be updated to say the extension is experimental.
@gmlueck @Pennycook I've moved the implementation to the experimental namespace and updated the proposal in #4950. We'll update #4903 in the near future.
Force-pushed from fbe5835 to 5f34077
also reworked enable_if for wait_for
Signed-off-by: jack.kirk <[email protected]>
All OpenCL types with 4-, 8-, or 16-byte alignment have been optimized in the CUDA backend for sm_80. The test is up to date here: intel/llvm-test-suite#552
@FMarno, could you resolve merge conflicts, please?
We've marked this as a draft for the moment; I've resolved the conflicts. It still needs some fixes.
I've fixed the outstanding issues. I've removed the implementation for trivially copyable types that OpenCL does not support: that implementation was broken for group<1> etc. and is not yet possible, because it requires the group::get_local_id method, but SYCL 2020 groups are not implemented (there is a PR with the necessary functionality: #5447). If we add back full trivially copyable support, it should probably also be added to async_work_group_copy at the same time.
@FMarno Thanks again for the very useful discussions today. I've added the libclc nvptx implementation part of this PR to #5611 (minus the subgroup case). |
Partial implementation of the proposal at #4903.
This adds async_group_copy as a free function, implemented for `group` and `sub_group`. This also adds optimizations for NVIDIA architectures of sm_80 and over, making it actually async.
Credit to @JackAKirk for most of the implementation.
Tests to follow shortly.