-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL] Implement user-defined reduction extension #7436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Implement user-defined reduction extension #7436
Conversation
Spec: intel/llvm#7202 Implementation: intel/llvm#7436
Spec: intel/llvm#7202 Implementation: intel/llvm#7436
/verify with intel/llvm-test-suite#1395 |
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
|
||
// ---- four reduce_over_group overloads with native binary_op | ||
template <typename GroupHelper, typename T, class BinaryOperation> | ||
sycl::detail::enable_if_t< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if constexpr
-based implementation but I'm curious what others think.
|
||
// ---- reduce_over_group | ||
template <typename GroupHelper, typename T, typename BinaryOperation> | ||
sycl::detail::enable_if_t<(is_group_helper_v<std::decay_t<GroupHelper>>), T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make std::decay_t
part of is_group_helper
?
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/user_defined_reductions.hpp
Outdated
Show resolved
Hide resolved
sycl::detail::for_each( | ||
g.get_group(), first, last, | ||
[&](const typename sycl::detail::remove_pointer<Ptr>::type &x) { | ||
partial = binary_op(partial, x); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be linear and not logarithmic in complexity. I'd expect something like https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/reduction.hpp#L1810-L1818 might be applicable here, although I'm not sure what gives better performance for different work sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to put a comment here for now. We don't yet have any data from actual uses of this new function about which use cases (data type, wg sizes) are the most popular ones.
Update simd construct definition and add mlir tests to prove that order(concurrent) clause is correctly modeled for simd construct.
…CTOR_ELT. 1- Enable custom lowering INSERT_VECTOR_ELT to generate code compatible to streaming mode. 2- Add testing file: insert-vector-elt.ll Differential Revision: https://reviews.llvm.org/D138222
Bot failure: https://lab.llvm.org/buildbot/#/builders/16/builds/38156 e3cd498 (D133318) updates CloneBlock to preserve debug use-before-defs. Update local-as-metadata-undominated-use.ll to reflect this.
…mbine in assignment tracking builds The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/ rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir This reduces peak memory overhead by 15% when building CTMark's tramp3d-v4 with -O2 -g with assignment tracking enabled. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D133321
And use it findHeaders. findHeaders now finds all header candidates given a symbol location (these headers will be attached with proper signals, in a followup patch). Differential Revision: https://reviews.llvm.org/D137698
This fixes #36373, #36905 and partial of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137711
Reviewed By: tstellar Differential Revision: https://reviews.llvm.org/D137418
This is useful where tests previously encoded information in the name names of ranges and points. Currently, this is pretty limited because names consist of only alphanumeric characters and '_'. With this patch, we can keep the names simple and attach optional payloads to ranges and points instead. The new syntax should be fully backwards compatible (if I haven't missed anything). I tested this against clangd unit tests and everything still passes. Differential Revision: https://reviews.llvm.org/D137909
This patch adds bazel tests for llvm-libc. Some math tests rely on the `mpfr` library. This is controlled via the `--@llvm-project//libc:libc_math_mpfr` flag. It can take three values: - `external` (default) will build `mpfr` and `gmp` from source. - `system` will use the system installed `mpfr` library. - `disable` will skip tests relying on `mpfr`. Reviewed By: sivachandra, GMNGeoffrey Differential Revision: https://reviews.llvm.org/D119547
Instead of importing constant expressions recursively, the revision walks all dependencies of an LLVM constant iteratively. The actual conversion then iterates over a list of constants and all intermediate constant values are added to the value mapping. As a result, an LLVM IR constant maps to exactly one MLIR operation per function. The revision adapts the existing tests since the constant ordering changed for aggregate types. Additionally, it adds extra tests that mix aggregate constants and constant expressions. Depends on D137416 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D137559
…truction classes The zip/uzp (2-vector) instruction classes have the incorrect register constraints and mark the destination as also being an input. However, the instructions are fully destructive so I've restructured the classes. Differential Revision: https://reviews.llvm.org/D138288
…ating a thin archive As seen in llvm/llvm-project#55023 when a thin archive is updated when not in the CWD, replacement does not work as expected. This change fixes the relative file path comparison so the correct files are updated. Differential Revision: https://reviews.llvm.org/D138218
…roadcast. Added dimensions can be both static and dinamic. Mapped dimension should be the same in the input and the init. Differential Revision: https://reviews.llvm.org/D138291
The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137574
We don't modify them. Also use auto here since we already wrote the full type in the cast.
…st-link (intel#7471) When generating program metadata for `reqd_work_group_size` information sycl-post-link currently expects the existence of all three dimensions in the metadata. As a result the compile-time kernel property `work_group_size` needs to pad itself to adhere to this requirement. This commit relaxes the requirement by instead allowing fewer than three metadata operands in both `reqd_work_group_size` and `work_group_size_hint` when generating the program metadata. Additionally, padding will no longer be added when converting "sycl-work-group-size" and "sycl-work-group-size-hint" into the corresponding metadata. Signed-off-by: Larsen, Steffen <[email protected]>
Docs CI is currently failing due to an interaction between Python 3.10 and Sphinx 4.1.2. This commit bumps the Sphinx version used by docs CI to 4.2.0. Signed-off-by: Larsen, Steffen <[email protected]>
This change makes bfloat16 a supported feature. Signed-off-by: Rajiv Deodhar <[email protected]> Signed-off-by: JackAKirk <[email protected]>
ea9ac35 re-adds the llvm::decodeBase64 function, but on our branch it seems to have been lost in the merge 8b549f6. This change adds it back to fix build issues introduced by upstream changes that rely on this function. This should fix intel#7515.
…7512) The patch removes standalone splitter we had for `large-grf` and moves `large-grf` handling into per-aspect splitter. The change is intended to be non-functional: at most it may affect the order and names of modules produced by `sycl-post-link`, but not their content.
…ntel#7505) The headers use builtins for casting from generic pointers to other address spaces. However, of these definitions it is missing variants for non-const volatile. This commit adds these missing definitions. Signed-off-by: Larsen, Steffen <[email protected]>
bfloat16 was recently moved out of the experimental namespace but windows_build_test.cpp did not reflect this. This commit removes the use of the experimental namespace. Signed-off-by: Larsen, Steffen <[email protected]>
…ty (intel#7523) Adds a new queue property (hint) for setting its priority in backends that support it. Currently supported in Level Zero backend only and ignored for others. Test: intel/llvm-test-suite#1414 Signed-off-by: Sergey V Maslov <[email protected]>
Spec: #7202
Tests: intel/llvm-test-suite#1395