-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL] Support for arrays as kernel parameters #1387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: rdeodhar <[email protected]>
Signed-off-by: rdeodhar <[email protected]>
Implementation of piEventSetCallback with tests GlueEvent uses now the correct plugins The SYCL RT code for GlueEvent calls now the right plugin to create the event that triggers the dependency chain. Renamed variables to clarify the source code and avoid confusions between Context and Plugin Signed-off-by: Ruyman Reyes <[email protected]> Signed-off-by: Stuart Adams <[email protected]> Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Stuart Adams <[email protected]>
…1381) Signed-off-by: gejin <[email protected]>
NOTE: This flag is not exposed to the driver and not intended for users. It's added to make experiments and identify issues with optimizations. Signed-off-by: Alexey Bader <[email protected]>
By emitting the legacy variant of the LLVM IR alongside the newer representation of the attribute, backwards compatibility with any existing BE implementation is restored. A smooth transition period is thus achieved for the aforementiond BE - until it's able to consume the new LLVM IR, it has an option to simply ignore the unknown metadata. Signed-off-by: Artem Gindinson <[email protected]>
If found alloca command is not sub-buffer alloca, then it's parent alloca which has same context Signed-off-by: Ivan Karachun <[email protected]>
@romanovvlad As discussed previously via email, I want @rolandschulz to review the design/interface/IR output, so we're waiting on the submitter to provide the design documentation for this. Additionally, @elizabethandrews and @Fznamznon know this section of code better than me, so I'll delegate review to them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is not so trivial.
Could you please describe the approach in commit message?
Could you please also add a CodeGen test?
@@ -0,0 +1,259 @@ | |||
// This test checks array kernel parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this test checks some kernel execution. We don't put such tests in llvm/clang/test/* directories. Please put somewhere to llvm/sycl/test/ directory.
BTW, SYCL kernels can be run on different devices. Other tests in llvm/sycl/test/ directory has bunch of special RUN lines to run the test on particular device. Like // RUN: %GPU_RUN_PLACEHOLDER %t.out
here:
// RUN: %GPU_RUN_PLACEHOLDER %t.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will write a design document to explain the overall approach to supporting array arguments of kernels. I assume attaching documents here is accepted practice?
I will also address comments posted here in my next attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we appreciate design docs. You can put your doc somewhere in this folder https://github.com/intel/llvm/tree/sycl/sycl/doc (or extend some existing doc here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
// This test checks if compiler reports compilation error on an attempt to pass | ||
// a struct with non-trivially copyable type as SYCL kernel parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean array of non-trivially copyable objects, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comment changed.
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -1095,6 +1225,32 @@ static void populateIntHeader(SYCLIntegrationHeader &H, const StringRef Name, | |||
populateHeaderForWrappedAccessors(FldType, | |||
Offset + OffsetInWrapper); | |||
} | |||
} else if (FldType->isConstantArrayType()) { | |||
auto CAT = cast<ConstantArrayType>(FldType); | |||
auto ET = CAT->getElementType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not use auto when the type is not obvious from the context.
See https://llvm.org/docs/CodingStandards.html#id26
Please apply this comment to all uses of auto
in your patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
clang/lib/Sema/SemaSYCL.cpp
Outdated
if (Util::isSyclAccessorType(ET)) { | ||
size_t NumElements = | ||
static_cast<size_t>(CAT->getSize().getZExtValue()); | ||
DeclAccessPair FieldDAP = DeclAccessPair::make(Field, AS_none); | ||
auto Lhs = MemberExpr::Create( | ||
S.Context, KernelObjCloneRef, false, Loc, | ||
NestedNameSpecifierLoc(), Loc, Field, FieldDAP, | ||
DeclarationNameInfo(Field->getDeclName(), Loc), nullptr, | ||
Field->getType(), VK_LValue, OK_Ordinary, NOUR_None); | ||
// Treat each element of accessor array as a separate item | ||
// Call its __init function | ||
for (size_t I = 0; I < NumElements; ++I) { | ||
ExprResult IndexExpr = S.ActOnIntegerConstant(Loc, I); | ||
auto MA = S.CreateBuiltinArraySubscriptExpr(Lhs, Loc, | ||
IndexExpr.get(), Loc); | ||
getExprForSpecialSYCLObj(ET, nullptr, MA.get(), | ||
ET->getAsCXXRecordDecl(), nullptr, | ||
InitMethodName, BodyStmts); | ||
KernelFuncParam++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Déjà vu...
The new code in getExprForWrappedAccessorInit
lambda looks almost the same. Can we create a function for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two blocks of code are superficially similar. The check for Accessor type and then iterating over the array elements is common. However, the actions in the two cases are very different. It is not practical to make a callable function out of these two blocks of code.
clang/lib/Sema/SemaSYCL.cpp
Outdated
} | ||
} | ||
} else if (FieldType->isConstantArrayType()) { | ||
auto Loc = Field->getLocation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why you are applying source location of Kernel Object field to newly generated AST nodes, I think it can be tricky. Let's apply just empty source location since these AST nodes do not represent real code from source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This was done based on a previous reviewers suggestion. I have reverted to empty source location to conform to the approach in other cases in this function.
}; | ||
RecordDecl *NewClass; | ||
FieldDecl *Field; | ||
wrapAnArray(ArgTy, NewClass, Field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I get it right that to pass array of something, we wrap it by some struct (which seems to be generated by the compiler)? i.e. resulting kernel for each array field of kernel object has some wrapped_array
struct parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a design document will be posted.
clang/lib/Sema/SemaSYCL.cpp
Outdated
auto CAT = cast<ConstantArrayType>(ArgTy); | ||
auto ET = CAT->getElementType(); | ||
if (Util::isSyclAccessorType(ET)) { | ||
size_t NumElements = static_cast<size_t>(CAT->getSize().getZExtValue()); | ||
const auto *Accessor = ET->getAsCXXRecordDecl(); | ||
ASTContext &AccessorCtx = Accessor->getASTContext(); | ||
const ASTRecordLayout &AccessorLayout = | ||
AccessorCtx.getASTRecordLayout(Accessor); | ||
auto AccessorSize = AccessorCtx.toBits(AccessorLayout.getSize()) / 8; | ||
// Start with offset of array, i.e., offset of first element of array | ||
uint64_t OffsetInArray = Offset; | ||
// Treat each element of accessor array as a separate item | ||
// Create a parameter descriptor for it | ||
for (size_t I = 0; I < NumElements; ++I) { | ||
populateHeaderForAccessor(ET, OffsetInArray); | ||
OffsetInArray += AccessorSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After lines 1231-1253 it's déjà vu again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, there was somewhat more similarity in the actions done in the two cases, and this code has been factored out into a separate function, called twice.
…1344) Signed-off-by: Michael Kinsner <[email protected]>
From what I undestand you've handled 3 types of kernel parameters here -
For 1. and 3., you added each element as a separate parameter in generated OpenCL Kernel and called init functions accordingly. For 2. You wrapped the array inside a new class 'wrapped _array' and added this as the parameter. Is this a gist of what you did? Am I missing anything? It would be helpful having this information in the commit message. |
Please add a Sema test showing AST generated. |
Signed-off-by: Alexey Sachkov <[email protected]>
Enable -fdeclare-spirv-builtins for SYCL device compilation mode For device compilation, SPIR-V builtins are now looked up by the device compiler. They now longer need to be forward declared. [SYCL-PTX] Revert manual mangling of some SPIR-V builtins [SYCL-PTX] Add fmod builtin [SYCL-PTX] Update Atomic mangling Signed-off-by: Victor Lomuller <[email protected]>
…<dir> (#1346) When using /Fo<dir> the improper dependency file name was generated, causing the bundle step to not be able to locate the dependency file when compiling to object Signed-off-by: Michael D Toguchi <[email protected]>
This patch introduces the following loop attributes: - loop_coalesce: Indicates that the loop nest should be coalesced into a single loop without affecting functionality - speculated_iterations: Specifies the number of concurrent speculated iterations that will be in flight for a loop invocation - disable_loop_pipelining: Disables pipelining of the loop data path, causing the loop to be executed serially - max_interleaving: Places a maximum limit N on the number of interleaved invocations of an inner loop by an outer loop Signed-off-by: Viktoria Maksimova <[email protected]>
Fixed the buffer constructor called with a pair of iterators. The current implementation has a problem due to ambiguous spec. The buffer should never write back data unless there is a call to set_final_data(), but the current implementation does it. I corrected the spec in KhronosGroup/SYCL-Docs#76. So, now we can change the buffer implementation according to the clarified spec. The test case buffer.cpp also needed change because of this change. The user should not expect the automatic write-back of data upon destruction of buffer. Signed-off-by: Byoungro So <[email protected]> Co-authored-by: Ronan Keryell <[email protected]>
A simple library which allows to construct and serialize/deserialize a sequence of typed property sets, where each property is a <name,typed value> pair. To be used in offload tools. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
The library allows to create, serialize/deserialize tables of strings, insert/delete/replace/rename columns, add rows. To be used in offload tools. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Signed-off-by: rdeodhar <[email protected]>
Yes, that is the gist of the changes. I have uploaded a document that describes this change.
From: elizabethandrews <[email protected]>
Sent: Wednesday, March 25, 2020 12:28 PM
To: intel/llvm <[email protected]>
Cc: Deodhar, Rajiv <[email protected]>; Author <[email protected]>
Subject: Re: [intel/llvm] [SYCL] Support for arrays as kernel parameters (#1387)
From what I undestand you've handled 3 types of kernel parameters here -
1. Accessor arrays
2. Arrays of other types
3. Class type with an accessor array as a member.
For 1. and 3., you added each element as a separate parameter in generated OpenCL Kernel and called init functions accordingly.
For 2. You wrapped the array inside a new class 'wrapped _array' and added this as the parameter.
Is this a gist of what you did? Am I missing anything? It would be helpful having this information in the commit message.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#1387 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AFRFAQ4CTSCRSITU25ZWDJ3RJJLL3ANCNFSM4LTADDZA>.
|
Hmm.. I cannot help but wonder what happens when multiple arrays are passed? Are they in separate types? Are those types necessary? I think LLVM-IR supports array-level parameters (despite C/C++ not ), so I wonder if we can do away with the type itself to avoid the obvious trouble that comes with passing it through a struct. |
Design document attached. |
This reverts commit d357add. Signed-off-by: Vladimir Lazarev <[email protected]>
Signed-off-by: Alexander Batashev <[email protected]>
…1359) Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…for (#1348) The kernel callable being invoked from an nd_range parallel_for is accepting an id argument, while it should be nd_item. After my analysis, I found we check arguments' type for kernel_parallel_for instead of parallel_for. But that check is useless, because the compiler can still find a candidate for kernel_parallel_for with nd_range and id which is a wrong combination. In my solution, parallel_for with nd_range calls kernel_parallel_for_nd_range(...) which is only available for nd_item. Signed-off-by: Bing1 Yu <[email protected]>
Implements a few code simplification/unification for LowerWGScope. Signed-off-by: Victor Lomuller <[email protected]>
) For NVPTX target address space inference for kernel arguments and allocas is happening in the backend (NVPTXLowerArgs and NVPTXLowerAlloca passes). After frontend these pointers are in LLVM default address space 0 which is the generic address space for NVPTX target. Perform address space cast of a pointer to the shadow global variable from the local to the generic address space before replacing all usages of a byval argument. Signed-off-by: Artur Gainullin <[email protected]>
- Adds static members to sub_group class. - sub_group member functions marked deprecated, to be removed later. - SPIR-V helpers expanded to convert SYCL group to SPIR-V scope. - Add workaround for half types Signed-off-by: John Pennycook <[email protected]>
Whereas it is not possible to generate vector of bools in FE, we have to change return type for corresponding instructions in SPIRV translator to vector of bools. SPIRV translator already did this for some instructions, this patch extends this behaviour to handle more instructions.
Adding doxygen documentation to PI CUDA backend. Some code is re-ordered in the file to help sorting the doxygen. Co-Authored-By: Alexey Bader <[email protected]> Co-Authored-By: Alexander Batashev <[email protected]> Co-Authored-By: Romanov Vlad <[email protected]> Signed-off-by: Ruyman Reyes <[email protected]>
Based on https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md * [SYCL] PI changes: 1. Add specialization constant API to the SYCL RT Plugin Interface. New PI API added: pi_result piProgramSetSpecializationConstant(pi_program prog, pi_uint32 spec_id, size_t spec_size, const void *spec_value); 2. Add property set fields to the binary image descriptor, bump PI version. This change breaks backward binary compatibility of device binary image descriptors. 3. Add convenience C++ wrappers for PI binary image hierarchy objects. * [SYCL] Support device binary properties and file tables in the offload wrapper. 1. New option - "-properties=<file>". <file> must be a property set registry file, as defined by llvm/Support/PropertySetIO.h. The wrapper will add the property sets to the binary image descriptor and the them available to the runtime. 2. New options - "-batch". With this option the only input can be a file table, as defined by llvm/Support/SimpleTable.h. Column names are a part of interface between this tool and the sycl-post-link, which produces the file table. 3. Binary image descriptor LLVM type updated to resemble changes in Plugin Interface v1.2. * [SYCL] Specialization constants support in the Front End. 1. Detect kernel lambda object captures corresponding to specialization constants and (a) don't create kernel arguments for them (b) generate specializations of the SpecConstantInfo structure into the integration header. 2. Recognize the __unique_stable_name intrinsic and replace it with a string literal uniquely identifying the type of the typename template parameter to this intrinsic. 3. FE-related changes in the runtime: - new SpecConstantInfo templated struct for type->name translation for specialization constants used by integration header - define the __sycl_fe_getStableUniqueTypeName intrinsic * [SYCL] Add specialization constant support in SYCL runtime. 1. Define SYCL API (sycl/include/CL/sycl/experimental/spec_constant.hpp) 2. Add convenience C++ wrappers for PI device binary structures and refactor runtime to use the wrappers. Get rid of custom deleters for binary images. 3. Implement SYCL spec constant APIs in program an program manager. * [SYCL] Use file-table-tform in SYCL offload processing in clang driver. Clang driver's design can't handily model (1) multiple inputs/outputs in the action graph. Because of that, for example, sycl-post-link tool is invoked twice - once to to split the code and produce multiple bitcode files, and secondly - to generate symbol files for the split modules. (2) "Clusters" of inputs/outputs, when subsets of inputs/outputs are associated and describe different aspects of the same data. Example of such clustering is the split module + its symbol file above. Clustering would require support both in the driver and the tools invoked in response to actions. This commit moves SYCL offload processing to the "file table concept." sycl-post-link instead of (1) being invoked n times, once per each output type requested (once for device split and once for symbol file generation) (2) outputting multiple file lists each listing outputs from the corresponding invocation above is now invoked once and produces single file table output. E.g. [Code|Symbols|Properties] a_0.bc|a_0.sym|a_0.props a_1.bc|a_1.sym|a_1.props This solves both problems - multiple input/output and clustering. Combined with the file-table-tform tool, this allows for efficent handling of multiple clusters of files (each represented as a row in the table file) in the clang driver infrastructure. For example, there is a real offload processing problem: step1. sycl-post-link outputs N clusters of files step2. "Code" file of each cluster resuilting from step1 ({a_0.bc, a_1.bc} in the example above) must undergo further transformations - translation to SPIRV and optional ahead-of-time compilation. step3. In each cluster resulting from step1 the "Code" file needs to be replaced with the result of step2 step4. All the clusters are processed by the ClangOffloadWrapper tool, which needs to know how files are distributed into clusters and what is the roles of each file in a cluster - whether it is "Code", "Symbol" or "Properties". To solve this, the following action graph is constructed in the clang driver: column:"Code" t1 -> [file-table-tform:extract column] -> t1a -> [for-each:] -> t1b llvm-spirv aot-comp t1 \ column:"Code" [file-table-tform:replace column] -> t2 -> [ClangOffloadWrapper] / t1b where t1b is ["Code"] and t2 is [Code|Symbols|Properties] a_0.bin a_0.bin|a_0.sym|a_0.props a_1.bin a_1.bin|a_1.sym|a_1.props Note that the graph does not change with growing number of clusters, neither it changes when more files are added to each cluster (e.g. a "Manifest" file). * [SYCL] Process specialization constants in sycl-post-link tool. Add a spec constant lowering pass to sycl-post-link tool. Support file table output format. * [SYCL] Temporarily disable spec_const_hw.cpp on CPU. CPU OpenCL Runtime on build machines is not updated yet. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Signed-off-by: rdeodhar <[email protected]>
Signed-off-by: rdeodhar <[email protected]>
Seems to be moved to #1423 |
Signed-off-by: rdeodhar [email protected]