From b4a90d31e9cb11724157cfdc3261341a83c7051d Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 28 Mar 2022 22:26:35 +0300 Subject: [PATCH 01/16] Initial version of the doc --- .../MappingHostAddressesToDeviceEntities.md | 240 ++++++++++++++++++ 1 file changed, 240 insertions(+) create mode 100644 sycl/doc/design/MappingHostAddressesToDeviceEntities.md diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md new file mode 100644 index 0000000000000..b62b4004b4d11 --- /dev/null +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -0,0 +1,240 @@ +# Mapping host variables to compiler-generated info + +[SYCL 2020][sycl-2020-spec] specification and some extensions such as +[SYCL_INTEL_device_global][device-global-ext-spec] imply that implementation is +capable to somehow map addresses of a host objects to their counterparts in +device programs. + +For example, in order to implement specialization constants on top of SPIR-V, we +need to be able to map addresses of `specialization_id` variables into numeric +IDs of corresponding specialization constants at SPIR-V level. + +Another example is device global [implementation][device-global-design], where +in order to communicate a value of `device_global` variable between host and +device we need to map its host address to a symbolic name/identifier and some +other info, which is used at PI layer and below. + +This design document describes a generic way how to map address of any SYCL +object defined in a namespace scope to its unique symbolic ID. Please note that +this document doesn't try to map the address to something other than a unique +symbolic ID: other required information is usually generated by the device +compiler and communicated to the runtime by device image properties. Unique +symbolic ID which can be obtained from mapping mechanism described in this +design document could be used as a key in those properties to propagate +additional information using existing mechanisms. + +So, overall the picture looks like: +- device compiler generates property set/s which provide mapping + "unique symbolic ID" -> "various information required by DPC++ RT" +- device or host compiler generates mapping + "address of a host variable" -> "unique symbolic ID" (as described below by + this document) +- DPC++ RT uses these to mappings to obtain required information and somehow + uses it + +This design document describes two approaches of how the +"address of a host variable" -> "unique symbolic ID" mapping can be generated: +the first one with integration footer and another one with modification of the +host compiler. + +Both approaches have their pros and cons and they are expected to be implemented +and exists in the implementation at the same time, but only one of them will be +used at a time depending on whether 3rd-party host compiler is used or not. + +Integration footer can be used with 3rd-party host compilers, but it requires +appending to a translation unit provided by user, which could affect debug +information: since there are no compilers that support appending a file at the +end (similar to `-include`), appending is done by generating a temporary input +file using concatenation of the original input and integration footer. + +Such replacement of the main translation unit causes the following issues: +- debug information about the source file might be incorrect, leading to + problems with gdb `l` command and code coverage tools +- checksum of host and device source files becomes different which causes device + code debugging to be completely broken in some environments (such as MS Visual + Studio, for example) + +Customizing host compiler allows to avoid issues with debuggers and code +coverage tools, but that is not an option if user wants to compile host part +of an app with a 3rd-party host compiler. + +Further sections describe the implementation design of both approaches in more +details, note that there are few components which should be modified regardless +of which approach is in use. + +## Common front-end part + +DPC++ FE should support the following attribute: +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. This attribute accepts +a string literal and should be applied to types (like `device_global` or +`specialization_id`). + +Presence of the attribute instructs the compiler to perform the following +things: +- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of + type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` + attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a + unique string identifier of a variable it is attached to. The rules for + creating this string are the same as for `__builtin_sycl_unique_stable_id` and + the same algorithm can be used when generating the string for the attribute +- emit `sycl-uid-type` LLVM IR attribute alongside `sycl-unique-id`, which + contains the `kind` string passed to + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute + +**TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in +[device global design doc][device-global-design], which instructs front-end to +emit some additional semantic checking. Shall we leave it in place or that +request for semantic checking should also be documented by +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute when `kind` +is set to a certain value? + +**TODO**: alternatively, we could completely re-use existing +`[[__sycl_detail__::device_global]]` attribute and introduce another one for +specialization constants, i.e. it is a question of whether or not we want to +generalize unique IDs generation in form of a generic attribute or not. + +When DPC++ compiler is used as both host and device compiler, then the attribute +should be respected by both host and device compiler passes and LLVM IR +attributes should appear in LLVM IR for both host and device code. When DPC++ +compiler is only used as a device compiler, then we don't expect the attribute +to be handled on host, apparently. + +Another thing we need from DPC++ FE compiler is to define a special macro, which +will allow to distinguish it from other compilers. That is needed to apply the +aforementioned attribute conditionally to avoid spamming users with warnings +about unknown attributes. + +The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined +when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). + +## Common headers part + +Header files should be modified by adding the new attributes to types +declarations, objects of which we will need in our mapping.Again, +`device_global` and `specialization_id` are examples here: + +``` +template +class +#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__) + [[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] +#endif +specialization_id { +// ... +}; +``` + +## Common runtime part + +The runtime should implement the following function, which will be called from +a code generated by the compiler (see the next section): + +``` +void __register_uniquely_identifiable_object( + void *Address, const char* UniqueID, const char *Kind); +``` + +The function accepts the following arguments: +- `Address` is an address of a variable, which exists in an application on host +- `UniqueID` is a unique symbolic ID, which corresponds to that variable +- `Kind` is a string which corresponds to `kind` argument passed to + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached + to the type of the variable identified by `Address`. It can be used to + distinguish different entities like `specialization_id` and `device_global`: + for example they could be stored in different maps to speed up certain + operations with them. + +The compiler guarantees that the function will be called zero or more times +(depending on the amount of uniquely identifiable objects found in a program) +_before_ application's `main()` function, i.e. in a global constructor. + +That poses some restrictions on those uniquely identifiable object, i.e. that +they can't be used from another global object due to risk of accessing a +non-initialized object, but that is an UB anyway because the order of global +objects initialization is not defined in C++ when those objects are defined in +different translation unit. + +## Compiler driver part + +The compiler driver is the component which is responsible for selecting the +approach we are taking and the decision is made based on whether or not +3rd-party host compiler is in use. + +If `-fsycl-host-compiler` option is present, the compiler driver chooses the +integration footer approach: +- it supplies device compilation step with `-fsycl-int-footer` option to + instruct device compiler to emit integration footer +- it appends the integration footer to user-provided translation unit before + passing it to a host compiler + +Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver +chooses another approach by simply doing nothing related to integration footer: +- `-fsycl-int-footer` is **not** passed to device compiler +- user-provided translation unit is passes as-is to host compiler + +## Integration footer approach + +When this approach is used, not only extra file (integration footer) is +generated, but integration header is also modified: FE compiler generates a +definition of a namespace scope variable of type +`__sycl_device_global_registration` whose sole purpose it to run its constructor +before the application's `main()` function: + +``` +namespace sycl::detail { +namespace { + +class __sycl_device_global_registration { + public: + __sycl_device_global_registration() noexcept; +}; +__sycl_device_global_registration __sycl_device_global_registrar; + +} // namespace (unnamed) +} // namespace sycl::detail +``` + +The integration footer generated by the compiler contains the definition of the +constructor, which calls a function in the DPC++ runtime, which registers +needed mappings: + +``` +namespace sycl::detail { +namespace { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + __register_uniquely_identifiable_object( + &::Foo, + /* same string returned from __builtin_sycl_unique_stable_id(::Foo) */, + "specialization_id"); + __register_uniquely_identifiable_object( + &::inner::Bar, + /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */, + "device_global"); +} + +} // namespace (unnamed) +} // namespace sycl::detail +``` + +## Custom host compiler approach + +With this approach, we simply schedule a one more pass in the optimization +pipeline, which should be executed regardless of the optimization level, because +it is required for proper functioning of some features. + +The pass does similar thing to integration footer: it emits a global constructor +which in turn calls `__register_uniquely_identifiable_object` to provide the +runtime with required mapping information. + +Unlike with integration footer approach no separate file is being generated, +which preserves all source files mapping and checksums to be in place and +correct. + +Generated constructor function should have internal linkage to avoid possible +names clashes and multiple definition errors later at link stage. + +Generated constructor contains a call to +`__register_uniquely_identifiable_object` for each global variable which has +`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those +attributes into the corresponding arguments of the function. From e6938d89f28538143c51ba3472c9185ead49f953 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Tue, 29 Mar 2022 22:23:20 +0300 Subject: [PATCH 02/16] Apply comments. Add a note about host compiler identification macro --- .../MappingHostAddressesToDeviceEntities.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index b62b4004b4d11..eec2b535f2796 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -38,7 +38,7 @@ the first one with integration footer and another one with modification of the host compiler. Both approaches have their pros and cons and they are expected to be implemented -and exists in the implementation at the same time, but only one of them will be +and exist in the implementation at the same time, but only one of them will be used at a time depending on whether 3rd-party host compiler is used or not. Integration footer can be used with 3rd-party host compilers, but it requires @@ -77,7 +77,7 @@ things: unique string identifier of a variable it is attached to. The rules for creating this string are the same as for `__builtin_sycl_unique_stable_id` and the same algorithm can be used when generating the string for the attribute -- emit `sycl-uid-type` LLVM IR attribute alongside `sycl-unique-id`, which +- emit `sycl-uid-kind` LLVM IR attribute alongside `sycl-unique-id`, which contains the `kind` string passed to `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute @@ -99,10 +99,14 @@ attributes should appear in LLVM IR for both host and device code. When DPC++ compiler is only used as a device compiler, then we don't expect the attribute to be handled on host, apparently. -Another thing we need from DPC++ FE compiler is to define a special macro, which -will allow to distinguish it from other compilers. That is needed to apply the -aforementioned attribute conditionally to avoid spamming users with warnings -about unknown attributes. +Another thing we need from DPC++ FE host compiler is to define a special macro, +which will allow to distinguish it from other host compilers. That is needed to +apply the aforementioned attribute conditionally to avoid spamming users with +warnings about unknown attributes. + +**NOTE:** Alternatively we could simply set a macro which tells us whether or +not integration footer is enabled in the compiler driver instead of creating +a special macro for differentiating our own host compiler. The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). From 323579e3af450b4faa445fa1cc964a85f046e0a2 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Tue, 29 Mar 2022 22:26:29 +0300 Subject: [PATCH 03/16] Make more strict gurantees about generated global constructor --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index eec2b535f2796..040f4e0374f78 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -150,7 +150,10 @@ The function accepts the following arguments: The compiler guarantees that the function will be called zero or more times (depending on the amount of uniquely identifiable objects found in a program) -_before_ application's `main()` function, i.e. in a global constructor. +_before_ application's `main()` function and _before_ any other global +constructor defined in the same translation unit: this is needed to allow usages +of `specialization_id` and `device_global` variables from user-defined global +constructors. That poses some restrictions on those uniquely identifiable object, i.e. that they can't be used from another global object due to risk of accessing a From 7373409861e21120eeff83af5610ea705b52dc97 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Tue, 29 Mar 2022 23:08:20 +0300 Subject: [PATCH 04/16] Add a section about shadowed variables and integration footer --- .../MappingHostAddressesToDeviceEntities.md | 76 ++++++++++++++++++- 1 file changed, 75 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 040f4e0374f78..18ccf5bb75831 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -201,6 +201,19 @@ __sycl_device_global_registration __sycl_device_global_registrar; } // namespace sycl::detail ``` +Examples below are written for the following code snippet: + +``` +#include + +static sycl::device_global Foo; +namespace inner { + sycl::device_global Bar; +} // namespace inner + +// ... +``` + The integration footer generated by the compiler contains the definition of the constructor, which calls a function in the DPC++ runtime, which registers needed mappings: @@ -213,7 +226,7 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept __register_uniquely_identifiable_object( &::Foo, /* same string returned from __builtin_sycl_unique_stable_id(::Foo) */, - "specialization_id"); + "device_global"); __register_uniquely_identifiable_object( &::inner::Bar, /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */, @@ -224,6 +237,67 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept } // namespace sycl::detail ``` +### Handling shadowed variables + +The example above shows a simple case where the user's device global variables +can all be uniquely referenced via fully qualified lookup (e.g. +`::inner::Bar`). However, it is possible for users to construct applications +where this is not the case, for example: + +``` +sycl::device_global FuBar; +namespace { + sycl::device_global FuBar; +} +``` + +In this example, the `FuBar` variable in the global namespace shadows a +variable with the same name in the unnamed namespace. The integration footer +can reference the variable in the global namespace as `::FuBar`, but there is +no way to reference the variable in the unnamed namespace using fully qualified +lookup. + +Such programs are still legal, though. The integration footer can support +cases like this by defining a shim function that returns a reference to the +shadowed device global: + +``` +namespace { +namespace __sycl_detail { + +static constexpr decltype(FuBar) &__shim_1() { + return FuBar; // References 'FuBar' in the unnamed namespace +} + +} // namespace __sycl_detail +} // namespace (unnamed) + +namespace sycl::detail { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + __register_uniquely_identifiable_object( + &::FuBar, + /* same string returned from __builtin_sycl_unique_stable_id(::FuBar) */, + "device_global"); + __register_uniquely_identifiable_object( + &::__sycl_detail::__shim_1(), + /* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */, + "device_global"); +} + +} // namespace sycl::detail +``` + +The `__shim_1()` function is defined in the same namespace as the second +`FuBar` device global, so it can reference the variable through unqualified +name lookup. Furthermore, the name of the shim function is globally unique, so +it is guaranteed not to be shadowed by any other name in the translation unit. +This problem with variable shadowing is also a problem for the integration +footer we use for specialization constants. See the [specialization constant +design document][5] for more details on this topic. + +[5]: + ## Custom host compiler approach With this approach, we simply schedule a one more pass in the optimization From 6c7a9be71e57c2855e87e18de145a833629dc060 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 30 Mar 2022 15:44:37 +0300 Subject: [PATCH 05/16] Fix a bunch of typos --- .../MappingHostAddressesToDeviceEntities.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 18ccf5bb75831..ab5f870593321 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -29,7 +29,7 @@ So, overall the picture looks like: - device or host compiler generates mapping "address of a host variable" -> "unique symbolic ID" (as described below by this document) -- DPC++ RT uses these to mappings to obtain required information and somehow +- DPC++ RT uses these two mappings to obtain required information and somehow uses it This design document describes two approaches of how the @@ -42,7 +42,7 @@ and exist in the implementation at the same time, but only one of them will be used at a time depending on whether 3rd-party host compiler is used or not. Integration footer can be used with 3rd-party host compilers, but it requires -appending to a translation unit provided by user, which could affect debug +appending to a translation unit provided by a user, which could affect debug information: since there are no compilers that support appending a file at the end (similar to `-include`), appending is done by generating a temporary input file using concatenation of the original input and integration footer. @@ -55,7 +55,7 @@ Such replacement of the main translation unit causes the following issues: Studio, for example) Customizing host compiler allows to avoid issues with debuggers and code -coverage tools, but that is not an option if user wants to compile host part +coverage tools, but that is not an option if a user wants to compile host part of an app with a 3rd-party host compiler. Further sections describe the implementation design of both approaches in more @@ -114,7 +114,7 @@ when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). ## Common headers part Header files should be modified by adding the new attributes to types -declarations, objects of which we will need in our mapping.Again, +declarations, objects of which we will need in our mapping. Again, `device_global` and `specialization_id` are examples here: ``` @@ -155,11 +155,11 @@ constructor defined in the same translation unit: this is needed to allow usages of `specialization_id` and `device_global` variables from user-defined global constructors. -That poses some restrictions on those uniquely identifiable object, i.e. that +That poses some restrictions on those uniquely identifiable objects, i.e. that they can't be used from another global object due to risk of accessing a non-initialized object, but that is an UB anyway because the order of global objects initialization is not defined in C++ when those objects are defined in -different translation unit. +a different translation unit. ## Compiler driver part @@ -170,7 +170,7 @@ approach we are taking and the decision is made based on whether or not If `-fsycl-host-compiler` option is present, the compiler driver chooses the integration footer approach: - it supplies device compilation step with `-fsycl-int-footer` option to - instruct device compiler to emit integration footer + instruct device compiler to emit an integration footer - it appends the integration footer to user-provided translation unit before passing it to a host compiler @@ -184,7 +184,7 @@ chooses another approach by simply doing nothing related to integration footer: When this approach is used, not only extra file (integration footer) is generated, but integration header is also modified: FE compiler generates a definition of a namespace scope variable of type -`__sycl_device_global_registration` whose sole purpose it to run its constructor +`__sycl_device_global_registration` whose sole purpose is to run its constructor before the application's `main()` function: ``` From 9758b957223639b5eb797d834ec75eb5d55f4a32 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 30 Mar 2022 16:32:24 +0300 Subject: [PATCH 06/16] Add a section about shadowed variables and host compiler approach --- .../MappingHostAddressesToDeviceEntities.md | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index ab5f870593321..f2bee7e2df78e 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -319,3 +319,26 @@ Generated constructor contains a call to `__register_uniquely_identifiable_object` for each global variable which has `sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those attributes into the corresponding arguments of the function. + +### Handling shadowed variables + +Unlike with the integration footer the problem with shadowed variables doesn't +really exists with the custom host compiler approach, because it is compiler +responsibility to uniquely identify shadowed variables at LLVM IR level and we +are simply re-using what is already there. + +For example, for the following code snippet: + +``` +sycl::device_global FuBar; +namespace { + sycl::device_global FuBar; +} +``` + +The following IR is generated by our host compiler: + +``` +@FuBar = dso_local global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 +@_ZN12_GLOBAL__N_15FuBarE = internal global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 +``` From 5a1c630fb2dd17aeabef632ed4d56ae74bd00420 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Thu, 31 Mar 2022 18:14:10 +0300 Subject: [PATCH 07/16] Apply comments --- .../MappingHostAddressesToDeviceEntities.md | 61 +++++++++---------- 1 file changed, 30 insertions(+), 31 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index f2bee7e2df78e..b12c25be31505 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -1,12 +1,12 @@ # Mapping host variables to compiler-generated info [SYCL 2020][sycl-2020-spec] specification and some extensions such as -[SYCL_INTEL_device_global][device-global-ext-spec] imply that implementation is -capable to somehow map addresses of a host objects to their counterparts in -device programs. +[SYCL_INTEL_device_global][device-global-ext-spec] imply that the implementation +has the capability to somehow map addresses of a host objects to their +counterparts in device programs. For example, in order to implement specialization constants on top of SPIR-V, we -need to be able to map addresses of `specialization_id` variables into numeric +need to be able to map addresses of `specialization_id` variables to numeric IDs of corresponding specialization constants at SPIR-V level. Another example is device global [implementation][device-global-design], where @@ -14,7 +14,7 @@ in order to communicate a value of `device_global` variable between host and device we need to map its host address to a symbolic name/identifier and some other info, which is used at PI layer and below. -This design document describes a generic way how to map address of any SYCL +This design document describes a generic way how to map the address of any SYCL object defined in a namespace scope to its unique symbolic ID. Please note that this document doesn't try to map the address to something other than a unique symbolic ID: other required information is usually generated by the device @@ -23,29 +23,28 @@ symbolic ID which can be obtained from mapping mechanism described in this design document could be used as a key in those properties to propagate additional information using existing mechanisms. -So, overall the picture looks like: +So, the overall process is: - device compiler generates property set/s which provide mapping "unique symbolic ID" -> "various information required by DPC++ RT" - device or host compiler generates mapping "address of a host variable" -> "unique symbolic ID" (as described below by this document) -- DPC++ RT uses these two mappings to obtain required information and somehow - uses it +- DPC++ RT uses these two mappings to obtain required information -This design document describes two approaches of how the -"address of a host variable" -> "unique symbolic ID" mapping can be generated: +This design document describes two approaches for how the mapping of +"address of a host variable" -> "unique symbolic ID" can be generated: the first one with integration footer and another one with modification of the host compiler. Both approaches have their pros and cons and they are expected to be implemented -and exist in the implementation at the same time, but only one of them will be -used at a time depending on whether 3rd-party host compiler is used or not. +and exist in the implementation at the same time. Only one of them will be +used at a time, depending on whether a 3rd-party host compiler is used or not. -Integration footer can be used with 3rd-party host compilers, but it requires -appending to a translation unit provided by a user, which could affect debug -information: since there are no compilers that support appending a file at the -end (similar to `-include`), appending is done by generating a temporary input -file using concatenation of the original input and integration footer. +Integration footer can be used with 3rd-party host compilers. This, however +requires appending to a translation unit provided by a user, which could affect +debug information. Since there are no compilers that support appending a file at +the end (similar to `-include`), appending is done by generating a temporary +input file using concatenation of the original input and integration footer. Such replacement of the main translation unit causes the following issues: - debug information about the source file might be incorrect, leading to @@ -58,8 +57,8 @@ Customizing host compiler allows to avoid issues with debuggers and code coverage tools, but that is not an option if a user wants to compile host part of an app with a 3rd-party host compiler. -Further sections describe the implementation design of both approaches in more -details, note that there are few components which should be modified regardless +The sections below describe the implementation design of both approaches in more +detail. Note that there are few components which should be modified regardless of which approach is in use. ## Common front-end part @@ -74,11 +73,11 @@ things: - emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a - unique string identifier of a variable it is attached to. The rules for + unique string identifier of the variable it is attached to. The rules for creating this string are the same as for `__builtin_sycl_unique_stable_id` and the same algorithm can be used when generating the string for the attribute - emit `sycl-uid-kind` LLVM IR attribute alongside `sycl-unique-id`, which - contains the `kind` string passed to + contains the `kind` string passed via `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute **TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in @@ -97,7 +96,7 @@ When DPC++ compiler is used as both host and device compiler, then the attribute should be respected by both host and device compiler passes and LLVM IR attributes should appear in LLVM IR for both host and device code. When DPC++ compiler is only used as a device compiler, then we don't expect the attribute -to be handled on host, apparently. +to be handled on host. Another thing we need from DPC++ FE host compiler is to define a special macro, which will allow to distinguish it from other host compilers. That is needed to @@ -181,8 +180,8 @@ chooses another approach by simply doing nothing related to integration footer: ## Integration footer approach -When this approach is used, not only extra file (integration footer) is -generated, but integration header is also modified: FE compiler generates a +When this approach is used, not only is an extra file (integration footer) +generated, but the integration header is also modified: FE compiler generates a definition of a namespace scope variable of type `__sycl_device_global_registration` whose sole purpose is to run its constructor before the application's `main()` function: @@ -300,17 +299,17 @@ design document][5] for more details on this topic. ## Custom host compiler approach -With this approach, we simply schedule a one more pass in the optimization +With this approach, we simply schedule one more pass in the optimization pipeline, which should be executed regardless of the optimization level, because it is required for proper functioning of some features. -The pass does similar thing to integration footer: it emits a global constructor -which in turn calls `__register_uniquely_identifiable_object` to provide the -runtime with required mapping information. +The pass has functionality similar to the integration footer, i.e. it emits a +global constructor which in turn calls `__register_uniquely_identifiable_object` +to provide the runtime with required mapping information. -Unlike with integration footer approach no separate file is being generated, -which preserves all source files mapping and checksums to be in place and -correct. +Unlike with the integration footer approach, no separate file is being +generated. This preserves all source files mapping and checksums to be in place +and correct. Generated constructor function should have internal linkage to avoid possible names clashes and multiple definition errors later at link stage. From 5b690487755936bbb07ae28cf37c1f82aa096027 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 11:41:54 +0300 Subject: [PATCH 08/16] Apply comments --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index b12c25be31505..21259ff7629bc 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -12,7 +12,8 @@ IDs of corresponding specialization constants at SPIR-V level. Another example is device global [implementation][device-global-design], where in order to communicate a value of `device_global` variable between host and device we need to map its host address to a symbolic name/identifier and some -other info, which is used at PI layer and below. +other info like the size of an underlying type of a device global, which is used +at PI layer and below. This design document describes a generic way how to map the address of any SYCL object defined in a namespace scope to its unique symbolic ID. Please note that @@ -24,8 +25,10 @@ design document could be used as a key in those properties to propagate additional information using existing mechanisms. So, the overall process is: -- device compiler generates property set/s which provide mapping - "unique symbolic ID" -> "various information required by DPC++ RT" +- (optionally) device compiler generates property set/s which provide mapping + "unique symbolic ID" -> "various information required by DPC++ RT". + Note: The presence and the format of those property set is defined case by + case for each feature - device or host compiler generates mapping "address of a host variable" -> "unique symbolic ID" (as described below by this document) From ca7857d218647ebb12187924b3e0de27cd972444 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 12:06:00 +0300 Subject: [PATCH 09/16] Add the new doc into index --- sycl/doc/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/sycl/doc/index.rst b/sycl/doc/index.rst index f67434e7faaa4..cd9fdd5cdaf66 100644 --- a/sycl/doc/index.rst +++ b/sycl/doc/index.rst @@ -28,6 +28,7 @@ Design Documents for the oneAPI DPC++ Compiler Clang Documentation Clang API Reference design/CompilerAndRuntimeDesign + design/MappingHostAddressesToDeviceEntities design/KernelParameterPassing design/PluginInterface design/SpecializationConstants From 90d8cd11570487c45bac56844ac964f716661264 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 12:06:29 +0300 Subject: [PATCH 10/16] Add cpp/ir snippets into FE section --- .../MappingHostAddressesToDeviceEntities.md | 27 ++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 21259ff7629bc..f257a6c7326f1 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -67,7 +67,7 @@ of which approach is in use. ## Common front-end part DPC++ FE should support the following attribute: -`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. This attribute accepts +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. The attribute accepts a string literal and should be applied to types (like `device_global` or `specialization_id`). @@ -83,6 +83,31 @@ things: contains the `kind` string passed via `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute +To illustrate, here is a SYCL code snippet: + +``` +template +class + // Note: the attribute usage will be guarded by macro to be only applied when + // DPC++ compiler is used to avoid generating warnings. That is described + // later in the doc + [[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] + specialization_id { + // ... +}; + +specialization_id spec_const(38); +``` + +After processed by DPC++ compiler, it will result in the following LLVM IR: + +``` +%class.specialization_id = type { i32 } +@spec_const = dso_local global %class.specialization_id { i32 38 } #0 + +attributes #0 = { "sycl-unique-id"="string returned by __builtin_sycl_unique_id(spec_const)" "sycl-uid-kind"="specialization_id" } +``` + **TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in [device global design doc][device-global-design], which instructs front-end to emit some additional semantic checking. Shall we leave it in place or that From aff7c4b9bf4f0ff9b76684d575128998cc95e9fb Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 12:26:08 +0300 Subject: [PATCH 11/16] Clarify whether the new attribute should be used for semantic checks --- .../MappingHostAddressesToDeviceEntities.md | 20 +++++++++---------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index f257a6c7326f1..1ad61d486d5de 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -108,17 +108,15 @@ After processed by DPC++ compiler, it will result in the following LLVM IR: attributes #0 = { "sycl-unique-id"="string returned by __builtin_sycl_unique_id(spec_const)" "sycl-uid-kind"="specialization_id" } ``` -**TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in -[device global design doc][device-global-design], which instructs front-end to -emit some additional semantic checking. Shall we leave it in place or that -request for semantic checking should also be documented by -`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute when `kind` -is set to a certain value? - -**TODO**: alternatively, we could completely re-use existing -`[[__sycl_detail__::device_global]]` attribute and introduce another one for -specialization constants, i.e. it is a question of whether or not we want to -generalize unique IDs generation in form of a generic attribute or not. +The new attribute should not be used for any semantic checking and its +sole purpose is to generate necessary LLVM IR attributes. If some feature +requires some semantic checks, then a separate attribute should be introduced +to perform them: for example see `[[__sycl_detail__::device_global]]` in +[device global design doc][device-global-design]. + +Note about `kind` argument: it should not be parsed by the compiler in any way +and it should be simply propagated as-is through the compiler stack to be used +later at runtime. When DPC++ compiler is used as both host and device compiler, then the attribute should be respected by both host and device compiler passes and LLVM IR From 0d576703966f59398ab02189ef80d65f61a4a741 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 19:18:44 +0300 Subject: [PATCH 12/16] Fix links to other documents --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 1ad61d486d5de..366d7e525f9aa 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -5,6 +5,9 @@ has the capability to somehow map addresses of a host objects to their counterparts in device programs. +[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html +[device-global-ext-spec]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc> + For example, in order to implement specialization constants on top of SPIR-V, we need to be able to map addresses of `specialization_id` variables to numeric IDs of corresponding specialization constants at SPIR-V level. @@ -15,6 +18,8 @@ device we need to map its host address to a symbolic name/identifier and some other info like the size of an underlying type of a device global, which is used at PI layer and below. +[device-global-design]: + This design document describes a generic way how to map the address of any SYCL object defined in a namespace scope to its unique symbolic ID. Please note that this document doesn't try to map the address to something other than a unique @@ -319,9 +324,9 @@ name lookup. Furthermore, the name of the shim function is globally unique, so it is guaranteed not to be shadowed by any other name in the translation unit. This problem with variable shadowing is also a problem for the integration footer we use for specialization constants. See the [specialization constant -design document][5] for more details on this topic. +design document][spec-constants-design] for more details on this topic. -[5]: +[spec-constants-design]: ## Custom host compiler approach From b83994ca66ecfa51ce0cbce828955095d4053cd1 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 21:09:53 +0300 Subject: [PATCH 13/16] Remove a paragraph about globals construction order --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 366d7e525f9aa..8dc1cebc61ff3 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -185,12 +185,6 @@ constructor defined in the same translation unit: this is needed to allow usages of `specialization_id` and `device_global` variables from user-defined global constructors. -That poses some restrictions on those uniquely identifiable objects, i.e. that -they can't be used from another global object due to risk of accessing a -non-initialized object, but that is an UB anyway because the order of global -objects initialization is not defined in C++ when those objects are defined in -a different translation unit. - ## Compiler driver part The compiler driver is the component which is responsible for selecting the From 2f43211c08dcc3de12400b68c5d72d8c91bdd54f Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 21:16:33 +0300 Subject: [PATCH 14/16] Clarify when registration object is generated in header/footer --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 8dc1cebc61ff3..0b0558db30361 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -209,7 +209,8 @@ When this approach is used, not only is an extra file (integration footer) generated, but the integration header is also modified: FE compiler generates a definition of a namespace scope variable of type `__sycl_device_global_registration` whose sole purpose is to run its constructor -before the application's `main()` function: +before the application's `main()` (and any other global constructor defined in +a user-provided translation unit) function: ``` namespace sycl::detail { @@ -261,6 +262,12 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept } // namespace sycl::detail ``` +Note: the integration footer is only populated with the registration object when +integration footer is enabled. Body of the registration object constructor can +be empty if there are no uniquely identifiable objects found in a translation +unit and FE is free to completely omit registration object generation in that +case as well. + ### Handling shadowed variables The example above shows a simple case where the user's device global variables From e963911a9455e5508174ae38d51a313d68182d47 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 21:19:48 +0300 Subject: [PATCH 15/16] Avoid referring to DPC++ compiler as to custom compiler --- .../design/MappingHostAddressesToDeviceEntities.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 0b0558db30361..1c2c0dc00477b 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -42,7 +42,7 @@ So, the overall process is: This design document describes two approaches for how the mapping of "address of a host variable" -> "unique symbolic ID" can be generated: the first one with integration footer and another one with modification of the -host compiler. +DPC++ host compiler. Both approaches have their pros and cons and they are expected to be implemented and exist in the implementation at the same time. Only one of them will be @@ -61,7 +61,7 @@ Such replacement of the main translation unit causes the following issues: code debugging to be completely broken in some environments (such as MS Visual Studio, for example) -Customizing host compiler allows to avoid issues with debuggers and code +Modifying DPC++ host compiler allows to avoid issues with debuggers and code coverage tools, but that is not an option if a user wants to compile host part of an app with a 3rd-party host compiler. @@ -329,7 +329,7 @@ design document][spec-constants-design] for more details on this topic. [spec-constants-design]: -## Custom host compiler approach +## Using a modified DPC++ as single source compiler With this approach, we simply schedule one more pass in the optimization pipeline, which should be executed regardless of the optimization level, because @@ -354,9 +354,9 @@ attributes into the corresponding arguments of the function. ### Handling shadowed variables Unlike with the integration footer the problem with shadowed variables doesn't -really exists with the custom host compiler approach, because it is compiler -responsibility to uniquely identify shadowed variables at LLVM IR level and we -are simply re-using what is already there. +really exists with the modified DPC++ host compiler, because it is compiler +responsibility to uniquely identify shadowed variables at LLVM IR level +and we are simply re-using what is already there. For example, for the following code snippet: From 96c0e37279fa1a74b4db818b59e85d98a50379a6 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 4 Apr 2022 21:23:59 +0300 Subject: [PATCH 16/16] Apply comments --- sycl/doc/design/MappingHostAddressesToDeviceEntities.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md index 1c2c0dc00477b..3d22c3bf53392 100644 --- a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -102,6 +102,8 @@ class }; specialization_id spec_const(38); + +// some code which uses spec_const within a SYCL Kernel Function ``` After processed by DPC++ compiler, it will result in the following LLVM IR: @@ -354,7 +356,7 @@ attributes into the corresponding arguments of the function. ### Handling shadowed variables Unlike with the integration footer the problem with shadowed variables doesn't -really exists with the modified DPC++ host compiler, because it is compiler +really exists with the modified DPC++ host compiler, because it is compiler's responsibility to uniquely identify shadowed variables at LLVM IR level and we are simply re-using what is already there.