From dd7cdbd3a470696e19b06d0dcc9d14ec8d2b63fd Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 29 Sep 2021 18:17:20 -0400 Subject: [PATCH 1/3] Add FPGA properties to device global spec Although these properties are intended mostly for FPGA users, there is no prohibition against using them for other devices. Therefore, we describe them in the main device global spec, rather than creating a separate add-on spec for FPGA. --- .../SYCL_INTEL_device_global.asciidoc | 235 +++++++++++++++++- 1 file changed, 226 insertions(+), 9 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc index 3ed82fa748218..f97f06e5f1af1 100755 --- a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc @@ -19,6 +19,7 @@ // This is necessary for asciidoc, but not for asciidoctor :cpp: C++ +:dpcpp: DPC++ == Introduction In OpenCL 2.0 and later, a user is able to allocate program @@ -55,8 +56,8 @@ products. == Version -Built On: 2021-08-01 + -Revision: 2 +Built On: 2021-09-30 + +Revision: 3 == Contact @@ -186,13 +187,9 @@ The allocation of type _T_ is zero-initialized on each device prior to the first `device_global` may only be declared with static storage duration at namespace scope or class scope. If a `device_global` is declared with any other storage duration or scope, the program is ill-formed. -Properties may be specified for a `device_global` to provide semantic modification or optimization hint information to the compiler. Specific properties are defined in other extensions, but example uses of a property (with a "no copy" attribute described by another extension) are: - -[source,c++] ----- -device_global> dm1; -device_global> dm2; ----- +Properties may be specified for a `device_global` to provide semantic +modification or optimization hint information to the compiler. See the section +below for a list of the properties that are allowed. [NOTE] ==== @@ -398,6 +395,214 @@ Available only if `sycl::is_property_of_v> dm1; +device_global> dm2; +---- + +The following code synopsis shows the set of supported properties, and the +following table describes their effect. + +[source,c++] +---- +namespace sycl::ext::oneapi { + +struct device_image_life { + using value_t = property_value; +}; + +struct copy_access { + enum class access: /*unspecified*/ { + read, + write, + read_write, + no_access + }; + template + using value_t = property_value>; + +struct init_via { + enum class trigger: /*unspecified*/ { + reprogram, + reset + }; + template + using value_t = property_value>; +}; + +struct implement_in_csr { + template + using value_t = property_value>; +}; + + +inline constexpr device_image_life::value_t device_image_life_v; + +template +inline constexpr copy_access::value_t copy_access_v; + +template +inline constexpr init_via::value_t init_via_v; + +template +inline constexpr implement_in_csr::value_t implement_in_csr_v; + +} // namespace sycl::ext::oneapi +---- + +[frame="topbot",options="header"] +|=== +|Property |Description + +a| +[source,c++] +---- +device_image_life +---- +a| +This property is most useful for kernels that are submitted to an FPGA device, +but it may be used with any kernel. Normally, a single instance of a device +global variable is allocated for each device, and that instance is shared by +all kernels that are submitted to the device, regardless of which _device +image_ contains the kernel. When this property is specified, it is an +assertion by the user that the device global is referenced only from kernels +that are contained by the same _device image_. An implementation may be able +to optimize accesses to the device global when this property is specified +(especially on an FPGA device), but the user must be aware of which _device +image_ contains the kernels that use the variable. + +A device global that is decorated with this property may not be accessed from +kernels that reside in different _device images_, either by direct reference +to the variable or indirectly by passing the variable's address to another +kernel. The implementation is required to diagnose an error if the kernels +that directly access a variable do not all reside in the same _device image_, +however no diagnostic is required for an indirect access from another _device +image_. + +When a device global is decorated with this property, the implementation +re-initializes it whenever the _device image_ is loaded onto the device. As a +result, the application can only be guaranteed that a device global retains its +value between kernel invocations if it understands when the _device image_ is +loaded onto the device. For an FPGA, this happens whenever the device is +reprogrammed. Other devices typically load the _device image_ once before the +first invocation of any kernel in that _device image_, and then it remains +loaded onto the device until the program terminates. + +The application may copy to or from a device global even before any kernel in +the _device image_ is submitted to the device. Doing so causes the device +global to be initialized immediately before the copy happens. (Typically, the +copy operation causes the _device image_ to be loaded onto the device also.) +As a result, copying from a device global returns the initial value if the +_device image_ that contains the variable is not currently loaded onto the +device. + +a| +[source,c++] +---- +copy_access +---- +a| +This property provides an assertion by the user telling the implementation +whether the host code copies to or from the device global. As a result, the +implementation may be able to perform certain optimizations. Although this +property may be used with any device, it is generally only beneficial when used +on FPGA devices. + +The following values are supported: + +* `read`: The user asserts that the host code may copy from (read) the + variable, but it will never copy to (write) it. For an FPGA device, only a + read port is exposed. +* `write`: The user asserts that the host code may copy to (write) the + variable, but it never copy from (read) it. For an FPGA device, only a write + port is exposed. +* `no_access`: The user asserts that the host code will never copy to or copy + from the variable. For an FPGA device, no external ports are exposed. +* `read_write`: The user provides no assertions, and the host code may either + copy to or copy from the variable. This is the default. For an FPGA device, + a read/write port is exposed. + +a| +[source,c++] +---- +init_via +---- +a| +This property is only meaningful when used with an FPGA device. It is ignored +for other devices. The following values are supported: + +* `reprogram`: Initialization is performed by reprogramming the device. This + may require more frequent reprogramming but may reduce area. +* `reset`: Initialization is performed by sending a reset signal to the device. + This may increase area but may reduce reprogramming frequency. + +If the `init_via` property is not specified it may be set automatically to an +implementation defined default. + +a| +[source,c++] +---- +implement_in_csr +---- +a| +This property is only meaningful when used with an FPGA device. It is ignored +for other devices. The following values are supported: + +* `true`: Access to this memory is done through a CSR interface shared with + kernel arguments. +* `false`: Access to this memory is done through a dedicated interface. + +If the `implement_in_csr` property is not specified it may be set automatically +to an implementation defined default. + +|=== + +[NOTE] +==== +As stated above, the user must understand which _device image_ contains a +kernel in order to use the `device_image_life` property. Each implementation +may have its own rules that determine when two kernels are bundled together +into the same _device image_. For {dpcpp} two kernels _K1_ and _K2_ will be +bundled into the same _device image_ when both of the following conditions are +satisfied: + +* The translation unit containing _K1_ and the translation unit containing _K2_ + must both be compiled with `-fsycl-targets=X + -fsycl-assume-all-kernels-run-on-targets` where the target `X` is the same in + both compilations. (A list of targets may also be specified such as + `-fsycl-targets=X,Y`. In this case the list must be the same in both + compilations.) + +* The application must be linked with `-fsycl-device-code-split` such that the + kernels _K1_ and _K2_ are not split into different _device images_. For + example, if _K1_ and _K2_ reside in the same translation unit, + `-fsycl-device-code-split=per_source` will guarantee that they are bundled + together in the same _device image_. If they reside in different translation + units, `-fsycl-device-code-split=none` will guarantee that they reside in the + same _device image_. + +In addition, the following factors also affect how kernels are bundled into +_device images_: + +* Kernels that are online-compiled using `sycl::kernel_bundle` may reside in + different _device images_ if they are compiled from different `kernel_bundle` + objects. + +* A kernel that uses specialization constants may have a new instance in a new + _device image_ each time the application sets a new value for the + specialization constant. However, this happens only if the device supports + native specialization constants, which is not the case for FPGA devices. +==== + === Relax language restrictions for SYCL device functions SYCL 2020 restrictions must be relaxed to allow `device_global` to be used within @@ -650,6 +855,9 @@ void copy(const std::remove_all_extents_t *src, ---- | `T` must be device copyable. +Not available if `PropertyListT` contains the `copy_access` property with +`read` or `no_access` assertions. + Copies _count_ elements of type `std::remove_all_extents_t` from the pointer _src_ to the `device_global` _dest_, starting at _startIndex_ elements of _dest_. _src_ may be either a host or USM pointer. a| [source, c++] @@ -662,6 +870,9 @@ void copy(const device_global& src, ---- | `T` must be device copyable. +Not available if `PropertyListT` contains the `copy_access` property with +`write` or `no_access` assertions. + Copies _count_ elements of type `std::remove_all_extents_t` from the `device_global` _src_ to the pointer _dest_, starting at _startIndex_ elements of _src_. _dest_ may be either a host or USM pointer. a| @@ -673,6 +884,9 @@ void memcpy(device_global& dest, ---- |`T` must be device copyable. +Not available if `PropertyListT` contains the `copy_access` property with +`read` or `no_access` assertions. + Copies _count_ bytes from the pointer _src_ to the `device_global` _dest_, starting at _offset_ bytes. _src_ may be either a host or USM pointer. a| @@ -685,6 +899,9 @@ void memcpy(void *dest, ---- |`T` must be device copyable. +Not available if `PropertyListT` contains the `copy_access` property with +`write` or `no_access` assertions. + Copies _count_ bytes from the `device_global` _src_ to the pointer _dest_, starting at _offset_ bytes. _dest_ may be either a host or USM pointer. |==== -- From e0966b0f68d4248f7ff698557685bf33d417aa51 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 1 Oct 2021 13:47:56 -0400 Subject: [PATCH 2/3] Rename device global properties Rename properties to address review comments. --- .../SYCL_INTEL_device_global.asciidoc | 56 +++++++++---------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc index f97f06e5f1af1..0f370556d2782 100755 --- a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc @@ -405,8 +405,8 @@ parameter as shown in this example: ---- using namespace sycl::ext::oneapi; -device_global> dm1; -device_global> dm2; +device_global> dm1; +device_global> dm2; ---- The following code synopsis shows the set of supported properties, and the @@ -416,27 +416,27 @@ following table describes their effect. ---- namespace sycl::ext::oneapi { -struct device_image_life { - using value_t = property_value; +struct device_image_scope { + using value_t = property_value; }; -struct copy_access { +struct host_access { enum class access: /*unspecified*/ { read, write, read_write, - no_access + none }; template - using value_t = property_value>; + using value_t = property_value>; -struct init_via { +struct init_mode { enum class trigger: /*unspecified*/ { reprogram, reset }; template - using value_t = property_value>; + using value_t = property_value>; }; struct implement_in_csr { @@ -445,13 +445,13 @@ struct implement_in_csr { }; -inline constexpr device_image_life::value_t device_image_life_v; +inline constexpr device_image_scope::value_t device_image_scope_v; -template -inline constexpr copy_access::value_t copy_access_v; +template +inline constexpr host_access::value_t host_access_v; -template -inline constexpr init_via::value_t init_via_v; +template +inline constexpr init_mode::value_t init_mode_v; template inline constexpr implement_in_csr::value_t implement_in_csr_v; @@ -466,7 +466,7 @@ inline constexpr implement_in_csr::value_t implement_in_csr_v; a| [source,c++] ---- -device_image_life +device_image_scope ---- a| This property is most useful for kernels that are submitted to an FPGA device, @@ -508,7 +508,7 @@ device. a| [source,c++] ---- -copy_access +host_access ---- a| This property provides an assertion by the user telling the implementation @@ -525,7 +525,7 @@ The following values are supported: * `write`: The user asserts that the host code may copy to (write) the variable, but it never copy from (read) it. For an FPGA device, only a write port is exposed. -* `no_access`: The user asserts that the host code will never copy to or copy +* `none`: The user asserts that the host code will never copy to or copy from the variable. For an FPGA device, no external ports are exposed. * `read_write`: The user provides no assertions, and the host code may either copy to or copy from the variable. This is the default. For an FPGA device, @@ -534,7 +534,7 @@ The following values are supported: a| [source,c++] ---- -init_via +init_mode ---- a| This property is only meaningful when used with an FPGA device. It is ignored @@ -545,7 +545,7 @@ for other devices. The following values are supported: * `reset`: Initialization is performed by sending a reset signal to the device. This may increase area but may reduce reprogramming frequency. -If the `init_via` property is not specified it may be set automatically to an +If the `init_mode` property is not specified it may be set automatically to an implementation defined default. a| @@ -569,7 +569,7 @@ to an implementation defined default. [NOTE] ==== As stated above, the user must understand which _device image_ contains a -kernel in order to use the `device_image_life` property. Each implementation +kernel in order to use the `device_image_scope` property. Each implementation may have its own rules that determine when two kernels are bundled together into the same _device image_. For {dpcpp} two kernels _K1_ and _K2_ will be bundled into the same _device image_ when both of the following conditions are @@ -855,8 +855,8 @@ void copy(const std::remove_all_extents_t *src, ---- | `T` must be device copyable. -Not available if `PropertyListT` contains the `copy_access` property with -`read` or `no_access` assertions. +Not available if `PropertyListT` contains the `host_access` property with +`read` or `none` assertions. Copies _count_ elements of type `std::remove_all_extents_t` from the pointer _src_ to the `device_global` _dest_, starting at _startIndex_ elements of _dest_. _src_ may be either a host or USM pointer. a| @@ -870,8 +870,8 @@ void copy(const device_global& src, ---- | `T` must be device copyable. -Not available if `PropertyListT` contains the `copy_access` property with -`write` or `no_access` assertions. +Not available if `PropertyListT` contains the `host_access` property with +`write` or `none` assertions. Copies _count_ elements of type `std::remove_all_extents_t` from the `device_global` _src_ to the pointer _dest_, starting at _startIndex_ elements of _src_. _dest_ may be either a host or USM pointer. @@ -884,8 +884,8 @@ void memcpy(device_global& dest, ---- |`T` must be device copyable. -Not available if `PropertyListT` contains the `copy_access` property with -`read` or `no_access` assertions. +Not available if `PropertyListT` contains the `host_access` property with +`read` or `none` assertions. Copies _count_ bytes from the pointer _src_ to the `device_global` _dest_, starting at _offset_ bytes. _src_ may be either a host or USM pointer. @@ -899,8 +899,8 @@ void memcpy(void *dest, ---- |`T` must be device copyable. -Not available if `PropertyListT` contains the `copy_access` property with -`write` or `no_access` assertions. +Not available if `PropertyListT` contains the `host_access` property with +`write` or `none` assertions. Copies _count_ bytes from the `device_global` _src_ to the pointer _dest_, starting at _offset_ bytes. _dest_ may be either a host or USM pointer. |==== From 539edbfe27ebd540c8ba5b74f9e8fab416d87863 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 19 Oct 2021 12:35:24 -0400 Subject: [PATCH 3/3] Update wording about default behavior Clarify default behavior when `init_mode` or `implement_in_csr` are not specified. --- .../DeviceGlobal/SYCL_INTEL_device_global.asciidoc | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc index 0f370556d2782..1fe4db3eea11e 100755 --- a/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc @@ -545,8 +545,9 @@ for other devices. The following values are supported: * `reset`: Initialization is performed by sending a reset signal to the device. This may increase area but may reduce reprogramming frequency. -If the `init_mode` property is not specified it may be set automatically to an -implementation defined default. +If the `init_mode` property is not specified, the default behavior is +equivalent to one of the values listed above, but the choice is implementation +defined. a| [source,c++] @@ -561,8 +562,9 @@ for other devices. The following values are supported: kernel arguments. * `false`: Access to this memory is done through a dedicated interface. -If the `implement_in_csr` property is not specified it may be set automatically -to an implementation defined default. +If the `implement_in_csr` property is not specified, the default behavior is +equivalent to one of the values listed above, but the choice is implementation +defined. |===