diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 4cf80c8a4747c..861a1f3ce163e 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -1,3 +1,377 @@ +# June'22 release notes + +Release notes for commit range f34ba2c..4043dda + +## New features +### SYCL Compiler + - Added `-fcuda-prec-sqrt` frontend compiler option which enables a higher-precision version of `sqrt` in the device code [ebf9ea8] + - Added support for local memory accessors for the HIP backend. [58508ba] + - Added initial support of `-lname` processing when searching for fat static libraries. [35e32d8] [a33f9c8] + - Added `-fsycl-fp32-prec-sqrt` flag which enables correctly rounded `sycl::sqrt`. [5c8b7e7] + - Added support for `[[intel::loop_count()]]` attribute. [c536e76] + - Added support for passing driver options to JIT compiler and linker. [1c93bfe] + - Added default argument support for `work_group_size_hint` attribute. [0cff80e] + - Added support for float and double exchange and compare exchange atomic operations in CUDA libclc. [1d84c99] + - Added `--ffast-math` support for CUDA libclc. [0f0c5d1] + - Added support for software atomics (except for the ones using system scope) for lower sm versions of CUDA architecture. Enabled `SYCL_USE_NATIVE_FP_ATOMICS` by default. [7bc8447] + - Added support for the global offset for AMDGPU. [2dc3c06] + - Added support for asynchronous barrier for CUDA backend sm 80+. [6770421] + - Added `-f[no-]sycl-device-lib-jit-link` option to control JIT linking of SYCL device libraries. [dfb37a8] [c946286] + - Added support for the new FPGA attribute `[[intel::fpga_pipeline(N)]]` for loop pipelining. [92aadf3] + - Added `assert` support for Windows NVPTX. [f29b498] + - Added support for [`sycl_ext_oneapi_properties` extension](doc/extensions/experimental/sycl_ext_oneapi_properties.asciidoc).
[87f60f6][1984e74][a2583ec][cdf561a][d2982c6][35c2e00] + +### SYCL Library + - Added support for Nvidia MMA for `bf16`, mixed precision int `((u)int8/int32)`, and mixed precision float `(half/float)`. [5373362] + - Added a mode for the Level Zero plugin where only the last command in each batch yields a host-visible event. Enabled this mode by default. [c6b7b8e] + - Added an option to query for atomic scope capabilities for the CUDA backend. Updated returns for atomics memory order capabilities. [43a4192] + - Added support for an experimental Level Zero API for host pointer import into USM. The feature can be enabled using the `SYCL_USM_HOSTPTR_IMPORT` environment variable. [844d7b6] + - Added support for the `wi_element` for `bf16` type. [9f2b7bd] + - Added complex support for the reduce and scan group algorithms. [90a4dc7] + - Added support for SYCL 2020 methods in the `group` class. [73d59ce] + - Added `SYCL_RT_WARNING_LEVEL` environment variable which controls the amount of warnings and performance hints the runtime library may print. [2741010] + - Added `tanh` (for floats/halfs) and `exp2` (for halfs) native definitions for CUDA backend. [250c498] + - Added `bf16` builtins for `fma`, `fmin`, `fmax` and `fabs` on CUDA backend. [62651dd] + - Added support for USM buffer location properties which allow specifying the memory location in which the device USM allocation should be placed. [12c988a] + - Added support for `buffer_location` property to the `sycl::buffer`. [9808525] + - Added `single_task` support for ESIMD_EMULATOR backend. [2331160] + - Added support for SVM 1,2,4-elements gather/scatter for ESIMD. [e200720] + - Added support for `bf16` builtins operating on storage types for CUDA backend. [413a9ef] + - Added support for `backend_version` device property for CUDA backend. [4b1a4bc] + - Added support for round-robin submissions to multiple compute CCS for the Level Zero backend. Disabled by default, can be controlled using `SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE`.
[a836c87] + - Added support for buffer migration for contexts with multiple devices in the Level Zero plugin. [7baf152] + - Added mode where the Level Zero plugin uses immediate command-lists instead of standard command-lists. This mode is disabled by default, can be enabled using `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` environment variable. [b9cb1d1] + - Added support for `sycl::get_native(sycl::buffer)` for OpenCL and CUDA backends. [8b3c8c4] + - Added reduction overloads accepting `span`. [863383b] + - Added LSC support for ESIMD_EMULATOR backend. [b78bf00] + - Added `half` type support for `__esimd_convertvector_to/from`. [0bfffd6] + - Added `buffer_allocator` SYCL 2020 conformant variant. [53430c8] + - Added support for the USM buffer location property in `malloc_shared`. [6e89821] [9f61c8e][8c4d9a5] + - Added support for the USM buffer location property in `malloc_host`. [2c7caab] + - Added experimental context and device interoperability support for CUDA. [f0df89a] + - Added support for memory intrinsics for the ESIMD_EMULATOR plugin. [1a8f501] + - Added support for named barrier APIs for ESIMD. [1df0038] + - Added support for DPAS API for ESIMD. [5881938] + - Added support for LSC memory access APIs for ESIMD. [4bd50e7] + - Added support for the `invoke_simd` feature. [4072557][8471ff3][8c7bb45][62afb59][3e1c1bf] + - Added support for `info::device::atomic64` for OpenCL and Level Zero backends. [8feb558] + - Added support for [`sycl_ext_oneapi_usm_device_read_only` extension](doc/extensions/supported/sycl_ext_oneapi_usm_device_read_only.asciidoc) [644c614][58c9d3a] + - Added support for mapping/unmapping operations for ESIMD_EMULATOR plugin. [bc0579a] + - Added support for `make_buffer` API for the Level Zero backend. [7c49984] + - Added interoperability support for HIP backend. [e06d1b5] + - Added missing `+-*/` operations for `half`. 
[059efbc] + - Introduced new environment variable `SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ` to control the max local memory allowed to be allocated per kernel on CUDA backend. [2e24304] + - Added `ext_intel_global_host_space` in accordance with [`sycl_ext_intel_usm_address_spaces` extension](doc/extensions/supported/sycl_ext_intel_usm_address_spaces.asciidoc). [7a2f44b] + - Added aspect for `bfloat16`. [f84fc32] + - Introduced "Intel math functions" device library with support of type cast util functions for float, double and integer type. [a310952] + - Added `bfloat16` support for `joint_matrix` [6ac62ab] + +### Documentation + - Added [`sycl_ext_oneapi_complex_algorithms` extension](doc/extensions/proposed/sycl_ext_oneapi_complex_algorithms.asciidoc) [7ae7ca8] + - Added a design document for [`sycl_ext_oneapi_device_global` extension](doc/design/DeviceGlobal.md) [8c22ef1] + - Added a design document for [`sycl_ext_oneapi_properties` extension](doc/design/CompileTimeProperties.md) [912572f] + - Added new [`sycl_ext_oneapi_free_function_queries` proposal](doc/extensions/proposed/sycl_ext_oneapi_free_function_queries.asciidoc). [7a93a49] + - Added [`sycl_ext_oneapi_group_load_store` extension](doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc). [85ccdc0] + - Added validation rules to the SPIR-V extension [`SPV_INTEL_global_variable_decorations`](doc/design/spirv-extensions/SPV_INTEL_global_variable_decorations.asciidoc). [dfaa070] + - Added [`SYCL_INTEL_buffer_location` extension](doc/extensions/supported/sycl_ext_intel_buffer_location.asciidoc) to support `buffer_location` property for USM allocations. [962417d] [36a9ee2] + - Added [`sycl_ext_oneapi_named_sub_group_sizes` extension](doc/extensions/proposed/sycl_ext_oneapi_named_sub_group_sizes.asciidoc) proposal which aims to simplify the process of using sub-groups. 
[4f3d7e1] + - Added experimental latency control API into [`SYCL_INTEL_data_flow_pipes`](doc/extensions/supported/sycl_ext_intel_dataflow_pipes.asciidoc). [5224f78] + - Added [`sycl_ext_oneapi_auto_local_range` extension](doc/extensions/proposed/sycl_ext_oneapi_auto_local_range.asciidoc) proposal. [cb4e702] + - Added [SYCL 2020 spec constants](doc/design/SYCL2020-SpecializationConstants.md) design doc. [8ec9755] + - Added [`sycl_ext_oneapi_queue_status_query` extension](doc/extensions/proposed/sycl_ext_oneapi_queue_status_query.asciidoc) proposal. [b6143e5] + - Added initial version of [`sycl_ext_oneapi_invoke_simd`](doc/extensions/proposed/sycl_ext_oneapi_invoke_simd.asciidoc) and [`sycl_ext_oneapi_uniform`](doc/extensions/proposed/sycl_ext_oneapi_uniform.asciidoc) extension proposals. [a37ca84] + - Added the [`sycl_ext_oneapi_annotated_arg` extension](doc/extensions/proposed/sycl_ext_oneapi_annotated_arg.asciidoc) proposal for applying properties on kernel arguments. [caa696f] + - Added [`sycl_ext_oneapi_cuda_async_barrier` extension](doc/extensions/experimental/sycl_ext_oneapi_cuda_async_barrier.asciidoc) for CUDA backend. [6770421] + - Added `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions into [`sycl_ext_oneapi_bfloat16` extension](doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc). [c76ef5c] + - Added initial version of [`sycl_ext_oneapi_root_group` extension](doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc) proposal. [b59cd43] + +### Tools + - Implemented property set generation for device globals in the sycl-post-link. Added the `--device-globals` command-line argument for lowering and generating information about device global variables. [88123c1] + - Introduced XPTI-based tools for SYCL applications: sycl-trace, sycl-prof, sycl-sanitize.
[789d138] + - Added support for tracing Level Zero API calls using XPTI and updated the sycl-trace tool to be able to display both PI and Level Zero calls simultaneously. [fc9cf52] + +## Improvements +### SYCL Compiler + - Added a diagnostic on an attempt to use zero length arrays in the device code [52e8f58] + - Added support for consuming fat objects containing SPIR-V [c878063] + - Added support for generating SPIR-V based fat objects [1e94ef3] + - Added support for group collective functions for HIP backend [106882c] + - Added a diagnostic on an attempt to use `-fsycl` and `-static-libstdc++` together. This combination is not supported because of the runtime dependence on libsycl.so [bb0055c] + - Added support for atomic loads and stores with various memory orders and scopes [e15ac50] [6b2635e] + - Improved performance of accessing memory pointed to by `sycl::accessor` for FPGA device [fbab374] + - Added support for setting CUDA installation directory using `CUDA_PATH` environment variable [b0c145a] + - Improved deferred diagnostics for usages within function templates in device code. [d0efca5] + - Added support for `sycl_special_class` attribute to mark SYCL classes/structs that need the additional compiler handling. [8ba9c79] + - Improved driver to do device section checking only when offloading is enabled. [3742b93] + - Allowed calls to constant expression function pointers in device code. [e84c952] + - Disabled passing of code coverage and profiling options to device compilation. + - Added clang support of code location information for kernels. [96d2e17] + - Disallowed explicit casts between mismatching address spaces. [1cee960] + - Added support of `[[sycl::device_has]]` attribute on kernel. [aa2162c] + - Added a warning on explicit cast from default address space to named. [9adb25b] + - Added a warning for converting 'c' input to 'c++' in SYCL mode. [5b62ee0] + - Silenced unknown attribute warnings on host compilation.
[2d359df] + - Added a diagnostic on an attempt to use `accessor::operator[]` in ESIMD code. [9d7a651] + - Expanded driver's ability to discover fat static archives after `/link` option on Windows. [271ef40] + - Added support for saving user specified names for lambda captures in kernel lambda object for FPGA target. [af29982] [5ffb2ee] + - Adjusted the compilation when preprocessing is requested to allow for the device compilation to fail and continue to perform the preprocessing steps. [7f2e99c] + - Added the ability to detect a kernel size mismatch in the case when the host and device compilers are different. [ef90e6a] + - Improved handling of specialization constants by backends. [e62b5aa] + - Improved support of `-mlong-double` options. [6083920] + - Improved `-save-temps` to allow optimization when performing a SYCL device compilation. [05fe5ae] + - Removed warning diagnostic on host compilation when using `__attribute__((sycl_device))`. [49e595e] + - Introduced multiple streams in each queue for CUDA backend to improve concurrent execution. [dd41845] + - Improved compiler to collect information for optimization record only if the optimization record is saved by the user (i.e. `-fsave-optimization-record` or `-opt-record-file` is passed). [cb94c80] + - Improved `[[intel::max_concurrency()]]` attribute support. [b6aa4cc] + - Added the new `kernel_arg_exclusive_ptr` metadata which guarantees that the kernel pointer argument, or pointers that derive from it, will not be dereferenced outside the current invocation of the kernel. [e03c4ed] + - Added a warning for the case when an invalid subgroup size is used on a kernel for CUDA backend. [6dab69f] + - Improved deprecation messaging for options. [f0b65a1] + - Improved diagnostic behavior for `-fsanitize` with `-fsycl`.
[9397cbc] + +### SYCL Library + - Added support for `sycl::ctz` API [d5eb769] + - Improved the diagnostic for unresolved symbols in the device code for Level Zero backend [33cfb9f] + - Added several arithmetic operations for `sycl::ext::oneapi::experimental::wi_element` [fe2e35e] + - Added support for `sycl::queue::submit_barrier` for HIP backend [53f1cce] + - Added `sycl::property_list` APIs to `sycl::stream` [4b39793] + - Defined `sycl::access::decorated` in the SYCL headers. [d183807] + - Improved performance by allowing batching of wait-with-barrier commands for Level Zero backend [bd1ed6a] + - Disabled image support for CUDA backend since the support is not complete [90c8f05] + - Avoided JITing unnecessary device code when using `sycl::handler::set_specialization_constant` [5746906] + - Updated image accessor constructor to make it possible to use const references in parallel_for. [3070b95] + - Relaxed the mutex lock duration in queue finish for the Level Zero backend to allow working with the queue from other threads. [8573935] + - Added XPTI instrumentation for USM allocations. [7df3923] + - Extended XPTI information with buffer constructor data. [fc0d28a] + - Added error handling for `sycl::event::get_profiling_info()`. [ec74a5c] + - Eliminated recursion and duplicated dependencies in leaf buffers handling in the scheduler. [6f40376] + - Improved runtime to emit program build logs when `SYCL_RT_WARNING_LEVEL` is set to 2 or higher. [008519a] + - Improved the error message at command execution failure. [930ecbf] + - Improved runtime to build program for root device only and re-use the binary for sub-devices to avoid "duplicate" builds. [13a7455] + - Improved `sycl::kernel::get_kernel_bundle` performance. [4817b3f] + - Changed USM pooling parameters for the Level Zero backend to boost performance. [57f8a44][b000db8] + - Exposed `value_type` and `min_capacity` from SYCL pipes extension class.
[e1619fa] + - Improved thread-safety of the Level Zero plugin by guarding access to the PI objects. [3321141] [a37c10b] [bd80f34] [8f97fe2] + - Optimized half builtins for `fma`, `fmin` and `fmax` on CUDA backend. [62651dd] + - Improved runtime to redirect warning from using `SYCL_DEVICE_FILTER` with `sycl-ls` to `std::cerr`. [70593d6] + - Used new SPIR-V group operations within uniform control flow instead of non-uniform operations in SYCL headers. [9b84dd8] + - Enabled online linking of the device libraries. [9fcab29] + - Improved esimd-verifier logic for detecting valid SYCL calls. [eaf8b42] + - Extended XPTI information with the kernel info. [4b9eef3] + - Improved error message for exceeding CUDA grid limits. [ed877c2] + - Added overload for `sycl::select(a, b, c)` where `c` is a bool. [7ae8fd3] + - Fixed batching related thresholds to improve performance. [c6313bd] + - Added `always_inline` to libdevice functions, which allows an underlying runtime to inline them. [dfc87cc] + - Improved performance by caching the result of `zeKernelGetName` in the Level Zero plugin. [40cece3] + - Updated the experimental latency control API to use property list and deprecated the template argument approach. [273034a] + - Updated `queue::parallel_for()` to align with SYCL 2020. [e59fb89] + - Improved runtime to ignore CUDA prefetch hint if not supported on the system and emit an optional warning in this case depending on warning level set using `SYCL_RT_WARNING_LEVEL`. [082929a] + - Enabled pooling of small USM allocations for the Level Zero backend to improve performance. [6244efe] + - Added managed memory check to enqueue prefetch and made it ignore the prefetch hint and emit a warning if the memory provided is not managed. [0fe322c] + - Enabled device code instrumentation by default. [53fc8e4] + - Improved performance of `queue::wait()` on CUDA backend. [8b85a3c] + - Optimized host event wait.
[f3a0970] + - Improved default selector to filter devices based on available device images. [0e67db8] + - Enabled caching of native OpenCL and Level Zero executable binaries. [f0283fc] + - Deprecated `sycl::ext::intel::ctz` extension function; `sycl::ctz` from the core specification must be used instead. [50435a6] + - Deprecated `cl::sycl::atomic` in SYCL 2020 mode. [52fad5a] + - Deprecated `ext_intel_host_device_space` which is replaced by `ext_intel_global_host_space`. [7a2f44b] + - Added CUDA-specific USM memory advice values. [e2e71a9] + +### Tools + - Added an option `--enable-esimd-emulator` to enable esimd emulator build using configure.py. [ddbcbb1] + - Added the ability to build plugins separately. [c16412b] + - Added `--enable-all-llvm-targets` switch to configure.py. [5e6642a] + - Added PI tracing support for `loadOsLibrary`. [647249c] + +### Documentation + - Clarified the interaction between the [`sycl_ext_oneapi_invoke_simd` extension](doc/extensions/proposed/sycl_ext_oneapi_invoke_simd.asciidoc) and `SYCL_EXTERNAL` functions. [ac3e816] + - Removed extension specifications which were adopted into SYCL 2020. Please refer to [extensions/removed/README](doc/extensions/removed/README.md) for the list of removed extensions. [ae7e3d8] [57c9017] + - Structured sycl/doc directory: + - Moved the supported extensions to the [sycl/doc/extensions/supported](doc/extensions/supported). [2521592] + - Moved the experimental extensions to the [sycl/doc/extensions/experimental](doc/extensions/experimental). [eae965c] + - Moved the proposed extensions to the [sycl/doc/extensions/proposed](doc/extensions/proposed). [eb350ee] + - Moved the deprecated extensions to the [sycl/doc/extensions/deprecated](doc/extensions/deprecated). [09be931] + - Moved internal design documentation and BKMs to [sycl/doc/developer](doc/developer). [edbfc99] + - Moved SPIR-V and OpenCL extensions to [doc/design](doc/design).
[64e92cb] + - Updated [CompileTimeProperties](doc/design/CompileTimeProperties.md) design documentation. [2359e81][f6420c7] + - Clarified which SPIR-V decorations the `sycl-post-link` tool generates for each device global variable. [3c3b485] + - Updated the design for device global variables for variables that are "shadowed" in an unnamed namespace. [e211d73] + - Clarified the specification that device global with `SYCL_EXTERNAL` is allowed. [0d95d6f] + - Added an overview README for the extensions directory. [d54708a] + - Clarified the C++ attribute `[[sycl_detail::uses_aspects()]]` usage in the documentation. [27cc930] + - Added a template document to use when creating new SYCL extension specifications. [4bd6d20] + - Added new rule for naming C++ identifiers in the SYCL project. [1445528] + - Updated [GettingStartedGuide](doc/GetStartedGuide.md) to recommend CUDA 11.6. [0b456ce] + - Added ESIMD_EMULATOR to `SYCL_DEVICE_FILTER` description. [526ad0c] + - Clarified availability of `get_property()`. [a2c5e90] + - Deprecated extended atomics extension. [7581741] + - Improved [GettingStartedGuide](doc/GetStartedGuide.md) for non-standard HIP installations. [09c3b46] + - Added description of ESIMD_EMULATOR backend to [sycl_ext_intel_esimd/README](doc/extensions/experimental/sycl_ext_intel_esimd/README.md). [b8e6d23] + +## Bug fixes +### SYCL Compiler + - Fixed a crash that happened if an overloaded `new` operator was used in a recursive function in the device code [2085978] + - Fixed wrong address space of `event_t` which could lead to builtins like barriers working incorrectly when using HIP backend. [22532c2] + - Fixed an issue with certain macros being unavailable when using a custom host compiler. [652417b] + - Fixed an issue with device code linking when one of the targets is not spir64 based. [1f8874f] + - Disabled part of SimplifyCFG optimizations in SYCL mode which was resulting in invalid optimizations in some cases.
[8b29220] + - Silenced "unknown attribute" warning emitted during host part of full `-fsycl` compilation when it saw `[[intel::device_indirectly_callable]]` attribute. [718c0b1] + - Removed incorrect assertion for use of `-fopenmp-new-driver` for multiple inputs. [6e0f6d1] + - Fixed the issue caused by using the `nvvm_reflect` function in the nvptx backend with `-O0`. [537e51b] + - Fixed a regression in cases where function pointers were captured as kernel arguments. [b19e2e4] + - Fixed the libclc remangler to clone functions rather than aliasing to enable DPC++ for CUDA with `-O0`. [7b2fb02] + - Fixed the error "Explicit load/store type does not match pointee type of pointer operand" caused by incorrect address space. [e688fa5] + - Fixed incorrect diagnostic for `__spirv` calls when the `reqd_sub_group_size` attribute is applied on a sycl kernel. [1df7b59] + - Fixed alignment of emulated specialization constants. [0cec3c6] + - Fixed the group collective implementation for AMDGCN. Fixed the `shuffleUp` and `shuffleDown` functions for the AMDGCN builtins and SYCL headers. [d99e957] + - Removed `llvm.nvvm.suq.depth` instruction which was causing `CUDA_ERROR_NOT_FOUND` or `CUDA_ERROR_NOT_SUPPORTED` errors if present in the fatbin. [ec29322] + +### SYCL Library + - Fixed alignment of kernel local arguments in the CUDA backend. [ebb1281] + - Fixed a crash which could happen when building program for multiple devices. [64c2d35] + - Fixed max constant value query for the HIP backend. [1e55cf3] + - Fixed ambiguity error with `sycl::oneapi::experimental::this_nd_item`. [8aad52dd6027] + - Fixed a performance issue caused by unnecessary command batching in the Level Zero plugin.
[4d031a4] + - Fixed an issue with `sycl::get_pointer_device` API working incorrectly for CUDA and HIP backends [8fa17b4] + - Fixed an issue which might result in JITing for only one device while context is associated with multiple devices for Level Zero backend [7068457] + - Fixed namespace ambiguity in `this_id`, `this_item`, and `this_group`. [19369b6] + - Worked around two bugs in the Level Zero driver related to static linking extension. [2930a94] + - Fixed return type of `get_native<sycl::backend::opencl>(event)` from `cl_event` to `vector<cl_event>`. [a2189c6] + - Modified Level Zero plugin support for copy engines to address scenario when main copy engine is not available. [478a576] + - Fixed support for query of USM capabilities. [5941394] + - Fixed memory leak in the USM prefetch functionality. [5d4573f] + - Fixed host device local accessor alignment. [08b14da] + - Fixed `sycl::errc` values for exceptions per SYCL 2020. [270e78d] + - Fixed bug with `constexpr_recurse` usage. [bd15de9] + - Fixed `max_work_group_size` and `reqd_work_group_size` attribute arguments check. [7f37250] + - Fixed iterator debug level mismatch error on Windows when building programs with `/MDd` when `libsycl-fallback-cassert.obj` is involved. [93b573a] + - Fixed `get_native()` for `sycl::event` per requirements of the specification. [8878962] + - Fixed device enumeration for the next platforms when current platform doesn't have devices. [0272ec2] + - Fixed thread-safety issue in the scheduler which can appear if command gets cleaned up by another thread while adding a host accessor. [62ca43a] + - Fixed reported device name for HIP backend. [68b089f] + - Fixed `SYCL_PROGRAM_COMPILE_OPTIONS` and `SYCL_PROGRAM_LINK_OPTIONS` to override compile and link options respectively. [7a8fa1a] + - Fixed incorrect handling of queue indexing for Level Zero backend.
[1f4c9df] + - Fixed memory leak in the reductions that require additional resources (such as buffers) [9aefea0] + - Defined `get_property/has_property` in the queue for `property::queue::in_order`. [ca9fea6] + - Fixed memory leak in the scheduler for `run_on_host_intel` commands. [86cf56a] + - Fixed thread-safety issue caused by parallel access to the command list cache in the Level Zero plugin. [1f531c0] + - Fixed device code outlining for static local variables to avoid invalid device code generation. [66e207e] + - Fixed dynamic batching in the Level Zero plugin. [d6f115c] + - Fixed unsigned long warning in fallback cstring on Windows. [5fbe02b] + - Fixed sync of host task vs kernel for in-order queue. [c7ba937] + - Fixed include dependency in `fpga_lsu.hpp` and `pipes.hpp` headers. [ac6a4f5] + - Fixed kernel timestamp calculation in the Level Zero plugin. [c228f12] [7efb3e6] [76a3898] + - Fixed usage of copy-engines in the Level Zero interoperability queue. [847f8b6] + - Fixed kernel execution hang under large memory consumption by working around a bug in the Level Zero runtime. [b831bd0] + - Fixed the Level Zero plugin to honor `property::queue::enable_profiling`. [fb27c65] + - Fixed memory leak which existed when program build failed for the Level Zero backend. [beb7277] + - Fixed buffer creation from rvalue iterator. [a905a27] + - Made `queue::device_has()` private. [279ef0d] + - Fixed crash for case when a device image has no kernels. [279ef89] + - Fixed dependency between host/device actions for unbundled FPGA specific archives. [9699575] + - Fixed PI CUDA plugin to avoid linking against libsycl which may cause issues on some systems. [6821e66] + - Fixed support of `bfloat16` for CUDA. [5231fe4] + - Fixed `interop_handle::get_native_mem` so that it can work with accessors that use non-empty accessor_property_list. [5452a5d] + - Fixed sub-device count calculation for NUMA partitioning.
[ae284f1] + - Fixed `SYCL_ENABLE_PLUGINS` to enable both the OpenCL and the Level Zero PI plugins if it is unset. [ff384bb] + - Fixed BDF format on PCI query for the Level Zero backend. [ad6253e] + - Fixed `sycl::queue` XPTI instrumentation. [ec57cd7] + - Fixed interoperability return type for `sycl::buffer` to `std::vector<cl_mem>` per SYCL 2020. [8b3c8c4] + - Fixed `SYCL_DUMP_IMAGES` handling to also dump when spec constants are on. [211ccda] + - Fixed failure in case of using zero-size local accessor on some backends. [1292532] + - Fixed flaky bug which might appear in multi-threaded applications with simultaneous access to the cache of device lib programs. [92cfd53] + - Fixed compilation failure with `-ffast-math` on CUDA backend. [90ac3ee] + - Fixed make_queue interoperability API for the Level Zero to accept device argument to properly associate queue with the right device. [29a5369] + - Fixed invalid handler issue by updating OpenCL ICD loader from community. + - Fixed "undefined symbol" error for `ldexpf`, `hypotf`, `frexpf` on SYCL GPU device when using 3rd-party math headers instead of MSVC math headers on Windows. [476a351] + - Fixed memory leak for interop events created from native handle. [3c1d342] + - Fixed alignment of the memory returned from USM allocation functions. [3114f02] + - Fixed Vendor ID for AMD devices. [a1b42aa] + - Fixed sporadic failure of in-order queue due to non-closed batch on the Level Zero backend. [e8bff05] + - Fixed possible deadlock in case of having dependent events from different queues in a multi-threaded application. [4c619e9] + - Fixed issue with delivery of assert message before aborting [6a32706] + - Fixed default value for the `Alignment` template parameter of the usm_allocator. [3a91cec] + - Fixed API to get maximum width/height/depth of an image for the Level Zero backend. [9ecc74b] + - Fixed runtime for CUDA backend to ignore USM memory advice and warn if device doesn't meet requirements.
[ccaaa99] + +### Tools + - Fixed sycl-post-link tool to properly handle the offset in specialization constant descriptors. [12d7c1f] + - Fixed sycl-post-link tool to properly handle the padding at the end of composite types. [12d7c1f] + - Fixed translation of `Vector[Extract/Insert]Dynamic` instructions in llvm-spirv [bf43d7b] + - Fixed unconditional debug info generation for `libsycl_profiler_collector.so` [45784cd] + - Fixed sycl-post-link failure caused by incorrect removal of `llvm.used` in the case when specialization constant has 2+ users. [80e9148] + +### Documentation + - Removed documentation for `[[intel::reqd_work_group_size]]`. [28ffda1] + - Removed extension to set kernel cache configuration. [159a516] + - Disallowed `[[sycl_detail::uses_aspects()]]` attribute on type aliases in [OptionalDeviceFeatures](doc/design/OptionalDeviceFeatures.md). [df83271] + - Moved `properties` and property-related APIs into `sycl::ext::oneapi::experimental`. [`sycl_ext_oneapi_properties`](doc/extensions/experimental/sycl_ext_oneapi_properties.asciidoc) specification was updated to revision 2. [33fdc58][aacf541] + - Updated [`sycl_ext_oneapi_kernel_properties` extension](doc/extensions/proposed/sycl_ext_oneapi_kernel_properties.asciidoc). + - Fixed documentation of `__builtin_sycl_unique_stable_id` in [clang/docs/LanguageExtensions](https://github.com/intel/llvm/blob/sycl/clang/docs/LanguageExtensions.rst). [72ca49c] + - Aligned [`sycl_ext_intel_kernel_args_restrict` extension](doc/extensions/supported/sycl_ext_intel_kernel_args_restrict.asciidoc) with SYCL 2020. [4a794df] + +## API/ABI breakages + - Removed deprecated API from ESIMD headers. [ec0385d] + - Renamed `wi_slice` to `wi_data`. [f364e18] + - Renamed `nbarrier_*` API to `named_barrier_*` for ESIMD. [5023657] + - Moved a part of ESIMD APIs outside of experimental namespace. [c557d78][b2ee289] + - Moved `bfloat16` from `intel` namespace to `oneapi` namespace.
[5231fe4] + +## Known issues + - Having MESA OpenCL implementation which provides no devices on a + system may cause incorrect device discovery. As a workaround such an OpenCL + implementation can be disabled by removing `/etc/OpenCL/vendor/mesa.icd`. + - Compilation may fail on Windows in debug mode if a kernel uses + `std::array`. This happens because debug version of `std::array` in + Microsoft STL C++ headers calls functions that are illegal for the device + code. As a workaround the following can be done: + 1. Dump compiler pipeline execution strings by passing `-###` option to the + compiler. The compiler will print the internal execution strings of + compilation tools. The actual compilation will not happen. + 2. Modify the (usually) first execution string (it should have + `-fsycl-is-device` option) by adding + `-D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0` options to the + end of the string. Execute all strings one by one. + - `-fsycl-dead-args-optimization` can't help eliminate offset of + accessor even though it's created with no offset specified + - SYCL 2020 barriers show worse performance than SYCL 1.2.1 do [18c80fa] + - When using fallback assert in separate compilation flow it requires explicit + linking against `lib/libsycl-fallback-cassert.o` or + `lib/libsycl-fallback-cassert.spv` + - Limit alignment of allocation requests at 64KB which is the only alignment + supported by Level Zero [7dfaf3bd] + - In the following scenario on the Level Zero backend: + 1. Kernel A, which uses buffer A, is submitted to queue A. + 2. Kernel B, which uses buffer B, is submitted to queue B. + 3. `queueA.wait()`. + 4. `queueB.wait()`. + DPCPP runtime used to treat unmap/write commands for buffer A/B as host + dependencies (i.e. they were waited for prior to enqueueing any command + that's dependent on them). This allowed Level Zero plugin to detect that + each queue is idle on steps 1/2 and submit the command list right away.
+ This is no longer the case since we started passing these dependencies in an + event waitlist and Level Zero plugin attempts to batch these commands, so + the execution of kernel B starts only on step 4. The workaround restores the + old behavior in this case until this is resolved [2023e10d][6c137f87]. + - User-defined functions with the name and signature matching those of any + OpenCL C built-in function (i.e. an exact match of arguments, return type + doesn't matter) can lead to Undefined Behavior. + - A DPC++ system that has FPGAs installed does not support multi-process + execution. Creating a context opens the device associated with the context + and places a lock on it for that process. No other process may use that + device. Some queries about the device through device.get_info<>() also + open up the device and lock it to that process since the runtime needs + to query the actual device to obtain that information. + - The format of the object files produced by the compiler can change between + versions. The workaround is to rebuild the application. + - Using `sycl::program`/`sycl::kernel_bundle` API to refer to a kernel defined + in another translation unit leads to undefined behavior + - Linkage errors with the following message: + `error LNK2005: "bool const std::_Is_integral" (??$_Is_integral@_N@std@@3_NB) already defined` + can happen when a SYCL application is built using MS Visual Studio 2019 + version below 16.3.0 and user specifies `-std=c++14` or `/std:c++14`. + - Printing internal defines isn't supported on Windows [50628db] + # December'21 release notes Release notes for commit range 23ca0c2..27f59d8