Skip to content

[SYCL][XPTI] Pass plugin information to subscribers #4121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 28, 2021

Conversation

alexbatashev
Copy link
Contributor

@alexbatashev alexbatashev commented Jul 16, 2021

This patch makes the following additional information available to XPTI subscribers.
All streams:

  • Actual major and minor versions of SYCL runtime (instead of dummy values) as well as their string variant.

sycl.pi.debug stream:

  • Backend type, which is defined as a uint8_t value of sycl::backend enum.
  • Pointer to PI plugin to provide some degree of application flow variance (e.g. query additional info about device, USM pointers, memory, etc).

@alexbatashev alexbatashev force-pushed the xpti_sycl_version_backend_plugin branch from 1ca3f66 to 8060a4e Compare July 16, 2021 18:34
@alexbatashev alexbatashev requested a review from tovinkere July 16, 2021 18:42
@alexbatashev alexbatashev marked this pull request as ready for review July 16, 2021 18:43
@alexbatashev alexbatashev requested review from smaslov-intel and a team as code owners July 16, 2021 18:43
@smaslov-intel
Copy link
Contributor

Backend type, which is defined as a uint8_t value of sycl::backend enum.

This creates new interface dependency between components, e.g. you can't reorder backend enum values now.
Why is this needed? Can you instead pass a string that identifies the backend?

Pointer to PI plugin to provide some degree of application flow variance (e.g. query additional info about device, USM pointers, memory, etc).

Is this normal for XPTI model to cross shared objects boundaries with some internal pointers? Shouldn't that come at least with some pointer lifetime management, such that it is not released before XPTI readers are done? Again, what is the motivation of this change?

@alexbatashev
Copy link
Contributor Author

This creates new interface dependency between components, e.g. you can't reorder backend enum values now.
Why is this needed? Can you instead pass a string that identifies the backend?

It’s part of SYCL ABI, you can’t reorder them anyway. As for strings, they can be expensive, if you need to do comparisons.

Is this normal for XPTI model to cross shared objects boundaries with some internal pointers? Shouldn't that come at least with some pointer lifetime management, such that it is not released before XPTI readers are done? Again, what is the motivation of this change?

I don’t see why you’d need to save pointer to plugin somewhere. And it is guaranteed to be valid throughout callback lifetime, because they’re synchronous. Same thing is true about any other callback argument.

My motivation is that memory transfers are asynchronous, and I need them to be synchronous in order to be able to store memory content. Thus, I put extra waits in callbacks.

@smaslov-intel
Copy link
Contributor

It’s part of SYCL ABI, you can’t reorder them anyway. As for strings, they can be expensive, if you need to do comparisons.

Why do you need backend ID at all in XPTI stream receiver?

And it is guaranteed to be valid throughout callback lifetime, because they’re synchronous. Same thing is true about any other callback argument.

Good point about other pointer arguments, so at least you aren't making things worse than they are already.

My motivation is that memory transfers are asynchronous, and I need them to be synchronous in order to be able to store memory content. Thus, I put extra waits in callbacks.

Can you point me to where you are doing so? I am curious how do you identify what to wait for, and avoid deadlocks (e.g. when some copies are dependent on some others, and waiting for them may cause no forward progress, I think).

@alexbatashev
Copy link
Contributor Author

Why do you need backend ID at all in XPTI stream receiver?

Seems like a useful information for debugging purpose. Applications may use more than one platform. Knowledge of particular backend, that executes PI call may help here.

Can you point me to where you are doing so? I am curious how do you identify what to wait for, and avoid deadlocks (e.g. when some copies are dependent on some others, and waiting for them may cause no forward progress, I think).

https://github.com/alexbatashev/dpcpp_trace/blob/c676824b3a5ad4da091dd639ae070d63c15bb3cb/lib/plugin_record/record_handler.cpp#L268

Scheduler wouldn't enqueue a command, unless all dependencies are satisfied. Thus, I don't expect deadlocks.

@smaslov-intel
Copy link
Contributor

Knowledge of particular backend, that executes PI call may help here.

Why not defer adding it until we know how to use it? Also, if we wanted it, I think we should have this info in the "pi_plugin" structure, pointer to which you are passing already.

Scheduler wouldn't enqueue a command, unless all dependencies are satisfied.

Really? Aren't we using events wait-lists for drivers to track/resolve dependencies?
Most PI enqueue calls have them:

const size_t *local_work_size, pi_uint32 num_events_in_wait_list,

@alexbatashev
Copy link
Contributor Author

Why not defer adding it until we know how to use it? Also, if we wanted it, I think we should have this info in the "pi_plugin" structure, pointer to which you are passing already.

I can remove it if nobody sees any use for it. Right now all I do with that argument is print it on the screen for debugging purpose. Maybe @tovinkere can suggest other usages here.

What I'd like to avoid, though, is messing with pi_plugin structure. Since it's part of cross-library interface, it would be great to have some degree of stability here.

Really? Aren't we using events wait-lists for drivers to track/resolve dependencies?
Most PI enqueue calls have them:

I don't see much use for those arguments. What if our dependency is a host task? Or an event from another plugin? So, we have to wait for particular events anyway. And even if we use this parameter somewhere, waiting on the output event will also wait on this list of events. So, it should be safe from that point of view.

Besides, there're other usages for additional PI calls. For example, one may want to query pointer type to figure out data flow direction in memcpy.

…ackend_plugin

* upstream/sycl: (26 commits)
  [SPIR-V][NFC] Move non-upstreamed FuncParam decorations into internal:: (intel#4138)
  [SYCL] Move free function queries to experimental namespace (intel#4090)
  [SYCL][XPTI] Enable PI calls notifications with arguments (intel#4148)
  [SYCL] Revert queue::wait() to its old behaviour with Level Zero (intel#4153)
  [SYCL] Add missing <cstring> header to spirv.hpp (intel#4157)
  [SYCL] Adds info query for atomic_memory_order_capabilities on device and context (intel#4105)
  [SYCL] Improve performance of generic shuffles (intel#3815)
  [SYCL] Fix the error with namespaces caused during rebase of intel#4014 (intel#4151)
  [ESIMD] Fix 'ambiguous operator' error with length 1 simd operands (intel#4149)
  [libdevice][NFC] Fix libdevice dependencies list (intel#4130)
  [SPIR-V] Reland Encode debug info producer in SPIR-V (intel#4082)
  [SYCL][ROCm] Add ROCm support to get_device_count_by_type (intel#4113)
  [SYCL] Fix sRGB device info (intel#4145)
  [SYCL][ROCm] Fix kernel launch with multiple dimensions (intel#4063)
  [SYCL][ROCm] Fix compilation for AMD GPU with -fsycl-dead-args-optimization (intel#4126)
  [SYCL][Level Zero] Enable multi-CCS support. (intel#4038)
  [SYCL] Pass bound arch to unbundler (intel#4112)
  [ESIMD][doc] Added documentation for some ESIMD math APIs (intel#3995)
  [ESIMD] rename gather4/scatter4 to gather_rgba/scatter_rgba (intel#4120)
  [SYCL][NFC] Remove unused variable. (intel#4131)
  ...
@alexbatashev
Copy link
Contributor Author

@smaslov-intel what would be the final decision here? Should I remove backend field? If yes, should I simply pass pi_plugin as user data?

@smaslov-intel
Copy link
Contributor

@smaslov-intel what would be the final decision here? Should I remove backend field? If yes, should I simply pass pi_plugin as user data?

Yes, please remove it and just pass the pi_plugin.

…ackend_plugin

* upstream/sycl: (755 commits)
  [SYCL] Add operator= to atomic_ref specializations (intel#4183)
  [SYCL] Make spelling of Debug value for CMAKE_BUILD_TYPE variable case insensitive (intel#4069)
  [SYCL][LIBCLC] Add atan and cbrt for amdgcn-amdhsa (intel#4180)
  [SYCL][CUDA] Correctly free managed memory (intel#4181)
  [SYCL] Revert barrier deprecation note (intel#4162)
  [SYCL][FPGA] Refactor of statement attributes (intel#4136)
  [Driver][SYCL] Enable way to emit int-footer source to a specific dir (intel#4167)
  [Driver] Fix default MSVC version setting for -fms-compatibilty-version (intel#4165)
  [BuildBot] Add llvm-enable-projects flag to configure.py (intel#4169)
  [Driver][SYCL][FPGA] Improve aocx archive processing for FPGA (intel#4160)
  [SYCL] Correct int-header emission with type aliases
  [SYCL] Fix name collisions in SYCL enums (intel#4154)
  [SYCL] Return the correct status info for host_task event (intel#4161)
  [ESIMD][NFC] Added tests for simd class type traits (intel#4146)
  [SYCL][ROCm] Fix missing parameter in ROCm plugin (intel#4166)
  [SYCL][L0] Add temporary option to allow user to use copy engine for device to device copy (intel#4127)
  Remove check for AMD HIP to fix Driver/cuda-arch-translation.cu
  Reapply after conflict resolution 418a6d6 "Fix nvptx_target_teams_distribute_parallel_for_simd_codegen failure"
  Revert "[SYCL] Removes redefinitions of macros in libclc (intel#3505)"
  [PGO] Change test-run line to check NewPM pass behavior
  ...
@bader bader merged commit 2af0599 into intel:sycl Jul 28, 2021
@alexbatashev alexbatashev deleted the xpti_sycl_version_backend_plugin branch August 3, 2021 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants