Skip to content

[SYCL] Merge USMAllocContext for Sub-Devices and Sub-Sub-Devices #8012

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 27, 2023

Conversation

uditagarwal97
Copy link
Contributor

@uditagarwal97 uditagarwal97 commented Jan 13, 2023

Problem:
Currently, sub-devices and sub-sub-devices have different USM allocation contexts (USMAllocContext) but they share the same ze_device_handle. This causes issues with USM memory allocation, for instance, double-free.

Proposed Fix:
I have merged the USMAllocContext of the sub-device and sub-sub-device.

Fix double free bug in USM (de)allocation for sub-sub-devices on PVC
@uditagarwal97 uditagarwal97 requested a review from a team as a code owner January 13, 2023 22:09
@uditagarwal97 uditagarwal97 changed the title Merge USMAllocContext for Sub-Devices and Sub-Sub-Devices [SYCL] Merge USMAllocContext for Sub-Devices and Sub-Sub-Devices Jan 13, 2023
@uditagarwal97 uditagarwal97 temporarily deployed to aws January 13, 2023 22:36 — with GitHub Actions Inactive
@uditagarwal97 uditagarwal97 temporarily deployed to aws January 13, 2023 23:11 — with GitHub Actions Inactive
@uditagarwal97
Copy link
Contributor Author

The issue is reproducible on the existing test (the one being updated in intel/llvm-test-suite#1515).
We've only seen it internally because the project's CI doesn't use PVC machines currently.

@uditagarwal97 uditagarwal97 temporarily deployed to aws January 16, 2023 18:40 — with GitHub Actions Inactive
@uditagarwal97 uditagarwal97 temporarily deployed to aws January 17, 2023 00:45 — with GitHub Actions Inactive
@smaslov-intel
Copy link
Contributor

Problem: Currently, sub-devices and sub-sub-devices have different USM allocation contexts (USMAllocContext) but they share the same ze_device_handle. This causes issues with USM memory allocation, for instance, double-free.

Proposed Fix: I have merged the USMAllocContext of the sub-device and sub-sub-device.

should we just avoid freeing the context of sub-sub-devices since they do not carry unique L0 device handle?

Tag @igchor , the new owner of USM allocators

@igchor
Copy link
Member

igchor commented Jan 25, 2023

Problem: Currently, sub-devices and sub-sub-devices have different USM allocation contexts (USMAllocContext) but they share the same ze_device_handle. This causes issues with USM memory allocation, for instance, double-free.

Proposed Fix: I have merged the USMAllocContext of the sub-device and sub-sub-device.

If the ze_device_handle is the same why not just use it as a key for DeviceMemAllocContexts, SharedMemAllocContexts and SharedReadOnlyMemAllocContexts instead of pi_device? This way you wouldn't have to create any shared ptrs - you could just call insert on unordered_map and if the key already exists, do nothing.

@smaslov-intel
Copy link
Contributor

Problem: Currently, sub-devices and sub-sub-devices have different USM allocation contexts (USMAllocContext) but they share the same ze_device_handle. This causes issues with USM memory allocation, for instance, double-free.
Proposed Fix: I have merged the USMAllocContext of the sub-device and sub-sub-device.

If the ze_device_handle is the same why not just use it as a key for DeviceMemAllocContexts, SharedMemAllocContexts and SharedReadOnlyMemAllocContexts instead of pi_device? This way you wouldn't have to create any shared ptrs - you could just call insert on unordered_map and if the key already exists, do nothing.

Sounds good to me!

@uditagarwal97 uditagarwal97 temporarily deployed to aws January 26, 2023 23:18 — with GitHub Actions Inactive
@uditagarwal97
Copy link
Contributor Author

Thanks for the feedback, @igchor and @smaslov-intel. I have made the suggested changes.

@uditagarwal97 uditagarwal97 temporarily deployed to aws January 27, 2023 00:09 — with GitHub Actions Inactive
@uditagarwal97 uditagarwal97 temporarily deployed to aws January 27, 2023 00:55 — with GitHub Actions Inactive
Comment on lines 797 to 799
if (SubDevice->isCCS())
// CCS share USMAllocContext with its root device.
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this intrinsic check here? I thought we would just add an USM allocator for each L0 device. For PI devices that don't have unique L0 device (like this CCS or sub-sub-devices) the allocator wouldn't be added since it would already be there (from sub-device)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've removed the intrinsic check and instead added an extra comment to explicitly state the sharing of USM allocators between CCS and subdevices.

@uditagarwal97 uditagarwal97 temporarily deployed to aws January 27, 2023 04:47 — with GitHub Actions Inactive
@uditagarwal97 uditagarwal97 temporarily deployed to aws January 27, 2023 05:19 — with GitHub Actions Inactive
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@igchor
Copy link
Member

igchor commented Jan 27, 2023

LGTM

@uditagarwal97
Copy link
Contributor Author

Failure in SYCL :: dword_atomic_smoke.cpp is known #8098

@uditagarwal97
Copy link
Contributor Author

@intel/llvm-gatekeepers The PR is ready!

@bader bader merged commit 86511c5 into intel:sycl Jan 27, 2023
@uditagarwal97 uditagarwal97 deleted the fix_usm_alloc_ctx_for_ccs branch January 27, 2023 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants