Skip to content

[SYCL] Don't select devices with no available images #6203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 7, 2022

Conversation

npmiller
Copy link
Contributor

This patch solves:

And is related to:

In some cases the current selector may select a device for which we don't have an AOT or SPIR-V binary for, this patch ensures that such devices get skipped.

@npmiller npmiller requested a review from a team as a code owner May 26, 2022 16:52
@npmiller npmiller requested a review from cperkinsintel May 26, 2022 16:52
@@ -1450,6 +1450,26 @@ kernel_id ProgramManager::getSYCLKernelID(const std::string &KernelName) {
return KernelID->second;
}

bool ProgramManager::hasCompatibleImage(const device &Dev) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead of running two loops and creating a temporary set, would it be possible to use std::find or std::any_of on the DeviceImages collection ? That might be easier to read and quicker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with a slightly easier to use map and use of std::any_of

@npmiller
Copy link
Contributor Author

So after having another look at this I think it's actually incorrect to reject a device if it doesn't have an image, first off as the lit tests show it breaks sycl-ls since that uses device selectors but obviously doesn't have any device images.

I'll re-work this patch to take availability of device image into account in the device scores instead of just rejecting the device.

@npmiller
Copy link
Contributor Author

npmiller commented May 27, 2022

So this turned into a bit of a larger refactoring of the device selector, here's the main changes:

  • Removed the initial score of 1000 when SYCL_DEVICE_FILTER is set. This is unnecessary as SYCL_DEVICE_FILTER filters the list of devices itself, so we can just use regular scoring on the already filtered list of devices
  • Replaced isDeviceOfPreferredSyclBe with a new getDevicePreference, which returns a "preference" score to add to the device score, this preference score includes preference of the level_zero backend, as well as a strong preference for devices that have images in the binary.
  • Remove use of isDeviceOfPreferredSyclBe as tie-breaker in select_device, this was redundant because it was already used to boost the score of preferred devices, so it should be included in the score.

@npmiller
Copy link
Contributor Author

I misunderstood the purpose of the tie-breaker, it's not helpful for the standard selectors but it is for custom selectors, so I re-introduced the tie-breaking logic, but this time using the device preference score to tie-break. So a device with an image would be preferred over a level zero device with no image for example.

cperkinsintel
cperkinsintel previously approved these changes May 31, 2022
Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

// device::get_devices returns filtered list of devices.
// Keep 1000 for default score when filters were applied.
if (FilterList)
Score = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed? I am not completely sure, but it seems like this tries to ensure that no device is rejected if the user applied a filter. Maybe we could simply change it to a += 1 to ensure this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the history I got the feeling that it was more of a leftover from a time where the default selector was actually using the environment variable to do some filtering, and it felt a bit pointless since if there is a filter all the devices here would start at 1000 score, last PR changing this: #5349

You make a good point about a filtered device potentially being rejected, although I don't think it can happen at the moment since all defined device types are covered and add score, but what would you think about just starting the score for the default selector at 0 rather than REJECT_DEVICE_SCORE, instead of restoring this? I don't think it really makes sense for the default selector to reject any devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the patch as I suggested, let me know what you think

Copy link
Contributor

@bso-intel bso-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_selector can throw an exception if no device is available to satisfy the SYCL_DEVICE_FILTER.
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md

@npmiller
Copy link
Contributor Author

npmiller commented Jun 6, 2022

Yes and that still happens, SYCL_DEVICE_FILTER affects the list of devices, whereas the device selectors only score and rank the available devices, with the possibility of rejecting them.

If there is no device that satisfies SYCL_DEVICE_FILTER, the list of devices will be empty to begin with, so the default_selector won't have anything to score, and it will correctly throw a device not found exception.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. @bso-intel approval required.

Copy link
Contributor

@bso-intel bso-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. If SYCL_DEVICE_FILTER is set, no device that does not satisfy the filter can be created.
So, no additional score for filter is needed.
LGTM.

@pvchupin pvchupin merged commit 0e67db8 into intel:sycl Jun 7, 2022
npmiller added a commit to npmiller/llvm that referenced this pull request Apr 9, 2025
* Fix links to configure.py and compile.py
  * Linking to the file in tree caused the link in the docs to just
    download the python script. It makes more sense to link to the
    github web UI for these as they should be used in a checkout anyway.
* Remove CUDA requiring device selector
  * Since intel#6203 the device selector
    should handle this case well.
* Update HIP backend limitations
  * HIP is no longer in beta
  * Windows isn't supported but intel#17702
    made the build work with it so it might work for some users.
  * Global offset has been supported since intel#5855
  * Add common limitations
* Update HIP for Nvidia section
  * This might work but is not supported
* Update HIP section
  * Recommended HIP version + testing platforms
  * HIP is no longer in beta
* Add note on target aliases for CUDA and HIP
kbenzie pushed a commit that referenced this pull request Apr 28, 2025
* Fix links to configure.py and compile.py
* Linking to the file in tree caused the link in the docs to just
download the python script. It makes more sense to link to the github
web UI for these as they should be used in a checkout anyway.
* Remove CUDA requiring device selector
* Since #6203 the device selector
should handle this case well.
* Update HIP backend limitations
  * HIP is no longer in beta
* Windows isn't supported but #17702
made the build work with it so it might work for some users.
* Global offset has been supported since
#5855
  * Add common limitations
* Update HIP for Nvidia section
  * This might work but is not supported
* Update HIP section
  * Recommended HIP version + testing platforms
  * HIP is no longer in beta
* Add note on target aliases for CUDA and HIP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants