-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL] Don't select devices with no available images #6203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1450,6 +1450,26 @@ kernel_id ProgramManager::getSYCLKernelID(const std::string &KernelName) { | |||
return KernelID->second; | |||
} | |||
|
|||
bool ProgramManager::hasCompatibleImage(const device &Dev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe instead of running two loops and creating a temporary set, would it be possible to use std::find
or std::any_of
on the DeviceImages collection ? That might be easier to read and quicker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with a slightly easier to use map and use of std::any_of
So after having another look at this I think it's actually incorrect to reject a device if it doesn't have an image, first off as the lit tests show it breaks I'll re-work this patch to take availability of device image into account in the device scores instead of just rejecting the device. |
So this turned into a bit of a larger refactoring of the device selector, here's the main changes:
|
I misunderstood the purpose of the tie-breaker, it's not helpful for the standard selectors but it is for custom selectors, so I re-introduced the tie-breaking logic, but this time using the device preference score to tie-break. So a device with an image would be preferred over a level zero device with no image for example. |
Co-authored-by: Joe Todd <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
// device::get_devices returns filtered list of devices. | ||
// Keep 1000 for default score when filters were applied. | ||
if (FilterList) | ||
Score = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed? I am not completely sure, but it seems like this tries to ensure that no device is rejected if the user applied a filter. Maybe we could simply change it to a += 1
to ensure this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the history I got the feeling that it was more of a leftover from a time where the default selector was actually using the environment variable to do some filtering, and it felt a bit pointless since if there is a filter all the devices here would start at 1000 score, last PR changing this: #5349
You make a good point about a filtered device potentially being rejected, although I don't think it can happen at the moment since all defined device types are covered and add score, but what would you think about just starting the score for the default selector at 0
rather than REJECT_DEVICE_SCORE
, instead of restoring this? I don't think it really makes sense for the default selector to reject any devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the patch as I suggested, let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_selector can throw an exception if no device is available to satisfy the SYCL_DEVICE_FILTER.
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
Yes and that still happens, If there is no device that satisfies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. @bso-intel approval required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. If SYCL_DEVICE_FILTER is set, no device that does not satisfy the filter can be created.
So, no additional score for filter is needed.
LGTM.
* Fix links to configure.py and compile.py * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway. * Remove CUDA requiring device selector * Since intel#6203 the device selector should handle this case well. * Update HIP backend limitations * HIP is no longer in beta * Windows isn't supported but intel#17702 made the build work with it so it might work for some users. * Global offset has been supported since intel#5855 * Add common limitations * Update HIP for Nvidia section * This might work but is not supported * Update HIP section * Recommended HIP version + testing platforms * HIP is no longer in beta * Add note on target aliases for CUDA and HIP
* Fix links to configure.py and compile.py * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway. * Remove CUDA requiring device selector * Since #6203 the device selector should handle this case well. * Update HIP backend limitations * HIP is no longer in beta * Windows isn't supported but #17702 made the build work with it so it might work for some users. * Global offset has been supported since #5855 * Add common limitations * Update HIP for Nvidia section * This might work but is not supported * Update HIP section * Recommended HIP version + testing platforms * HIP is no longer in beta * Add note on target aliases for CUDA and HIP
This patch solves:
And is related to:
In some cases the current selector may select a device for which we don't have an AOT or SPIR-V binary for, this patch ensures that such devices get skipped.