-
Notifications
You must be signed in to change notification settings - Fork 125
Ensure UR will clear contexts before unloading on windows #2024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure UR will clear contexts before unloading on windows #2024
Conversation
dced7c2
to
bc14e90
Compare
bc14e90
to
39ffb59
Compare
df47657
to
70d5e20
Compare
70d5e20
to
569d0ca
Compare
569d0ca
to
38450c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler workaround would be for the kernel object to hold a reference to the adapter. This way, the adapter won't delete state until the kernel object is deleted.
If the problem is that the kernel object is used after UR is destroyed with urTeardown
, then this is unsolvable in the general case in UR, unless we want to hold a virtual reference to the UR with every object handle. Which is a potential solution, but a fairly expensive one in my opinion, since every object access is likely to go through a shared atomic counter (which is a false-sharing problem).
I wouldn't want us to now be making arduous workarounds like this one for application-level problems. Can't this be solved more easily in SYCL or the application itself?
The problem is purely that UR adapter is unloaded before clearing all intel/llvm side objects, this is because sycl-rt will only start the destruction on a call to destruct sycl.dll on DllMain here. windows unload the dependencies first before calling DllMain process unloading. So, UR adapter dll will be unloaded before even sycl DllMain destruction is called. So, if we tried at sycl-rt side to clean the default context, this will result in access violation problems. This ofc doesn't stop at sycl-rt but also UR dependencies is still same problem. umf for example had the same problem in UR but luckily umf is using urInit/urTearDown and uses a refcount and only unloads when it reaches 0. That is what have been fixed in this PR: #2007 . The problem was not consistent as it depends how fast windows have unloaded UR to how much it would call the default context destruction. NOTE: the default context destruction is not currently happening on windows and it is let to be a memory leak. Destructing it manually is what leads to this. The proposed solution here is to use DllMain in UR L0 adapter, cache any created contexts/kernels and let it clean any context/kernels left before it unloads. I choosed this context object as it holds the usm pools so it is important to clean that, and kernels needed to fix some tests so also need to be cleaned. a perfect solution here would be that applications would control that it won't unload the dlls like UR before it clean all UR objects. but didn't find a way to do that on windows at the current moment thats why I made this workaround as a temporary solution for this. |
I had to ensure |
Is this a regression after PI removal? If so, how did this work before?
From what I can tell, the flow here causes Or you mean that What I'm concerned about with your patch is that it is a specialized workaround for one specific problem. This would be a whack-a-mole game of creating similar workarounds for other functions when someone submits a bug for their app that just happens to trigger a similar problem but for a different adapter/function. |
SYCL-RT starts the destruction flow on
when we reach the forth point, the adapter is already unloaded so calling any UR function will give access violation as it is calling at this point a function pointer that is removed. but that is also not happening momentarily, as unloading the dll is taking time so it depends how fast the adapter dll is unloaded, this access violation happens. So at the point of calling |
I think this problem was one of two scenarios :
|
This was replaced by a more proper fix here: intel/llvm#15262 |
UR adapter is unloaded before clearing all intel/llvm side objects, this is because sycl-rt will only start the destruction on a call to destruct sycl.dll on DllMain here. windows unload the dependencies first before calling DllMain process unloading. So, UR adapter dll will be unloaded before even sycl DllMain destruction is called. So, if we tried at sycl-rt side to clean the default context, this will result in access violation problems.
The proposed solution here is to use DllMain in UR L0 adapter, cache any created contexts/kernels and let it clean any context/kernels left before it unloads. I choosed this context object as it holds the usm pools so it is important to clean that, and kernels needed to fix some tests so also need to be cleaned.