-
Notifications
You must be signed in to change notification settings - Fork 795
[SYCL] Fix kernel program cache for multiple devices and refactor some unit tests #5017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: mdimakov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good.
@maximdimakov , could you, please, add regression test in unit-tests or intel/llvm-test-suite?
/verify with intel/llvm-test-suite#594 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch, doesn't pass the test.
There is problem in the test that it can't create multiple devices. I am investigating to it |
@maximdimakov , I suggest having a unit-test for this change as it doesn't require specific H/W or its features. |
deaa97c
@bader unit test passes checks |
Good, but I'm expected you to fix llvm-test-suite tests you added for this patch. |
@bader I closed llvm-test-suite PR |
Why? I don't think merging untested patch is a good idea. |
@bader I replaced test in test suite by unit test. I can emulate multiple devices with CreateMultipleRootDevices environment variable locally on machine, but Jenkins precommit machine can't. I think it is better to write unittest that doesn't depend on machine on which it runs. |
Withdraw my request for changes and let Chris and Sergei to validate how well the patch is covered by the unittest.
|
||
class MultTestKernel { | ||
public: | ||
void operator()(cl::sycl::item<1>){}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a const operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this operator
devices[0] = reinterpret_cast<pi_device>(1111); | ||
devices[1] = reinterpret_cast<pi_device>(2222); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only be performed if devices
is not nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And not just not nullptr. If I'm not mistaken, the maximum possible device entries is bound by num_entries
so the subscripts should check to make sure they are less than num_entries. Though, in the case of mocks and unit tests, it's less clear to me where this value for num_entries
comes from and if it is a problem to blindly set devices[1]
.
Would this test fail if run with the SYCL_DEVICE_FILTER
set to a single device? (And, do we even care?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added check for nullptr and for num_entries.
piDevicesGet
first time is called with num_entries=0
and not null num_devices
. Ather the call num_entries
takes value of num_devices
. With second call num_devices
parameter passes with nullptr, so we can create num_entries
devices. I rewrite num_devices
to 2, so when redefinedDevicesGet
is called a second time RT thinks that we have two devices.
This test emulates two devices, so we don't care about real count of devices.
EXPECT_EQ(KernelCache.size(), (size_t)2) << "Expect 2 kernels in cache"; | ||
} | ||
// Cache is cleared here, check kernel release | ||
EXPECT_EQ(KernelReleaseCounter, 3) << "Expect 3 piKernelRelease calls"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add comment describing how number 3
was retrieved. Why is it expected and not 2 or four?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why not test to see if KernelReleaseCounter is equal to the RetainCounter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added clarifying comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why not test to see if KernelReleaseCounter is equal to the RetainCounter?
AFAIR, there's some mem-leak test which checks for retain-release parity.
Although, I'm not sure it checks for kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@cperkinsintel, ping |
Apply LLVM's coding style rule - include as little as possible. https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When multiple devices are used in the same context, the problem of piProgram double free is arised.
In program_manager::build() created program is shared with multiple devices and is cached in KernelProgramCache as many times as there are devices. When cache is destructed, the same program will be released for every device, but piProgramRetain was called only for one device, which leads to double free. Adding piProgramRetain for each device solves this problem.
Deleting kernel from cache after its release prevents double free in the case a kernel was created for a program that shared with multiple devices.
Unittest with emulating multiple gpu is added.
Signed-off-by: mdimakov [email protected]