Skip to content

[SYCL][ESIMD][EMU] Queue/Event handling change for esimd_cpu #4201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 3, 2021

Conversation

dongkyunahn-intel
Copy link
Contributor

@dongkyunahn-intel dongkyunahn-intel commented Jul 28, 2021

  • Supporting only in-order queue for pi_esimd_cpu
  • Event creation is changed to fail-safe one. Imported from pi_cuda.cpp

- Including esimd_cpu in work-around for revised queue::wait() logic
same as level-zero, introduced in PR#4044

- Returning 'false' from ExecCGCommand:: producesPiEvent() predicate
for esimd_cpu as no effective event is created during esimd_cpu kernel
execution
@dongkyunahn-intel dongkyunahn-intel requested a review from a team as a code owner July 28, 2021 01:08
@romanovvlad romanovvlad requested a review from kbobrovs July 28, 2021 09:06
@romanovvlad
Copy link
Contributor

Including esimd_cpu in work-around for revised queue::wait() logic
same as level-zero, introduced in PR#4044
Returning 'false' from ExecCGCommand:: producesPiEvent() predicate...

Could you please clarify why this is necessary? What is the problem with "dummy" events if all operations are completed during piEnqueue*?

Tagging @sergey-semenov

@dongkyunahn-intel
Copy link
Contributor Author

Some ESIMD samples from intel/llvm-test-suite fails without this change - e.g. vadd_1d.

What is the problem with "dummy" events if all operations are completed during piEnqueue*?

'DummyEvents' are still observed by piEventsWait. In order to completely eliminate events from piEnqueueMemBufferRead and piEnqueueMemImageRead done by CPU, the event returned from those PI_APIs must be nullptr (#4011 (comment)). Even if nullptr is returned from those APIs, some ESIMD samples are still failing - e.g. accessor, PrefixSum.

Meanwhile, I have used ESIMD samples from intel/llvm-test-suite for enabling ESIMD_CPU support. So far, none of those samples has invoked piEventCreate function (this is why piEventCreate in PR#4011 is still empty). I would need an ESIMD sample actively using events for fully enabling this event handling.

@romanovvlad
Copy link
Contributor

Some ESIMD samples from intel/llvm-test-suite fails without this change - e.g. vadd_1d.

What is the problem with "dummy" events if all operations are completed during piEnqueue*?

'DummyEvents' are still observed by piEventsWait. In order to completely eliminate events from piEnqueueMemBufferRead and piEnqueueMemImageRead done by CPU, the event returned from those PI_APIs must be nullptr (#4011 (comment)). Even if nullptr is returned from those APIs, some ESIMD samples are still failing - e.g. accessor, PrefixSum.

Meanwhile, I have used ESIMD samples from intel/llvm-test-suite for enabling ESIMD_CPU support. So far, none of those samples has invoked piEventCreate function (this is why piEventCreate in PR#4011 is still empty). I would need an ESIMD sample actively using events for fully enabling this event handling.

Still not sure I understand. Can we return "dummy" events from piEnqueue* APIs and then just ignore them in all PI APIs that take events?

@dongkyunahn-intel
Copy link
Contributor Author

Still not sure I understand. Can we return "dummy" events from piEnqueue* APIs and then just ignore them in all PI APIs that take events?

piEnqueue* APIs and piEventsWait are implemented as your description. Some kernels are still failing without patch from this PR. I was investigating the failures after making kernels work but still cannot understand why.

piEnqueueMemBufferRead- https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1198

piEnqueueMemImageRead - https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1293

piEventsWait - https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1078

@romanovvlad
Copy link
Contributor

Still not sure I understand. Can we return "dummy" events from piEnqueue* APIs and then just ignore them in all PI APIs that take events?

piEnqueue* APIs and piEventsWait are implemented as your description. Some kernels are still failing without patch from this PR. I was investigating the failures after making kernels work but still cannot understand why.

piEnqueueMemBufferRead- https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1198

piEnqueueMemImageRead - https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1293

piEventsWait - https://github.com/intel/llvm/pull/4011/files#diff-d023d374bba88549a2b0637e35fb5d6d463b1fc885e86460bec7a563da17fa91R1078

I think we need to find the root cause before committing the patch.

@dongkyunahn-intel
Copy link
Contributor Author

I think I found out solution. It has to be applied on top of the change set from PR#4011. As of August/19/2021, the PR is approved, but buildbot jobs fail due to missing packages required for online-building libva which is pre-requisite for CM. As soon as PR#4011 is merged into origin/sycl branch, I'll update this PR.

@romanovvlad
Copy link
Contributor

@dongkyunahn-intel

Returning 'false' from ExecCGCommand:: producesPiEvent() predicate
for esimd_cpu as no effective event is created during esimd_cpu kernel
execution

Could you please tell if we can create a dummy event during esimd_cpu kernel execution?

@dongkyunahn-intel
Copy link
Contributor Author

Could you please tell if we can create a dummy event during esimd_cpu kernel execution?

Dummy event is already applied for piEnqueueMemBufferRead and piEnqueueMemImageRead in pi_esimd_cpu.cpp

if (EventList[i]->IsDummyEvent) {

(*Event)->IsDummyEvent = true;

(*Event)->IsDummyEvent = true;

@romanovvlad
Copy link
Contributor

Dummy event is already applied for piEnqueueMemBufferRead and piEnqueueMemImageRead in pi_esimd_cpu.cpp

So, why we need to return false from ExecCGCommand::producesPiEvent() for esimd_cpu then?

- Event creation is imported from pi_cuda.cpp

- Special handling for esimd_cpu in ExecCGCommand::producesPiEvent()
(commands.cpp) is removed
@dongkyunahn-intel
Copy link
Contributor Author

@romanovvlad, your suggestion worked with my fully working branch for ESIMD_CPU. After suggestion applied, only one file is changed.

Due to dependency on PR#4430, buildbots are failing during merge. I'll merge change sets from the PR to this PR when it is merged to origin/sycl branch.

@dongkyunahn-intel
Copy link
Contributor Author

@romanovvlad / @kbobrovs ,

Can I apply change from this PR to PR#4430 (#4430)? As this change set is limited to one file, and by applying the change, we could prevent possible failures in PR#4430 caused by not having changes from this PR.

@kbobrovs
Copy link
Contributor

@romanovvlad / @kbobrovs ,

Can I apply change from this PR to PR#4430 (#4430)? As this change set is limited to one file, and by applying the change, we could prevent possible failures in PR#4430 caused by not having changes from this PR.

Just curious - why not merge this PR first? I'm OK with uniting the two PRs.

@dongkyunahn-intel
Copy link
Contributor Author

Just curious - why not merge this PR first? I'm OK with uniting the two PRs.

It still has dependency on PR#4430 (4011 previously) as it changes some PI_API implementation.

@@ -729,7 +736,9 @@ pi_result piQueueRelease(pi_queue Queue) {
}

pi_result piQueueFinish(pi_queue) {
DIE_NO_IMPLEMENTATION;
// TODO/FIXME : Only in-order queue is called for this API. Support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piQueueFinish must guarantee that all previously enqueued commands had completed. The fact that the queue is in-order does not guarantee that. In-order queue just means enqueued commands will execute in the enqueue order. Still, they can can execute asynchronously with the thread which enqueues the commands.

As far as I understand your current implementation, it is safe for piQueueFinish to be no-op just because all the commands like kernel execution or memory copy are blocking - enqueue of any command with ESIMD EMU plugin does not return until the command is actually finished.

If so, please change the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the comment.

std::unique_ptr<_pi_event> RetEv{nullptr};
if (Event) {
RetEv = std::unique_ptr<_pi_event>(new _pi_event());
RetEv->IsDummyEvent = true;
Copy link
Contributor

@kbobrovs kbobrovs Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand your current implementation, with all commands being blocking, all events returned from enqueue operations are effectively in "completed" state once control is returned back to the caller of the enqueue operation.

  • Given that I don't see the need in IsDummyEvent field.
  • Second, the above code is not needed, instead, the line 1218 should be '*Event = new _pi_event();' plus maybe setting the event to 'completed' state.

The same applies to other places where events are created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanovvlad suggested this implementation from CUDA PI implementation from recent phone call. Also, he was fine with this 'DummyEvent' approach.

Copy link
Contributor

@kbobrovs kbobrovs Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanovvlad suggested this implementation from CUDA PI implementation from recent phone call.

OK, now I see the point - unique_ptr is useful to automatically release the memory allocated by new _pi_even in case buf->CmBufferPtr->ReadSurface throws an exception. Disregard this my comment then.

Also, he was fine with this 'DummyEvent' approach.

DummyEvent seems is redundant, as I explained above, even though harmless. So I don't mind leaving it as is for now.

@dongkyunahn-intel
Copy link
Contributor Author

dongkyunahn-intel commented Sep 1, 2021

As PR#4430 is almost complete without change from this PR, I'll keep this active and merge later.

@kbobrovs kbobrovs merged commit 6d8abb9 into intel:sycl Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants