[SYCL][CUDA][PI] Improve performance of piQueueFinish #6201

Merged
merged 5 commits into from
Jun 1, 2022

t4c1
Contributor

@t4c1 t4c1 commented May 26, 2022

Improves the performance of piQueueFinish, and therefore of queue::wait(), on the CUDA backend by reducing the number of cuStreamSynchronize() calls. In most use cases this fixes the queue::wait() slowdown introduced in #6102.

This does not change any interface so there are no changes to the test suite.
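
For readers unfamiliar with the idea, here is a minimal sketch of how a queue could avoid synchronizing streams that have not been used since the last finish. It is not the plugin code: `MockStream`, `StreamPool`, `next_stream`, and `finish` are hypothetical names, the round-robin assignment is an assumption, and a counter stands in for the real cuStreamSynchronize() call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUDA stream. sync_count records how many times
// a real implementation would have called cuStreamSynchronize() on it.
struct MockStream {
  int sync_count = 0;
};

// Hypothetical pool that hands out streams round-robin and remembers where
// the previous finish() stopped, so unused streams can be skipped.
class StreamPool {
public:
  explicit StreamPool(std::size_t n) : streams_(n) { assert(n > 0); }

  // Assumption: work is spread over the streams in round-robin order.
  MockStream &next_stream() {
    MockStream &s = streams_[next_ % streams_.size()];
    ++next_;
    return s;
  }

  // Synchronize only the streams handed out since the previous finish().
  // Returns the number of synchronize calls issued.
  std::size_t finish() {
    const std::size_t n = streams_.size();
    std::size_t used = next_ - last_sync_;
    if (used > n)
      used = n; // wrapped around: every stream was used at least once
    for (std::size_t i = 0; i < used; ++i)
      ++streams_[(last_sync_ + i) % n].sync_count; // mock synchronize
    last_sync_ = next_;
    return used; // 0 when no stream was used since the last finish()
  }

  int sync_count(std::size_t i) const { return streams_[i].sync_count; }

private:
  std::vector<MockStream> streams_;
  std::size_t next_ = 0;      // total number of streams handed out so far
  std::size_t last_sync_ = 0; // value of next_ at the previous finish()
};
```

With a pool of four streams, using two of them and then calling `finish()` issues two synchronize calls instead of four, and a second `finish()` with no new work issues none.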

@t4c1 t4c1 requested a review from a team as a code owner May 26, 2022 12:39
@t4c1 t4c1 requested a review from smaslov-intel May 26, 2022 12:39
Contributor

@steffenlarsen steffenlarsen left a comment

Ingenious! LGTM!

@zjin-lcf
Contributor

Improves the performance of piQueueFinish, and therefore of queue::wait(), on the CUDA backend by reducing the number of cuStreamSynchronize() calls. In most use cases this fixes the queue::wait() slowdown introduced in #6102.

This does not change any interface so there are no changes to the test suite.

Could your improvement be migrated directly to the HIP plugin interface?

@t4c1
Contributor Author

t4c1 commented May 30, 2022

This could be migrated to the HIP plugin once it also uses multiple streams.

@steffenlarsen
Contributor

@smaslov-intel - Would you like to have a look or should we merge this?

Contributor

@smaslov-intel smaslov-intel left a comment

LGTM

@steffenlarsen steffenlarsen merged commit 8b85a3c into intel:sycl Jun 1, 2022
steffenlarsen pushed a commit that referenced this pull request Jun 21, 2022
Fixed an off-by-one error introduced in #6201 that caused queue synchronization to synchronize all streams when no stream had been used. The code still produced correct results, but the extra synchronization could hurt performance in some cases.
4 participants