[SYCL][CUDA][PI] Improve performance of event synchronization #6224

Merged: 8 commits, Jun 15, 2022

@t4c1 t4c1 commented Jun 1, 2022

Improve the performance of event synchronization by reducing the number of calls to cuStreamWaitEvent. The call is now skipped when the waiting stream is the same stream the event comes from. Also, when enqueueing a new command that depends on a previous one, an attempt is made to use the same stream, so both can be waited on with a single call to cuStreamWaitEvent.
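
To illustrate the idea, here is a minimal, self-contained sketch, not the plugin's actual code. `Event`, `count_waits`, and `pick_stream` are hypothetical names, and streams are modeled as plain integers rather than `CUstream` handles. It relies on the fact that work submitted to a single CUDA stream executes in issue order, so an event recorded on the target stream needs no explicit wait.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PI event; the real plugin wraps a CUevent.
struct Event {
  int stream;  // index of the stream the event was recorded on
};

// Counts how many cuStreamWaitEvent calls would be issued so that
// `target_stream` waits on every dependency in `deps`.
std::size_t count_waits(int target_stream, const std::vector<Event>& deps) {
  std::size_t waits = 0;
  for (const Event& e : deps) {
    // Same stream: in-order execution already guarantees the ordering,
    // so the wait can be skipped.
    if (e.stream == target_stream) continue;
    // A real implementation would call cuStreamWaitEvent here.
    ++waits;
  }
  return waits;
}

// Prefers the stream of the first dependency for a new command, so the
// dependency and the new command end up on the same stream.
// `fallback` is used when the command has no dependencies.
int pick_stream(const std::vector<Event>& deps, int fallback) {
  return deps.empty() ? fallback : deps.front().stream;
}
```

In this model, a command enqueued on stream 0 with dependencies recorded on streams 0, 1, and 0 needs one wait instead of three. Choosing the dependency's stream for the new command also means that a later command depending on both can wait on the newer event alone.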

@t4c1 t4c1 requested a review from a team as a code owner June 1, 2022 09:18
@t4c1 t4c1 requested a review from steffenlarsen June 1, 2022 09:18

@steffenlarsen steffenlarsen left a comment

Functionally this looks good, but the whole multiple-stream functionality is getting more and more complex, so I would really like it to be documented more. Could you please add some more comments detailing how and why some streams are "delayed"?

@pvchupin pvchupin requested a review from steffenlarsen June 7, 2022 23:42

@steffenlarsen steffenlarsen left a comment

LGTM!

@pvchupin pvchupin merged commit c4f326a into intel:sycl Jun 15, 2022

3 participants