[SYCL][CUDA][PI] Improve performance of event synchronization #6224
Functionally this looks good, but the whole multiple-stream functionality is getting more and more complex, so I would really like it to be documented better. Could you please add some more comments detailing how and why some streams are "delayed"?
LGTM!
Improve performance of event synchronization by reducing the number of calls to `cuStreamWaitEvent`. This call is now skipped for the stream the event comes from. Also, when enqueueing a new command that depends on a previous one, an attempt is made to use the same stream, so both can be waited on with only one call to `cuStreamWaitEvent`.
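
The idea described above can be illustrated with a minimal sketch. It is not the plugin's actual code: `StreamId`, `Event`, `WaitRecorder`, `enqueueWaits` and `pickStream` are hypothetical stand-ins so the example runs without CUDA, and the stream selection shown here (reuse the first dependency's stream) is only an assumed heuristic. It relies on the fact that work submitted to a single CUDA stream already executes in order, so waiting on an event from the same stream is redundant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUstream so the sketch runs without CUDA.
using StreamId = int;

// Hypothetical stand-in for an event; it remembers the stream it was recorded on.
struct Event {
    StreamId stream;
};

// Counts the waits that would be issued. In a real CUDA backend each call
// here would correspond to one cuStreamWaitEvent call.
struct WaitRecorder {
    std::size_t calls = 0;
    void streamWaitEvent(StreamId /*stream*/, const Event & /*event*/) { ++calls; }
};

// Make `target` wait for all dependencies, skipping events that were recorded
// on `target` itself, because a single stream is already ordered.
std::size_t enqueueWaits(StreamId target, const std::vector<Event> &deps,
                         WaitRecorder &rec) {
    for (const Event &e : deps) {
        if (e.stream == target)
            continue; // same stream: no explicit wait needed
        rec.streamWaitEvent(target, e);
    }
    return rec.calls;
}

// Assumed heuristic: try to run the new command on the stream of its first
// dependency, so that dependency needs no explicit wait.
StreamId pickStream(const std::vector<Event> &deps, StreamId fallback) {
    return deps.empty() ? fallback : deps.front().stream;
}
```

With dependencies recorded on streams 1, 2 and 1, choosing stream 1 for the new command leaves only the event from stream 2 to wait on, whereas an unrelated stream would need a wait for each of the three events.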