[CI][Bench] Implement exponentially weighted moving average for SYCL nightly regression CI #18766

Merged
sarnex merged 19 commits into sycl from ianayl/benchmark-ci-EWMA on Jul 8, 2025

@ianayl ianayl commented Jun 2, 2025

The median responds slowly to changes in performance, which makes it a poor metric for regression checking. This PR adds an option to use an exponentially weighted moving average (EWMA) instead.
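To illustrate the difference in responsiveness, here is a minimal sketch and not the code from this PR. The function name `ewma`, the smoothing factor `alpha=0.25`, and the sample timings are all assumptions chosen for illustration.

```python
from statistics import median


def ewma(samples, alpha=0.25):
    """Exponentially weighted moving average of samples, oldest first.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values
    weight recent samples more heavily.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    average = samples[0]
    for value in samples[1:]:
        # Blend the newest sample into the running average.
        average = alpha * value + (1.0 - alpha) * average
    return average


# Made-up nightly timings (ms): eight stable runs, then three slower runs.
history = [100.0] * 8 + [120.0] * 3

print(median(history))  # 100.0: the median has not moved yet
print(ewma(history))    # 111.5625: the EWMA already tracks the slowdown
```

With these made-up numbers the median still reports the old level after three slow runs, while the EWMA has moved more than halfway toward the new one. How quickly it moves depends entirely on the choice of `alpha`.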

The hope is that this implementation can also be applied to a CPU instructions-retired metric once the SYCL compute-benchmark tests start reporting instructions retired; that would give a far more robust metric for spotting regressions.

Note to llvm-reviewers-benchmarking: the changes here are kept separate from the core benchmarking scripts, so this PR should be NFC for the core benchmarking scripts.

uditagarwal97 pushed a commit that referenced this pull request Jul 4, 2025
SYCL performance has regressed significantly of late, causing nightly to
fail consistently. If nightly fails constantly, however, people may
start to ignore it. #18766 should reduce the number of failures, but I
am still working out the best parameters to use. In addition,
regressions are often caused by driver changes, which are outside
intel/llvm's control. For the long term, it may therefore be best to
move benchmarking out of sycl-nightly.yml.

This PR moves nightly benchmarking into its own workflow/nightly job,
scheduled 1hr before sycl-nightly.yml, in order to reduce the number of
failures in SYCL nightly.

ianayl commented Jul 8, 2025

@intel/llvm-gatekeepers PR is ready for merge, thanks!

@sarnex sarnex merged commit a9aeafa into sycl Jul 8, 2025
27 of 29 checks passed
@bader bader deleted the ianayl/benchmark-ci-EWMA branch July 10, 2025 01:13
5 participants