Skip to content

Get rid of post-commit testing #14383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/sycl-linux-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ on:
build_image:
type: string
required: false
default: "ghcr.io/intel/llvm/ubuntu2204_build:latest"
default: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? At the very least, it should have a comment in the code explaining it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as the other places that use this tag.

 $ ag 0300ac92462 -A 1 -B 1
workflows/sycl-linux-build.yml
16-        required: false
17:        default: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"
18-      build_ref:

workflows/sycl-precommit.yml
81-            runner: '["Linux", "amdgpu"]'
82:            image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
83-            image_options: -u 1001 --device=/dev/dri --device=/dev/kfd

workflows/sycl-linux-precommit-aws.yml
66-      runner: '["aws_cuda-${{ github.event.workflow_run.id }}-${{ github.event.workflow_run.run_attempt }}"]'
67:      image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
68-      image_options: -u 1001 --gpus all --cap-add SYS_ADMIN --env NVIDIA_DISABLE_REQUIRE=1

What should the comment say? "Use stable tag 0300ac9"? Seems redundant to me

build_ref:
type: string
required: false
Expand Down
40 changes: 39 additions & 1 deletion .github/workflows/sycl-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,14 @@ jobs:
with:
build_cache_root: "/__w/"
build_artifact_suffix: default
build_configure_extra_args: '--hip --cuda'
build_configure_extra_args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we build precommit linux with the same flags (It looks like we don't but maybe I can't read)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for more diversity i.e. pre-commit is static build (llvm config default), and then nightly testing gets exercised with a shared config. I'm not married to it, but I figured it was any easy way to increase coverage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue we actually do see relatively frequently is someone breaking the shared library build (usually missing a library dep in the cmake file) and today that is usually caught in post commit because that does shared lib build. With this, we would only catch it in the nightly.

However I think it's reasonable to assume that most developers are using the default config (configure.py) and assuming we have someone guarneteed to be checking the nightly and quickly report and/or fix issues, I don't have a huge problem with this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could swap it round, so pre-commit testing catches the sharelib issues?

I wonder if it's possible to break static library build and not shared library build. If that's extremely unlikely such that either both are broken or shlib is broken, doing shlib in precommit would be fine with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should just drop "sharedlib" build configuration. The primary use case for this configuration is to speed-up incremental build of LLVM libraries. This build configuration is not use in the production environment - it's used only by LLVM developers. If everyone uses standard build, there is no point to build "sharedlib" configuration in CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I use it rather frequently and I would be annoyed if it was failing and not caught in CI.

Copy link
Contributor

@aelovikov-intel aelovikov-intel Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our downstream uses it a lot. It would be very bad if the issues will only be discovered when syncing downstream with upstream and not in upstream CI.

--shared-libs
--no-assertions
--hip
--cuda
--native_cpu
--cmake-opt="-DSYCL_ENABLE_STACK_PRINTING=ON"
--cmake-opt="-DSYCL_LIB_WITH_DEBUG_SYMBOL=ON"
merge_ref: ''
retention-days: 90

Expand Down Expand Up @@ -80,6 +87,37 @@ jobs:
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
target_devices: opencl:cpu
tests_selector: cts

- name: E2E tests with dev igc on Intel Arc A-Series Graphics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update your commit message with the summary of changes related to Arc GPU testing. @sarnex and @YuriPlyakhin would have to approve the changes to the devigc usage.

Copy link
Contributor

@sarnex sarnex Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems as part of this PR we stop doing arc dev igc testing in precommit and only do it in the nightly. This is fine with me but we need @jsji's approval, he worked on the dev igc CI and is more familiar with the use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldrumm , why do you suggest to remove dev igc testing from precommit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the time spent in running the dev igc testing is fairly small, we should try to keep it in pre-commit if it is possible -- moving it from pre-commit to post-commit (nightly) would make things complicated unless we can identify the culprit commit and ping author automatically in post-commit (nightly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @jsji said

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time testing the dev driver might be small but it results in an instability. The principle here is to have a stable testing platform where the user's PR is tested against a stable config, not where a users PR is tested against unstable GPU drivers. If we're to have any unstable test plastforms available in pre-commit they should be explicitly opt-in only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It basically boils down to "GPU driver developers should do their own testing, not experiment on dpc++ contributors"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ldrumm , understand the principle. However, the dev igc testing is not trying to do testing for GPU driver, but rather doing the continuous integration tests for Joint Matrix/ESIMD features, which require IGC changes time-to-time. The testing already restricted to Joint Matrix and ESIMD only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @ldrumm , dev igc driver is being updated only after Joint Matrix/ESIMD E2E tests passed with it, so it is stable in context of precommit testing.

runner: '["Linux", "arc"]'
image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:devigc
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
target_devices: ext_oneapi_level_zero:gpu;opencl:gpu
reset_gpu: true
install_drivers: >-
${{ contains(needs.detect_changes.outputs.filters, 'drivers') ||
contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
use_dev_igc: ${{ contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True

# Performance tests below. Specifics:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are very dumb for now and the idea behind them is to be able to access the logs to look through performance number on a per-commit basis. I don't see a reason to add this into nightly...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll move them back

# - only run performance tests (use LIT_FILTER env)
# - ask llvm-lit to show all the output, even for PASS (-a)
# - run in single thread (-j 1)
# - enable the tests in LIT (--param enable-perf-tests=True)
# - run on all available devices.
- name: Perf tests on Intel GEN12 Graphics system
runner: '["Linux", "gen12"]'
env: '{"LIT_FILTER":"PerformanceTests/"}'
extra_lit_opts: -a -j 1 --param enable-perf-tests=True --param gpu-intel-gen12=True
target_devices: all

- name: Perf tests on Intel Arc A-Series Graphics system
runner: '["Linux", "arc"]'
env: '{"LIT_FILTER":"PerformanceTests/"}'
extra_lit_opts: -a -j 1 --param enable-perf-tests=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need --param gpu-intel-dg2=True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Will fix

target_devices: all

uses: ./.github/workflows/sycl-linux-run-tests.yml
with:
name: ${{ matrix.name }}
Expand Down
113 changes: 0 additions & 113 deletions .github/workflows/sycl-post-commit.yml

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: SYCL Pre Commit on Linux
name: SYCL Pre Commit

on:
# We rely on "Fork pull request workflows from outside collaborators" -
Expand All @@ -18,9 +18,6 @@ on:
- 'clang/docs/**'
- '**.md'
- '**.rst'
- '.github/workflows/sycl-windows-*.yml'
- '.github/workflows/sycl-macos-*.yml'
- '.github/workflows/sycl-nightly.yml'
- 'devops/containers/**'
- 'devops/actions/build_container/**'

Expand All @@ -35,7 +32,7 @@ jobs:
detect_changes:
uses: ./.github/workflows/sycl-detect-changes.yml

build:
build_linux:
needs: [detect_changes]
if: always() && success()
uses: ./.github/workflows/sycl-linux-build.yml
Expand All @@ -46,12 +43,11 @@ jobs:
build_artifact_suffix: "default"
build_cache_suffix: "default"
changes: ${{ needs.detect_changes.outputs.filters }}
build_image: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"

determine_arc_tests:
name: Decide which Arc tests to run
needs: [build, detect_changes]
if: ${{ always() && !cancelled() && needs.build.outputs.build_conclusion == 'success' }}
needs: [build_linux, detect_changes]
if: ${{ always() && !cancelled() && needs.build_linux.outputs.build_conclusion == 'success' }}
runs-on: [Linux, aux-tasks]
timeout-minutes: 3
outputs:
Expand All @@ -69,27 +65,38 @@ jobs:
else
echo 'arc_tests="Matrix/"' >> "$GITHUB_OUTPUT"
fi
test:
needs: [build, detect_changes, determine_arc_tests]
if: ${{ always() && !cancelled() && needs.build.outputs.build_conclusion == 'success' }}

test_linux:
needs: [build_linux, detect_changes, determine_arc_tests]
if: ${{ always() && !cancelled() && needs.build_linux.outputs.build_conclusion == 'success' }}
strategy:
fail-fast: false
matrix:
include:
- name: AMD/HIP
# XXX CUDA testing is in a separate workflow because it runs on AWS
# which has different access restrictions/requirements. See
# sycl-linux-precommit.yml if you're looking for the CUDA precommit
# definition
- name: E2E on AMD/HIP
runner: '["Linux", "amdgpu"]'
image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
image_options: -u 1001 --device=/dev/dri --device=/dev/kfd
target_devices: ext_oneapi_hip:gpu
- name: Intel

- name: E2E on Intel GEN12; level zero GPU, OpenCL CPU
runner: '["Linux", "gen12"]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere in the code: Level Zero - both words should start with capital letters

image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:latest
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
target_devices: ext_oneapi_level_zero:gpu;opencl:gpu;opencl:cpu
reset_gpu: true
install_drivers: ${{ contains(needs.detect_changes.outputs.filters, 'drivers') }}
extra_lit_opts: --param gpu-intel-gen12=True
- name: E2E tests on Intel Arc A-Series Graphics

- name: E2E on Intel GEN12 Graphics; OpenCL GPU, OpenCL FPGA
runner: '["Linux", "gen12"]'
extra_lit_opts: --param gpu-intel-gen12=True
target_devices: opencl:gpu;opencl:fpga

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be an issue introduced by this PR and may already exist, but I think having gpu-intel-gen12=True and multiple target devices means that the variable is true for all target device (unless we somehow set it to false in the lit python scripts if it's not gpu). Probably we only want that variable to be true for GPU targets.

@aelovikov-intel Any idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a parallel activity to auto-detect architecture. I think that would solve the issue, but I don't remember the status of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be already available: #13976

Copy link
Contributor

@sarnex sarnex Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks! Probably we need to either map the results there to the gpu-intel-* we have been using, or the other way around. Having two ways to specify the gpu seems bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks! Probably we need to either map the results there to the gpu-intel-* we have been using, or the other way around. Having two ways to specific the gpu seems bad.

That's the "in progress" part that I'm unaware about the status of. @AlexeySachkov , someone from your team is looking into it, right?

- name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL CPU
runner: '["Linux", "arc"]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL CPU
- name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL GPU

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Isn't this FGPA and not GPU?

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally on ARC we tested with 2 GPU backends: level 0 and OpenCL: target_devices: ext_oneapi_level_zero:gpu;opencl:gpu
Not sure, where FPGA or CPU is coming from...

image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:latest
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
Expand All @@ -98,23 +105,16 @@ jobs:
install_drivers: ${{ contains(needs.detect_changes.outputs.filters, 'drivers') }}
extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True
env: '{"LIT_FILTER":${{ needs.determine_arc_tests.outputs.arc_tests }} }'
- name: E2E tests with dev igc on Intel Arc A-Series Graphics

- name: E2E on Intel Arc A-Series Graphics
runner: '["Linux", "arc"]'
Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the name be more specific on what targets are being tested, for example: OpenCL FPGA, OpenCL GPU, etc....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I'll have a look

image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:${{ contains(github.event.pull_request.labels.*.name, 'ci-no-devigc') && 'latest' || 'devigc' }}
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
target_devices: ext_oneapi_level_zero:gpu;opencl:gpu
reset_gpu: true
install_drivers: >-
${{ contains(needs.detect_changes.outputs.filters, 'drivers') ||
contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
use_dev_igc: ${{ contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True
env: '{"LIT_FILTER":${{ needs.determine_arc_tests.outputs.arc_tests }} }'
target_devices: opencl:gpu;opencl:opencl:fpga

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenCL GPU is being tested with previous "E2E on Intel Arc A-Series Graphics" run, right?
If yes, we should not repeat it again probably.

Also, what is the point to test OpenCL FPGA on HW with Intel ARC? I would use more powerful system for that, and GPU is not necessary for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this i think fpga testing on any system is the same, it just needs to be powerful enough to not timeout/etc

uses: ./.github/workflows/sycl-linux-run-tests.yml
with:
name: ${{ matrix.name }}
runner: ${{ matrix. runner }}
runner: ${{ matrix.runner }}
image: ${{ matrix.image }}
image_options: ${{ matrix.image_options }}
target_devices: ${{ matrix.target_devices }}
Expand All @@ -128,15 +128,40 @@ jobs:
merge_ref: ''

sycl_toolchain_artifact: sycl_linux_default
sycl_toolchain_archive: ${{ needs.build.outputs.artifact_archive_name }}
sycl_toolchain_decompress_command: ${{ needs.build.outputs.artifact_decompress_command }}
sycl_toolchain_archive: ${{ needs.build_linux.outputs.artifact_archive_name }}
sycl_toolchain_decompress_command: ${{ needs.build_linux.outputs.artifact_decompress_command }}

build_win:
if: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing the split between lin/win tasks? I would have expected you to link the commit that introduced the split in the PR's description together with your arguments on why you think that was wrong.

Also, you're not removing sycl-windows-precommit.yml, why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, even if we decide to keep that change, it should be done in a separate PR and not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing the split between lin/win tasks? I would have expected you to link the commit that introduced the split in the PR's description together with your arguments on why you think that was wrong.

So people can see all precommit tests together. It never occurred to me that they might have been together at one point and then separated out, so I didn't consider something was done "wrong", thus no need to focus on

Also, you're not removing sycl-windows-precommit.yml, why?

That's a mistake. Will fix.

Also, even if we decide to keep that change, it should be done in a separate PR and not here.

Which change? Why does it need a separate merge request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which change?

Merging win/lin into a single YML, obviously...

Why does it need a separate merge request?

Because that would be an atomic change and per good software development practices PRs have to represent atomic changes, not a bunch of different fixes at once.

It never occurred to me that they might have been together at one point and then separated out

They were. If you decide to merge them back, please make it clear in your new atomic PR why original reasoning doesn't take place anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously

Not obviously. Don't patronise me. Your description was unclear and I'm not a mind reader

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first comment in this thread was attributed to a single change:

Why are you removing the split between lin/win tasks?

The second (and the only other comment at that time) in this thread was

if we decide to keep that change...

What else could it possibly refer to, in your opinion?

always()
&& success()
&& github.repository == 'intel/llvm'
uses: ./.github/workflows/sycl-windows-build.yml

test_e2e_win:
needs: build_win
# Continue if build was successful.
if: |
always()
&& !cancelled()
&& needs.build_win.outputs.build_conclusion == 'success'
uses: ./.github/workflows/sycl-windows-run-tests.yml
with:
name: Intel GEN12 Graphics with Level Zero
runner: '["Windows","gen12"]'
sycl_toolchain_archive: ${{ needs.build_win.outputs.artifact_archive_name }}
extra_lit_opts: --param gpu-intel-gen12=True

build_macos:
name: macOS Build and Test
if: github.repository == 'intel/llvm'
uses: ./.github/workflows/sycl-macos-build-and-test.yml

test-perf:
needs: [build, detect_changes]
test_perf:
needs: [build_linux, detect_changes]
if: |
always() && !cancelled()
&& needs.build.outputs.build_conclusion == 'success'
&& needs.build_linux.outputs.build_conclusion == 'success'
&& (contains(github.event.pull_request.labels.*.name, 'run-perf-tests')
|| contains(needs.detect_changes.outputs.filters, 'perf-tests'))
strategy:
Expand Down Expand Up @@ -165,7 +190,7 @@ jobs:
uses: ./.github/workflows/sycl-linux-run-tests.yml
with:
name: Perf tests on ${{ matrix.name }}
runner: ${{ matrix. runner }}
runner: ${{ matrix.runner }}
image: ${{ matrix.image }}
image_options: -u 1001 --privileged --cap-add SYS_ADMIN ${{ matrix.image_extra_opts }}
target_devices: all
Expand All @@ -179,5 +204,5 @@ jobs:
merge_ref: ''

sycl_toolchain_artifact: sycl_linux_default
sycl_toolchain_archive: ${{ needs.build.outputs.artifact_archive_name }}
sycl_toolchain_decompress_command: ${{ needs.build.outputs.artifact_decompress_command }}
sycl_toolchain_archive: ${{ needs.build_linux.outputs.artifact_archive_name }}
sycl_toolchain_decompress_command: ${{ needs.build_linux.outputs.artifact_decompress_command }}
Loading
Loading