
[Backend Test] Backend test reporting skeleton #12296


Merged: 1 commit merged into pytorch:main on Jul 15, 2025

Conversation

@GregoryComer (Member) commented Jul 9, 2025

Summary

Add the initial skeleton of reporting code for the backend tester. This PR is primarily focused on putting the hooks and runner structure in place. Follow-up work will expand the scope of collection and reporting outputs.

This PR adds the following:

  • CLI runner for the test suite.
  • Basic test result breakdown by success / fail and by cause (for example, failing in lowering vs. an output mismatch); see the sketch after this list.
  • Refactoring of test suite logic to clean things up.
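
For a sense of how these outcomes are bucketed, here is a minimal sketch of the categorization and aggregation described above. The TestResult and RunSummary names match the code quoted in the review below, but the members, fields, and methods are illustrative assumptions rather than the implementation landed in this PR:

```python
# Illustrative sketch only: the categories mirror the sample output below; the
# exact enum members and RunSummary fields in the PR may differ.
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class TestResult(Enum):
    # Successful outcomes, split by whether the graph was delegated to the backend.
    SUCCESS_DELEGATED = "Delegated"
    SUCCESS_UNDELEGATED = "Undelegated"
    # Failure outcomes, split by the stage that failed.
    LOWERING_FAIL = "Lowering Fail"
    PTE_LOAD_FAIL = "PTE Load Fail"
    PTE_RUN_FAIL = "PTE Run Fail"
    OUTPUT_MISMATCH_FAIL = "Output Mismatch Fail"
    UNKNOWN_FAIL = "Unknown Fail"
    SKIPPED = "Skipped"

    def is_success(self) -> bool:
        return self in (TestResult.SUCCESS_DELEGATED, TestResult.SUCCESS_UNDELEGATED)


@dataclass
class RunSummary:
    counts: Counter = field(default_factory=Counter)

    def record(self, result: TestResult) -> None:
        self.counts[result] += 1

    @property
    def total(self) -> int:
        return sum(self.counts.values())

    @property
    def passed(self) -> int:
        return sum(n for r, n in self.counts.items() if r.is_success())
```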

Next steps:

  • Aggregate results by flow (backend).
  • Add additional CLI flags to allow filtering backends and dtypes (see the hypothetical flag sketch after this list).
  • Land more of the operator test suite.
  • Wire up flows for quantized operators.
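
Purely to illustrate the planned filtering flags, a hypothetical argparse sketch; the flag names (--backends, --dtypes) are assumptions, not flags that exist in this PR (today, the backend is selected via the ET_TEST_ENABLED_BACKENDS environment variable shown below):

```python
# Hypothetical CLI sketch for the planned backend/dtype filters; the flag names
# and semantics are assumptions, not part of this PR.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the backend operator test suite.")
    parser.add_argument(
        "--backends",
        nargs="*",
        default=None,
        help="Only run tests for these backends (e.g. xnnpack). Default: all enabled backends.",
    )
    parser.add_argument(
        "--dtypes",
        nargs="*",
        default=None,
        help="Only run tests for these dtypes (e.g. float32). Default: all supported dtypes.",
    )
    return parser.parse_args()
```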

Note that this PR is stacked on (and thus includes) #11960. I accidentally broke my ghstack, so I'm converting this to a normal PR.

Sample output (XNNPACK):

Test Session Summary:

   84 Passed / 95
   11 Failed / 95
    0 Skipped / 95

[Success]
   66 Delegated
   18 Undelegated

[Failure]
    4 Lowering Fail
    0 PTE Load Fail
    0 PTE Run Fail
    6 Output Mismatch Fail

Reproduce with `ET_TEST_ENABLED_BACKENDS=xnnpack python -m executorch.backends.test.suite.runner.executorch.backends.test.suite`. I've temporarily commented out non-f32 dtypes to work around some crashes in XNNPACK, which are non-recoverable from Python.


pytorch-bot bot commented Jul 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12296


✅ No Failures

As of commit 330d48e with merge base dd4488d:
💚 Looks good so far! There are no failures yet. 💚


@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 9, 2025
@GregoryComer GregoryComer force-pushed the backend-test-reporting branch from ac4a118 to 05f933d on July 9, 2025 03:54
@GregoryComer GregoryComer marked this pull request as ready for review July 9, 2025 03:55
@GregoryComer GregoryComer added the release notes: none Do not include this in the release notes label Jul 9, 2025
@GregoryComer GregoryComer requested a review from digantdesai July 9, 2025 03:59
@digantdesai (Contributor) left a comment:

Looks good for a starting point. Left some comments.

# We can do this if we ever see to_executorch() or serialize() fail due to a backend issue.
return build_result(TestResult.UNKNOWN_FAIL, e)

# TODO We should consider refactoring the tester slightly to return more signal on

Contributor: +1


def print_summary(summary: RunSummary):
print()
print("Test Session Summary:")

Contributor: Add a version number? And make sure the output is parsable.

GregoryComer (Member, Author): Sure, I'll add a version number. I was intending to add a machine-readable file output, such as JSON. Is that sufficient for the use cases you're envisioning, or do you think we need machine-parsable text output as well?
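
As a rough sketch of what that machine-readable output could look like, assuming a JSON file with a schema version field (the field names and layout here are illustrative, not the format actually landed):

```python
# Hypothetical JSON report writer; "report_version" and the overall layout are
# illustrative assumptions, not the format emitted by this PR or its follow-ups.
import json
from typing import Dict


def write_json_report(counts: Dict[str, int], path: str) -> None:
    report = {
        "report_version": 1,
        "total": sum(counts.values()),
        "results": counts,
    }
    with open(path, "w") as f:
        json.dump(report, f, indent=2)


# Example: write_json_report({"Delegated": 66, "Undelegated": 18, "Lowering Fail": 4}, "report.json")
```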

@GregoryComer GregoryComer force-pushed the backend-test-reporting branch 3 times, most recently from fbe335b to 6ee85f7 on July 15, 2025 00:15
@facebook-github-bot (Contributor) commented:
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D78311621.

@GregoryComer GregoryComer force-pushed the backend-test-reporting branch from 6ee85f7 to 330d48e on July 15, 2025 00:20
@facebook-github-bot (Contributor) commented:
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D78311621.

@GregoryComer GregoryComer merged commit 615404f into pytorch:main Jul 15, 2025
100 checks passed
lucylq pushed a commit that referenced this pull request Jul 17, 2025