[Backend Tester] Add backend test suite skeleton #11960
Conversation
**Dr. CI:** 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11960. Note: links to docs will display an error until the docs builds have completed. ❌ 1 new failure as of commit 2eb2407 with merge base fee2bd9.
What will happen for the following two scenarios?
In both cases, it should be fine. These tests exercise the operators end to end, in the same form they appear in the model. If an operator is decomposed internally, the test still validates correctness. If it isn't partitioned, that's also fine; the tests will still pass.
What about the case where ops should be partitioned and delegated, but the partitioner misses some cases, so they aren't partitioned and fall back to CPU? Also, how do we measure the closeness between the baseline and the backend op?
I might not be fully understanding the question, but this should also be fine. The purpose of these tests is twofold: (1) does the partitioner/lowering error out, rather than simply not partitioning things it can't handle, and (2) if something is partitioned, are the outputs reasonably close to the eager values? On numerical accuracy, I've set very loose tolerances, as my intent is mainly to catch things that are blatantly incorrect rather than to police specific numerical deviations. The check looks at the full graph output, so if something is decomposed and/or partially lowered, it considers the net output regardless of how the graph is handled internally.
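As a rough illustration of that comparison, it amounts to something like the following (a minimal sketch; the actual tolerance values and helper names in the suite may differ):

```python
import torch

# Loose tolerances, intended to catch blatantly wrong outputs rather
# than to police small numerical deviations (values are illustrative).
ATOL = 1e-1
RTOL = 4e-2

def outputs_match(eager_out: torch.Tensor, backend_out: torch.Tensor) -> bool:
    # Compare the full graph output against eager mode. It doesn't matter
    # how the graph was decomposed or partially lowered internally; only
    # the net output is checked.
    return torch.allclose(backend_out, eager_out, atol=ATOL, rtol=RTOL)
```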
I think the following will help
Too many comments, so I will try to keep them structured.
- I would like to better understand and document what gets in/out of this suite, but I like this idea.
- I hope we can figure out the reporting for this.
- Another consideration is versioning, i.e. XNNPACK is compliant to v1.2 vs. CoreML to v1.1, etc. We can tie it to ET versions.
- FC/BC is also a pain with this, so we have to think it through a bit more.
- Lastly, what's the status of the suite itself: is it ready or WIP, what is the update cadence, etc.?
```
@@ -0,0 +1,15 @@
# Operator Compliance Test Suite

This directory contains operator tests that all backends are expected to pass. While not every backend will implement every operator or permutation, the expectation is that backend partitioners will only partition nodes that the backend can support. The partitioner should never error out due to not supporting an input node.
```
Say why, i.e. what makes these tests a must-pass? I know it, but document it.
I imagine we will build up "qualifying criteria" for adding new things to or removing things from this compliance suite, and define what it means for a backend to be compliant when a user reads it. Also indicate that this is purely functional, with no bar on performance. We also need to distinguish between compliant-and-partitioned vs. not-partitioned when we present this, but that's for later.
I've rewritten this section to word it a bit more generally while we align on the exact expectations for backends.
```python
ALL_TEST_FLOWS = []

if is_backend_enabled("xnnpack"):
```
The reverse of this would be something like is_backend_ready_for_compliance_testing(), where someone has to explicitly list their backend for the method to return no, and it defaults to yes. And of course you can override it for testing, etc.
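For reference, a minimal sketch of how an environment-variable-based gate like `is_backend_enabled` could work (assuming the `ET_TEST_ENABLED_BACKENDS` variable mentioned in the summary below; the real implementation may differ):

```python
import os

def is_backend_enabled(backend: str) -> bool:
    # If ET_TEST_ENABLED_BACKENDS is unset, all backends are enabled;
    # otherwise it is treated as a comma-separated allowlist.
    enabled = os.environ.get("ET_TEST_ENABLED_BACKENDS")
    if enabled is None:
        return True
    return backend in enabled.split(",")
```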
```python
    torch.float64,
]

FLOAT_DTYPES = [
```
Missing bfloat16?
I guess there are no complex types or fp8 cases yet.
ET seems to have a bit of difficulty (outside of the backends) with bf16, so I excluded it for now. I can add a note. For example, I can't lower a graph that adds two bf16 tensors with no delegation.
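For context, a minimal repro of the kind of graph being described (a sketch; assumes the standard `torch.export` / `to_edge` / `to_executorch` flow with no delegation):

```python
import torch
from executorch.exir import to_edge

class AddModule(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y

# Adding two bf16 tensors; lowering this without any delegation is the
# case reported to fail above.
inputs = (
    torch.randn(4, dtype=torch.bfloat16),
    torch.randn(4, dtype=torch.bfloat16),
)
exported = torch.export.export(AddModule(), inputs)
program = to_edge(exported).to_executorch()
```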
What's the issue with lowering add(bf16, bf16)? Portable op support, export, or an ET issue?
```python
class Add(OperatorTest):
    @dtype_test
    def test_add_dtype(self, dtype, tester_factory: Callable) -> None:
        self._test_op(
```
Instead of calling this method here, I was wondering why we don't just get an instance of a "TestInputs" class that gives you the logical name, module definition, test inputs, etc. This would disconnect the inputs from the code consuming them, and allow us to repurpose them if we want to do different things with them. What do you think? A sketch of the idea is below.
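A hypothetical sketch of that suggestion (names are illustrative, not part of the PR):

```python
from dataclasses import dataclass
from typing import Any

import torch

@dataclass
class TestInputs:
    # A logical name for the case, e.g. "add_f32".
    logical_name: str
    # The module under test, defined by the suite rather than the backend.
    module: torch.nn.Module
    # Example inputs to run the module with.
    inputs: tuple[Any, ...]
```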
The whole idea is that backends can't choose the test module; the suite will. The backend can use the ET APIs. If they partition it, it is expected to work correctly. If they don't partition it, that is also OK.
The idea is that if I have an op A, and the backend partitioner makes a mistake, doesn't partition it, and falls back to portable, the test will still pass, but it won't detect the mistake. Also, say the backend supports an ATen op (not portable); how do we include that here? My goal is to make the test suite more generic.
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D77967739.
Because it is not a mistake: from ET's point of view this is still an OK outcome, since we could generate a valid PTE. From the backend's point of view, there should be a report that says this backend didn't partition op A with such-and-such args. You, or someone looking at the report from a backend-partition-behavior point of view, should still be able to "find" it. Making a test fail requires knowing what is expected to work, and that, I feel, is the backend's job. I.e., testing a given backend partition for "this is expected to work" is a job for the individual backend's tests. I hope this helps.
Gregory had a simple FACTO configuration for the Linear op. In that case, I would imagine that if one doesn't partition, we just stop the test there. I am not in favor of listing every single "do-not-decompose" op in the FACTO config, just the common ones; the burden of testing all of them falls on the backend tests.
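For instance, the report entry for an unpartitioned op could carry something like the following (a hypothetical sketch; field names are illustrative, not from the PR):

```python
from dataclasses import dataclass

@dataclass
class PartitionReportEntry:
    backend: str       # e.g. "xnnpack"
    op: str            # e.g. "aten.add.Tensor"
    args_summary: str  # dtypes/shapes of the inputs that were seen
    partitioned: bool  # False => fell back to portable/CPU
```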
I think I understand. But I don't want this to be seen as a substitute for individual backend tests. If you say "let's add enough configurability so we can repurpose this framework to test my backend," I think that can happen organically, like the XNNPACK Tester, but I don't want that to be the goal for this effort, especially not for GA.
Looks good. Left some comments.
I am glad we dropped the word "compliance" from this. :)
```python
# Generate test cases for each backend flow.
def _create_tests(cls):
    for key in dir(cls):
```
Assert the cls type? E.g., something like the sketch below.
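A sketch of what that could look like (assumes `OperatorTest` and `ALL_TEST_FLOWS` from the snippets above; `_make_variant` is a hypothetical helper, and the name construction is illustrative):

```python
def _create_tests(cls):
    # Guard against applying the generator to an unrelated class
    # (addresses the "assert cls type?" question above).
    assert issubclass(cls, OperatorTest), f"expected an OperatorTest subclass, got {cls!r}"
    for key in dir(cls):
        if not key.startswith("test_"):
            continue
        base_test = getattr(cls, key)
        for flow in ALL_TEST_FLOWS:
            # Register one variant of the test per backend flow,
            # e.g. test_add_dtype_xnnpack.
            setattr(cls, f"{key}_{flow.name}", _make_variant(base_test, flow))
    return cls
```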
### Summary

Add the initial skeleton of reporting code for the backend tester. This PR is primarily focused on putting the hooks and runner structure in place. Follow-up work will expand the scope of collection and reporting outputs.

This PR adds the following:
- CLI runner for the test suite.
- Basic test result breakdown by success/fail and cause (failing in lowering vs. output mismatch, for example).
- Refactoring of test suite logic to clean things up.

Next steps:
- Aggregate results by flow (backend).
- Add additional CLI flags to allow filtering backends and dtypes.
- Land more of the operator test suite.
- Wire up flows for quantized operators.

Note that this PR is stacked on (and thus includes) #11960. I accidentally broke my ghstack, so I'm converting this to a normal PR.

Sample output (XNNPACK):
```
Test Session Summary:
84 Passed / 95
11 Failed / 95
0 Skipped / 95

[Success]
	66 Delegated
	18 Undelegated

[Failure]
	4 Lowering Fail
	0 PTE Load Fail
	0 PTE Run Fail
	6 Output Mismatch Fail
```

Reproduce with `ET_TEST_ENABLED_BACKENDS=xnnpack python -m executorch.backends.test.suite.runner executorch.backends.test.suite`. I've temporarily commented out non-f32 dtypes to work around some crashes in XNNPACK, which are non-recoverable from Python.
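The failure causes in that breakdown suggest a small result taxonomy. A hedged sketch of what such categories might look like in code (names are illustrative, not necessarily the suite's actual enum):

```python
from enum import Enum, auto

class TestResult(Enum):
    SUCCESS_DELEGATED = auto()    # ran via the backend delegate
    SUCCESS_UNDELEGATED = auto()  # fell back entirely to portable ops
    LOWERING_FAIL = auto()        # partitioner/lowering errored out
    PTE_LOAD_FAIL = auto()        # generated PTE failed to load
    PTE_RUN_FAIL = auto()         # PTE loaded but execution failed
    OUTPUT_MISMATCH = auto()      # output diverged from eager beyond tolerance
```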
This PR lays the foundation for the new backend test suite. It includes a skeleton of the runner (iterated on in this stack) and basic tests for a few elementwise operators.