
Enable CPU nightly performance benchmark and its Markdown report #18444


Open
wants to merge 16 commits into main from nightly_cpu_benchmark

Conversation

@louie-tsai (Contributor) commented May 21, 2025

We need to standardize vLLM CPU benchmarks among customers and Intel users by using the vLLM benchmark suite.
We also hope to enable CPU performance numbers on the vLLM performance dashboard.

This PR enables the vLLM benchmark suite for CPU; below is a snapshot of the serving benchmark report.
The numbers are aligned with our Xeon EMR numbers.

How to run it on CPU from the vllm folder:
ON_CPU=1 bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh

Also added a new "Platform Information" section to list CPU info:

[screenshot: "Platform Information" section of the benchmark report]
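For illustration, here is a minimal sketch of the kind of data such a section could collect. It uses psutil and py-cpuinfo, which the benchmark script imports; the exact fields and labels are assumptions, not the PR's actual code.

```python
# Minimal sketch only: gather basic CPU/memory facts for a "Platform Information"
# table. Field names are illustrative; the real script in this PR may differ.
import psutil
from cpuinfo import get_cpu_info


def collect_platform_info() -> dict:
    cpu = get_cpu_info()  # dict with keys such as "brand_raw" and "arch"
    return {
        "CPU model": cpu.get("brand_raw", "unknown"),
        "Architecture": cpu.get("arch", "unknown"),
        "Physical cores": psutil.cpu_count(logical=False),
        "Logical cores": psutil.cpu_count(logical=True),
        "Total memory (GiB)": round(psutil.virtual_memory().total / 2**30, 1),
    }


if __name__ == "__main__":
    for key, value in collect_platform_info().items():
        print(f"- {key}: {value}")
```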

Here is the full report.
benchmark_results_0527_3.md

Overall, the current tests took ~2 hours.

It also needs the automatic OMP thread binding from the PR below:
#17930

Also added a compare-json-results.py script to compare different benchmark-results.json files:
[screenshot: compare-json-results.py output]
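To illustrate the idea (not the actual script in this PR), here is a minimal pandas sketch that joins one metric across several benchmark-results.json files; the column names "Test name" and "Tput (req/s)" are assumptions about the result schema.

```python
# Hedged sketch: join one metric across multiple benchmark-results.json files.
# Column names below are assumptions; adjust them to the real result schema.
import argparse

import pandas as pd


def main() -> None:
    parser = argparse.ArgumentParser(description="Compare benchmark result files")
    parser.add_argument("files", nargs="+", help="benchmark-results.json files")
    args = parser.parse_args()

    merged = None
    for path in args.files:
        # Each file is assumed to hold a list of per-test records.
        df = pd.read_json(path)[["Test name", "Tput (req/s)"]]
        df = df.rename(columns={"Tput (req/s)": f"Tput (req/s) [{path}]"})
        merged = df if merged is None else merged.merge(df, on="Test name", how="outer")

    # to_markdown() needs the optional 'tabulate' package.
    print(merged.to_markdown(index=False))


if __name__ == "__main__":
    main()
```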


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the ci/build label on May 21, 2025
@louie-tsai force-pushed the nightly_cpu_benchmark branch 9 times, most recently from 913611c to 298aba1 on June 3, 2025 00:21
@louie-tsai force-pushed the nightly_cpu_benchmark branch 3 times, most recently from 1299794 to 51ede3a on June 4, 2025 01:09
@louie-tsai (Contributor Author)

@bigPYJ1151 could you help review this PR?

@louie-tsai force-pushed the nightly_cpu_benchmark branch 2 times, most recently from 2f11d5e to 21ce351 on June 4, 2025 21:22
@louie-tsai (Contributor Author)

@xuechendi please help review and merge the PR. Thanks.

@louie-tsai force-pushed the nightly_cpu_benchmark branch 2 times, most recently from 5bf0667 to 433ed5c on June 9, 2025 17:53
@xuechendi (Contributor)

@louie-tsai, it looks like renaming the GPU script is not necessary. Do you think it is OK to drop the changes to the existing GPU script and only add the CPU files?

@xuechendi (Contributor)

@bigPYJ1151, please take a look at this PR. The customer-facing team wants to provide the CPU benchmark script from the vLLM upstream repo so customers can reproduce the numbers easily. Please check whether the current test settings make sense to you.

@louie-tsai force-pushed the nightly_cpu_benchmark branch 3 times, most recently from 5164476 to 7eb926e on June 10, 2025 01:58
@bigPYJ1151 (Contributor) left a comment

Frankly speaking, I'm not sure reusing the benchmark script between CPU and GPU is a good idea.

Meanwhile, this PR also relates to the vLLM CI infra and needs benchmark machines and pipelines to be set up. Do we have a plan to establish these?

Nightly benchmark will be triggered when:
- Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.

## Performance benchmark details

See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.

See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests-gpu.json`, `tests/throughput-tests-gpu.json`, `tests/serving-tests-gpu.json` to configure the test cases.
Contributor

I also think adding the -gpu suffix is not necessary, as it would introduce lots of changes.

Contributor Author

changed it accordingly.

import pandas as pd
import psutil
from cpuinfo import get_cpu_info
from numa import info
Contributor

Perhaps numa should be added to a requirements file such as test.in.

Contributor Author

Added them accordingly.
We are targeting the release docker image. Should we use the test image instead?
If yes, do you have the build command for the test image?

Contributor

Right, the test image includes more dependencies and packages for testing/benchmarking.

To build it, just add --target=vllm-test when building the CPU image.

@@ -187,7 +222,7 @@ def results_to_json(latency, throughput, serving):
# The GPUs sometimes come in format of "GPUTYPE\nGPUTYPE\n...",
# we want to turn it into "8xGPUTYPE"
df["GPU"] = df["GPU"].apply(
lambda x: f"{len(x.split('\n'))}x{x.split('\n')[0]}"
lambda x: f"{len(x.split(chg_line_char))}x{x.split(chg_line_char)[0]}"
Contributor

Why is this change required?

Contributor Author

We used to face a parsing issue on CPU results, but there is no issue with the latest code. Removed the changes.

@louie-tsai force-pushed the nightly_cpu_benchmark branch 2 times, most recently from a876145 to 2353aa2 on June 11, 2025 21:47
@louie-tsai (Contributor Author)

> Frankly speaking, I'm not sure reusing the benchmark script between CPU and GPU is a good idea.
>
> Meanwhile, this PR also relates to the vLLM CI infra and needs benchmark machines and pipelines to be set up. Do we have a plan to establish these?

For a similar user experience across architectures, it might be good to have the same benchmark scripts for CPU and GPU. We separated the run arguments into different JSON files, so the bash scripts are mostly general to both CPU and GPU. Hope that explanation makes this change more reasonable to you.
Yes, Chendi and I plan to use EMR as the benchmark machine.

@louie-tsai force-pushed the nightly_cpu_benchmark branch from 23bda57 to 1f565b5 on June 11, 2025 21:54
@louie-tsai requested a review from bigPYJ1151 on June 11, 2025 21:54

@@ -134,6 +134,8 @@ ENTRYPOINT ["bash"]
FROM base AS vllm-openai

WORKDIR /workspace/
ADD ./benchmarks/ ./benchmarks/
ADD ./.buildkite/ ./.buildkite/
Contributor

If using the test image, these changes will not be required.

Contributor Author

removed them accordingly. thanks

{
"test_name": "latency_llama8B_tp1",
"environment_variables": {
"VLLM_USE_V1": 1,
Contributor

After CPU V1 was merged, VLLM_USE_V1 is not required.

Contributor Author

addressed it accordingly.

Comment on lines 6 to 8
"VLLM_RPC_TIMEOUT": 1000000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 600,
@bigPYJ1151 (Contributor) Jun 12, 2025

VLLM_RPC_TIMEOUT and VLLM_ENGINE_ITERATION_TIMEOUT_S only take effect for serving benchmarks, so setting them for latency/throughput benchmarks is not necessary.

Meanwhile, even for serving benchmarks, I think setting them to large values is not a good idea. If a benchmark setting exceeds the CPU device's capability and takes too much time, the result may be less meaningful, and we should adjust the benchmark/engine settings to be more practical. So with effective timeouts we can tell whether our benchmark/engine settings are good.

Contributor Author

thanks for the suggestions. addressed it accordingly.

@louie-tsai (Contributor Author) commented Jun 17, 2025

Reserve 0 CPUs per NUMA node, since performance is better without reservation for Llama 3.1 8B on an AWS c7i.48xlarge instance.

@louie-tsai force-pushed the nightly_cpu_benchmark branch 3 times, most recently from 12f7c6d to 5836136 on June 17, 2025 17:43
@louie-tsai force-pushed the nightly_cpu_benchmark branch 2 times, most recently from 3f045db to ea7ab42 on June 18, 2025 15:12
Comment on lines 176 to 177
from cpuinfo import get_cpu_info
from numa import info
Contributor

cpuinfo and numa will break the CUDA benchmarking as they are missing in the CUDA containers.

We should check them here and only capture platform_data when the packages are available.
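A minimal sketch of that guard, assuming platform_data is the dict the script fills in (the surrounding code is illustrative, not the PR's exact implementation):

```python
# Illustrative guard: only collect CPU platform details when the optional
# packages are importable (they are absent from the CUDA containers).
try:
    from cpuinfo import get_cpu_info
    from numa import info as numa_info  # noqa: F401

    HAS_CPU_PLATFORM_DEPS = True
except ImportError:
    HAS_CPU_PLATFORM_DEPS = False

platform_data = {}
if HAS_CPU_PLATFORM_DEPS:
    platform_data["CPU"] = get_cpu_info().get("brand_raw", "unknown")
    # NUMA topology details could be appended here via the numa.info module.
```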

Contributor Author

addressed it accordingly

@louie-tsai force-pushed the nightly_cpu_benchmark branch from ea7ab42 to d9d64e9 on June 20, 2025 06:34
@louie-tsai requested a review from bigPYJ1151 on June 20, 2025 06:34