Enable CPU nightly performance benchmark and its Markdown report #18444
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from 913611c to 298aba1.
Force-pushed from 1299794 to 51ede3a.
@bigPYJ1151 could you help to review this PR?
Force-pushed from 2f11d5e to 21ce351.
@xuechendi please help to review and merge the PR. Thanks.
Force-pushed from 5bf0667 to 433ed5c.
@louie-tsai, it looks like the rename of the GPU script is not necessary. Do you think it is OK to drop the changes to the existing GPU script and only add new ones for CPU?
@bigPYJ1151, please take a look at this PR. The customer-facing team wants to provide CPU benchmark scripts from the vLLM upstream repo so customers can reproduce the numbers easily. Please check whether the current test settings make sense to you.
Force-pushed from 5164476 to 7eb926e.
Frankly speaking, I'm not sure reusing the benchmark script between CPU and GPU is a good idea.
Meanwhile, this PR also touches the vLLM CI infrastructure; benchmark machines and pipelines need to be set up. Do we have a plan to establish these?
Nightly benchmark will be triggered when:
- Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.

## Performance benchmark details

- See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
+ See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests-gpu.json`, `tests/throughput-tests-gpu.json`, `tests/serving-tests-gpu.json` to configure the test cases.
I also think adding the `-gpu` suffix is not necessary, as it will introduce lots of changes.
changed it accordingly.
import pandas as pd
import psutil
from cpuinfo import get_cpu_info
from numa import info
Perhaps `numa` should be added to requirements, like `test.in`.
Added them accordingly.
We are targeting the release Docker image. Should we use the test image instead?
If yes, do you have a build command for the test image?
Right, the test image includes more dependencies and packages for testing/benchmarking. To build it, just add `--target=vllm-test` when building the CPU image.
@@ -187,7 +222,7 @@ def results_to_json(latency, throughput, serving):
     # The GPUs sometimes come in format of "GPUTYPE\nGPUTYPE\n...",
     # we want to turn it into "8xGPUTYPE"
     df["GPU"] = df["GPU"].apply(
-        lambda x: f"{len(x.split('\n'))}x{x.split('\n')[0]}"
+        lambda x: f"{len(x.split(chg_line_char))}x{x.split(chg_line_char)[0]}"
Why is this change required?
I used to face a parsing issue with the CPU results, but there is no issue with the latest code. Removed the changes.
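For reference, here is a minimal standalone sketch of what the original expression does; the GPU name and repeat count below are made up purely for illustration:

```python
import pandas as pd

# The raw "GPU" column sometimes contains one device name per line,
# e.g. "A100\nA100\n...". The values here are hypothetical.
df = pd.DataFrame({"GPU": ["A100\nA100\nA100\nA100"]})

# Condense the repeated names into a "<count>x<name>" summary, e.g. "4xA100".
newline = "\n"  # split on the newline separator between repeated names
df["GPU"] = df["GPU"].apply(lambda x: f"{len(x.split(newline))}x{x.split(newline)[0]}")

print(df["GPU"].iloc[0])  # prints: 4xA100
```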
Force-pushed from a876145 to 2353aa2.
For a similar user experience across different architectures, it might be good to have the same benchmark scripts for CPU and GPU. We separate the run arguments into different JSON files, so the bash scripts are mostly generic to both CPU and GPU. Hope this explanation makes the change more sensible to you.
Force-pushed from 23bda57 to 1f565b5.
docker/Dockerfile.cpu (Outdated)
@@ -134,6 +134,8 @@ ENTRYPOINT ["bash"]
FROM base AS vllm-openai

WORKDIR /workspace/
ADD ./benchmarks/ ./benchmarks/
ADD ./.buildkite/ ./.buildkite/
If using the test image, these changes will not be required.
removed them accordingly. thanks
{
    "test_name": "latency_llama8B_tp1",
    "environment_variables": {
        "VLLM_USE_V1": 1,
After CPU V1 was merged, `VLLM_USE_V1` is not required.
addressed it accordingly.
"VLLM_RPC_TIMEOUT": 1000000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 600, |
`VLLM_RPC_TIMEOUT` and `VLLM_ENGINE_ITERATION_TIMEOUT_S` only take effect for serving benchmarks, so setting them for latency/throughput benchmarks is not necessary.
Meanwhile, even for serving benchmarks, I don't think setting them to large values is a good idea. If a benchmark setting exceeds the CPU device's capability and takes too much time, the result may be less meaningful, and we should adjust the benchmark/engine settings to be more practical. With effective timeouts we can tell whether our benchmark/engine settings are good.
thanks for the suggestions. addressed it accordingly.
Force-pushed from ad0345c to f12ac3e.
Force-pushed from 22060b1 to 1d0dc62.
Force-pushed from 1d0dc62 to ee193b7.
We reserve 0 CPUs per NUMA node, since performance is better without reservation for Llama 3.1 8B on an AWS c7i.48xlarge instance.
Force-pushed from 12f7c6d to 5836136.
Force-pushed from 3f045db to ea7ab42.
from cpuinfo import get_cpu_info
from numa import info
`cpuinfo` and `numa` will break CUDA benchmarking because they are missing from the CUDA containers.
We should check for them here and only capture `platform_data` when the packages are available.
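A minimal sketch of the guarded-import approach being suggested; the helper name `collect_platform_data` and the exact fields captured are illustrative assumptions, not the PR's actual code:

```python
import psutil

# Optional CPU-only dependencies: guard the imports so the report script still
# runs inside CUDA containers, where these packages are not installed.
try:
    from cpuinfo import get_cpu_info
    from numa import info  # noqa: F401  (imported to verify availability)
    CPU_PACKAGES_AVAILABLE = True
except ImportError:
    CPU_PACKAGES_AVAILABLE = False


def collect_platform_data() -> dict:
    """Capture platform details only when the optional packages are present."""
    if not CPU_PACKAGES_AVAILABLE:
        return {}
    cpu = get_cpu_info()  # py-cpuinfo: returns a dict describing the CPU
    return {
        "cpu_brand": cpu.get("brand_raw", "unknown"),
        "physical_cores": psutil.cpu_count(logical=False),
        "memory_gib": round(psutil.virtual_memory().total / 1024**3, 1),
    }
```

On a GPU-only container the function simply returns an empty dict, so the rest of the report generation is unaffected.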
addressed it accordingly
Force-pushed from ea7ab42 to d9d64e9.
We need to standardize vLLM CPU benchmarks among customers and Intel users by using the vLLM benchmark suite.
We also hope to publish CPU performance numbers on the vLLM performance dashboard.
This PR enables the vLLM benchmark suite for CPU; below is a snapshot of the serving benchmark report.
The numbers are aligned with our Xeon EMR numbers.
To run it on CPU from the vllm folder:
ON_CPU=1 bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
We also added a new "Platform Information" section to list CPU info.
Here is the full report:
benchmark_results_0527_3.md
Overall, the current tests took ~2 hours.
They also need the automatic OMP thread binding from the PR below:
#17930
We also added a compare-json-results.py script to compare different benchmark-results.json files.
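As a rough sketch only (not the actual compare-json-results.py), a comparison of two benchmark-results.json files could look like the following; the "Test name" index column and the default metric name are assumptions about the report schema:

```python
import argparse
import json

import pandas as pd


def load_results(path: str) -> pd.DataFrame:
    """Load one benchmark-results.json file into a DataFrame keyed by test name."""
    with open(path) as f:
        records = json.load(f)
    # "Test name" is an assumed column; adjust to the real report schema.
    return pd.DataFrame(records).set_index("Test name")


def main() -> None:
    parser = argparse.ArgumentParser(description="Compare two benchmark result files.")
    parser.add_argument("baseline", help="baseline benchmark-results.json")
    parser.add_argument("candidate", help="candidate benchmark-results.json")
    # The metric column name is an assumption; adjust to match the real report.
    parser.add_argument("--metric", default="Output token throughput (tok/s)")
    args = parser.parse_args()

    base = load_results(args.baseline)
    cand = load_results(args.candidate)
    merged = base[[args.metric]].join(
        cand[[args.metric]], lsuffix=" (baseline)", rsuffix=" (candidate)", how="inner"
    )
    merged["ratio"] = (
        merged[f"{args.metric} (candidate)"] / merged[f"{args.metric} (baseline)"]
    )
    # to_markdown() requires the 'tabulate' package.
    print(merged.to_markdown())


if __name__ == "__main__":
    main()
```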
