
[Platform] Add custom default max tokens #18557


Open · wants to merge 8 commits into main

Conversation

@gmarinho2 (Contributor) commented May 22, 2025

FIX: vllm-project/vllm-spyre#148

Currently the default max tokens for the completions API is set to max_model_len - prompt_len. With this PR, a platform that needs a different default_max_tokens can change it simply by overriding the maybe_update_max_tokens method of the Platform class; when no override is needed, the base method returns the current default. Edit: typo in the commit message: "class Plataform" should read "class Platform".
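
For illustration, here is a minimal sketch of how an out-of-tree platform might use the new hook. Only the method name (maybe_update_max_tokens) and the Platform base class come from this PR; the signature, import path, and cap value below are assumptions.

```python
# Hypothetical platform overriding the hook added in this PR.
# The signature is an assumption; only the method name and the
# Platform base class are taken from the PR description.
from vllm.platforms.interface import Platform


class MyPlatform(Platform):  # hypothetical out-of-tree platform

    @classmethod
    def maybe_update_max_tokens(cls, max_model_len: int, prompt_len: int,
                                default_max_tokens: int) -> int:
        # Cap the default instead of allowing the full
        # max_model_len - prompt_len budget.
        return min(default_max_tokens, 1024)
```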


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@maxdebayser (Contributor) left a comment


LGTM

@joerunde (Collaborator)

Is this something that can be handled by --generation-config?

--generation-config
The folder path to the generation config. Defaults to "auto", meaning the generation config will be loaded from the model path. If set to "vllm", no generation config is loaded and vLLM defaults will be used. If set to a folder path, the generation config will be loaded from the specified folder path. If max_new_tokens is specified in the generation config, it sets a server-wide limit on the number of output tokens for all requests.

Default: 'auto'

Should the chat API be respecting a max_new_tokens override from the generation config instead of setting the default to max_model_len - prompt_len? That would allow a default override to be set regardless of platform.

That said, I do like the code hook to be able to write whatever code you want too...
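
As a hedged illustration of that route: the snippet below writes a Hugging Face-style generation_config.json into a folder and points vLLM at it via the flag quoted above. The folder path is made up; the flag semantics and the max_new_tokens behavior come from the help text.

```python
# Sketch: set a server-wide output-token cap via --generation-config.
import json
from pathlib import Path

cfg_dir = Path("./gen_config_override")  # hypothetical path
cfg_dir.mkdir(exist_ok=True)
(cfg_dir / "generation_config.json").write_text(
    json.dumps({"max_new_tokens": 512}))

# Then launch, for example:
#   vllm serve <model> --generation-config ./gen_config_override
# Per the help text above, max_new_tokens then acts as a server-wide
# limit on output tokens for all requests.
```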

@maxdebayser (Contributor)

Is this something that can be handled by --generation-config?

Unfortunately not, because in vllm-spyre the permissible max_new_tokens depends on the request.

@NickLucche (Contributor) left a comment


Thanks for the contribution!

Shouldn't we update serving_completion too?

@gmarinho2 (Contributor, Author)

Thanks for the contribution!

Shouldn't we update serving_completion too?

Done. Since class CompletionRequest has 16 as its default, that value will probably be selected most of the time, because the default is set to the minimum of the context window, the user request, and the server limit: REF1, REF2, REF3, REF4.
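
For readers following the references, here is a sketch of that selection logic; the identifiers are illustrative and do not match vLLM's internal names.

```python
# Illustrative only: the names below are not vLLM's actual identifiers.
def effective_max_tokens(context_window_remaining: int,
                         user_max_tokens: int | None,
                         server_limit: int | None) -> int:
    # CompletionRequest defaults max_tokens to 16 when the client omits
    # it, so that small value usually wins the min() below.
    requested = 16 if user_max_tokens is None else user_max_tokens
    limits = [context_window_remaining, requested]
    if server_limit is not None:
        limits.append(server_limit)
    return min(limits)


assert effective_max_tokens(4096 - 100, None, None) == 16
```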

@gmarinho2 requested a review from NickLucche on May 30, 2025 18:53
@joerunde (Collaborator) commented Jun 9, 2025

@youkaichao any thoughts on getting this merged?

@NickLucche (Contributor) left a comment


lgtm! Apologies for the delay.

@joerunde added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jun 12, 2025
@joerunde (Collaborator)

Let's get the full CI running then and see if we can get a maintainer to get this merged 👍

@joerunde (Collaborator)

@njhill can you hit the big green merge button for us?

@gmarinho2 requested a review from aarnphm as a code owner on June 17, 2025 20:45
@aarnphm (Collaborator) left a comment


One formatting comment, otherwise lgtm

Signed-off-by: Gabriel Marinho <[email protected]>
@aarnphm changed the title from "Add custom default max tokens for different plataforms" to "[Platform] Add custom default max tokens" on Jun 19, 2025
@aarnphm enabled auto-merge (squash) on June 19, 2025 05:21
Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Successfully merging this pull request may close these issues:

- Incorrect default max_completion_tokens being set
7 participants