Description
Changes
Running vLLM with VLLM_SPYRE_WARMUP_NEW_TOKENS set to some number X always results in this error:
No applicable warmup shape exists for combination of prompt length (Z tokens) and maximum number of output tokens to be generated (Y tokens)
where Z is the input token count and Y is Z + VLLM_SPYRE_WARMUP_NEW_TOKENS.
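To illustrate why this would fail for every request, here is a minimal sketch. It assumes the warmup-shape check accepts a request only when the prompt length and the requested output tokens both fit inside a configured shape. The `WarmupShape` class, the `find_applicable_shapes` helper, and the prompt length of 64 are hypothetical and are not taken from the vLLM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical stand-in for one configured warmup shape.
    prompt_length: int
    new_tokens: int


def find_applicable_shapes(shapes, prompt_len, max_new_tokens):
    # Assumed rule: a shape applies only if both values fit within it.
    return [
        s for s in shapes
        if prompt_len <= s.prompt_length and max_new_tokens <= s.new_tokens
    ]


X = 20   # value of VLLM_SPYRE_WARMUP_NEW_TOKENS
Z = 10   # input token count
Y = Z + X  # output-token limit reported in the error message

shapes = [WarmupShape(prompt_length=64, new_tokens=X)]

# Y exceeds X whenever Z > 0, so no shape can match.
print(find_applicable_shapes(shapes, Z, Y))
# Asking for exactly X new tokens would match under the same rule.
print(find_applicable_shapes(shapes, Z, X))
```

Under this assumed rule, a limit of Z + X can never fit a shape that allows only X new tokens, which would be consistent with the error appearing on every request.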