RuntimeError('chunked prefill cannot be used with prefix caching now.') #16

I ran the following command and the engine crashed:

vllm serve Qwen/Qwen2.5-7B-Instruct \
	--dtype auto \
	--enable-chunked-prefill \
	--enable-prefix-caching \
	--api-key token-abc123

Error Observed
----------------

ERROR 05-28 06:51:06 [engine.py:160] RuntimeError('chunked prefill cannot be used with prefix caching now.')
ERROR 05-28 06:51:06 [engine.py:160] Traceback (most recent call last):
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 05-28 06:51:06 [engine.py:160]     self.run_engine_loop()
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 05-28 06:51:06 [engine.py:160]     request_outputs = self.engine_step()
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 05-28 06:51:06 [engine.py:160]     raise e
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 05-28 06:51:06 [engine.py:160]     return self.engine.step()
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1431, in step
ERROR 05-28 06:51:06 [engine.py:160]     outputs = self.model_executor.execute_model(
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 140, in execute_model
ERROR 05-28 06:51:06 [engine.py:160]     output = self.collective_rpc("execute_model",
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-28 06:51:06 [engine.py:160]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2378, in run_method
ERROR 05-28 06:51:06 [engine.py:160]     return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-28 06:51:06 [engine.py:160]     return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_worker.py", line 409, in execute_model
ERROR 05-28 06:51:06 [engine.py:160]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-28 06:51:06 [engine.py:160]     return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 328, in execute_model
ERROR 05-28 06:51:06 [engine.py:160]     ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 298, in prepare_input_tensors
ERROR 05-28 06:51:06 [engine.py:160]     ) = self._prepare_model_input(seq_group_metadata_list)
ERROR 05-28 06:51:06 [engine.py:160]   File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 133, in _prepare_model_input
ERROR 05-28 06:51:06 [engine.py:160]     raise RuntimeError(
ERROR 05-28 06:51:06 [engine.py:160] RuntimeError: chunked prefill cannot be used with prefix caching now.
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:38092 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38094 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38098 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38114 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38130 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38138 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38154 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38162 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38178 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38184 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38188 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38192 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38200 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38206 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38218 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:38232 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [13]
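
Possible workaround (a sketch, not verified against the backend)
----------------

The traceback shows the error is raised from vllm_openvino's `_prepare_model_input`, so the crash appears only when the first requests are processed, not at startup. Until the two options can be combined, one way to fail early is a small wrapper check that refuses to launch when both flags are passed. The function name `check_flags` is hypothetical and not part of vLLM; it only matches the bare flag spellings used in the command above.

```shell
# Hypothetical pre-flight guard (not part of vLLM or vllm_openvino).
# Returns 1 if both --enable-chunked-prefill and --enable-prefix-caching
# appear in the argument list, 0 otherwise.
check_flags() {
    cf_chunked=0
    cf_prefix=0
    for cf_arg in "$@"; do
        case "$cf_arg" in
            --enable-chunked-prefill) cf_chunked=1 ;;
            --enable-prefix-caching)  cf_prefix=1 ;;
        esac
    done
    if [ "$cf_chunked" -eq 1 ] && [ "$cf_prefix" -eq 1 ]; then
        echo "chunked prefill and prefix caching were both requested; drop one of them" >&2
        return 1
    fi
    return 0
}

# Intended use in a launch script (shown as a comment so the block runs without vLLM):
#   check_flags "$@" && vllm serve Qwen/Qwen2.5-7B-Instruct "$@"
```

With this guard in place, launching with only one of `--enable-chunked-prefill` or `--enable-prefix-caching` passes the check, and that should also avoid this specific RuntimeError, assuming the backend does not enable the other feature by default.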

