RuntimeError('chunked prefill cannot be used with prefix caching now.') #16

Open
@azhuvath

I tried the following command, and the engine crashes as soon as requests arrive.

vllm serve Qwen/Qwen2.5-7B-Instruct \
  --dtype auto \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --api-key token-abc123

Error Observed

ERROR 05-28 06:51:06 [engine.py:160] RuntimeError('chunked prefill cannot be used with prefix caching now.')
ERROR 05-28 06:51:06 [engine.py:160] Traceback (most recent call last):
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 05-28 06:51:06 [engine.py:160] self.run_engine_loop()
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 05-28 06:51:06 [engine.py:160] request_outputs = self.engine_step()
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 05-28 06:51:06 [engine.py:160] raise e
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 05-28 06:51:06 [engine.py:160] return self.engine.step()
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1431, in step
ERROR 05-28 06:51:06 [engine.py:160] outputs = self.model_executor.execute_model(
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 140, in execute_model
ERROR 05-28 06:51:06 [engine.py:160] output = self.collective_rpc("execute_model",
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-28 06:51:06 [engine.py:160] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2378, in run_method
ERROR 05-28 06:51:06 [engine.py:160] return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-28 06:51:06 [engine.py:160] return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_worker.py", line 409, in execute_model
ERROR 05-28 06:51:06 [engine.py:160] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 05-28 06:51:06 [engine.py:160] return func(*args, **kwargs)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 328, in execute_model
ERROR 05-28 06:51:06 [engine.py:160] ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 298, in prepare_input_tensors
ERROR 05-28 06:51:06 [engine.py:160] ) = self._prepare_model_input(seq_group_metadata_list)
ERROR 05-28 06:51:06 [engine.py:160] File "/usr/local/lib/python3.10/dist-packages/vllm_openvino/worker/openvino_model_runner.py", line 133, in _prepare_model_input
ERROR 05-28 06:51:06 [engine.py:160] raise RuntimeError(
ERROR 05-28 06:51:06 [engine.py:160] RuntimeError: chunked prefill cannot be used with prefix caching now.
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
CRITICAL 05-28 06:51:06 [launcher.py:116] MQLLMEngine is already dead, terminating server process
INFO: 127.0.0.1:38092 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38094 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38098 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38114 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38130 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38138 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38154 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38162 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38178 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38184 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38188 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38192 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38200 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38206 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38218 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:38232 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [13]
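
The traceback shows the error being raised from `_prepare_model_input` in `vllm_openvino` during an engine step, so the server starts and the conflict only surfaces once requests come in (hence the 500 responses). As a rough sketch of how the same combination could be caught before launch, the snippet below scans a command line for both flags. The helper name `find_flag_conflict` is hypothetical and not part of vLLM or vllm_openvino. It only matches the exact flag spellings from the command above when they are passed explicitly, and it does not account for defaults or other spellings.

```python
import shlex

# Flag names are copied from the failing command; everything else here is
# a hypothetical illustration, not part of vLLM or vllm_openvino.
CONFLICTING_FLAGS = ("--enable-chunked-prefill", "--enable-prefix-caching")


def find_flag_conflict(command: str) -> list[str]:
    """Return both flags if the command passes both, otherwise an empty list."""
    tokens = shlex.split(command)
    present = [flag for flag in CONFLICTING_FLAGS if flag in tokens]
    return present if len(present) == len(CONFLICTING_FLAGS) else []


if __name__ == "__main__":
    cmd = (
        "vllm serve Qwen/Qwen2.5-7B-Instruct --dtype auto "
        "--enable-chunked-prefill --enable-prefix-caching "
        "--api-key token-abc123"
    )
    conflict = find_flag_conflict(cmd)
    if conflict:
        print("Flags that cannot be combined here:", ", ".join(conflict))
```

Running a check like this ahead of `vllm serve` would turn the request-time crash into an immediate, readable message. Whether the backend itself should reject the combination at startup is a separate question for the maintainers.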
