Closed
Description
When we run sglang inference with --enable-deepep-moe on Qwen3-30B-A3B, the following error occurs:
Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2546, in run_scheduler_process
scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, pp_rank, dp_rank)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 312, in __init__
self.tp_worker = TpWorkerClass(
^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 64, in __init__
self.worker = TpModelWorker(
^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 78, in __init__
self.model_runner = ModelRunner(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 212, in __init__
self.initialize(min_per_gpu_memory)
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 287, in initialize
self.init_cuda_graphs()
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1138, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 320, in __init__
raise Exception(
Exception: Capture cuda graph failed: Failed: Assertion error /sgl-workspace/DeepEP/csrc/kernels/internode_ll.cu:381 'false && "Unsupported hidden"'
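The cause is that a hidden size of 2048 is not yet supported: the assertion at DeepEP/csrc/kernels/internode_ll.cu:381 rejects it with "Unsupported hidden".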
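Until this hidden size is supported, one way to fail early with a clearer message is a pre-flight check on the model config before enabling DeepEP. The following is only a minimal sketch: the helper names `read_hidden_size` and `check_hidden_size` are hypothetical and not part of sglang or DeepEP, and the values in `ASSUMED_SUPPORTED_HIDDEN` are placeholders that must be replaced with the sizes your DeepEP build actually dispatches on. It assumes the model directory contains a Hugging Face style config.json with a `hidden_size` field.

```python
import json
from pathlib import Path

# ASSUMPTION: placeholder values for illustration only. They are not taken
# from DeepEP; check the hidden-size dispatch in your DeepEP checkout.
ASSUMED_SUPPORTED_HIDDEN = frozenset({4096, 7168})


def read_hidden_size(model_dir):
    """Return hidden_size from a Hugging Face style config.json (hypothetical helper)."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return int(config["hidden_size"])


def check_hidden_size(hidden_size, supported=ASSUMED_SUPPORTED_HIDDEN):
    """Return None if hidden_size is in `supported`, else a readable message."""
    if hidden_size in supported:
        return None
    return (
        f"hidden size {hidden_size} is not in the assumed supported set "
        f"{sorted(supported)}; the DeepEP kernel may fail during CUDA graph capture"
    )
```

Calling `check_hidden_size(2048)` with the placeholder set returns a message rather than None, which mirrors the failure reported above without having to wait for CUDA graph capture.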