vllm deployment failed

#11
by mondaylord - opened

I was using vLLM to deploy Kimi-K2-Thinking, and the error is shown below. I built vLLM from commit https://github.com/vllm-project/vllm/commit/67a2da890eef2a6fd40384aa5ae80e03beb39490, and the args are the same as in the guide. Could you tell me which base commit I should be using?
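
For reference, the launch command, reconstructed here from the non-default args vLLM echoes at startup (a sketch, since I'm quoting the flags back from the log rather than the guide verbatim):

# serve Kimi-K2-Thinking with the args reported in the startup log
vllm serve moonshotai/Kimi-K2-Thinking \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --max-num-batched-tokens 32768 \
  --max-num-seqs 100 \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --served-model-name moonshotai/Kimi-K2-Thinking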

2025-11-08T05:12:31.799956291Z (APIServer pid=1) You are using a model of type kimi_k2 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
2025-11-08T05:12:31.800772713Z (APIServer pid=1) INFO 11-07 21:12:31 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-08T05:12:38.178480433Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-08T05:12:38.178531689Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T05:12:38.178538843Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-08T05:12:38.178545045Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     returned.check_returncode()
2025-11-08T05:12:38.178550359Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-08T05:12:38.178555576Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-08T05:12:38.178560565Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-08T05:12:38.178565857Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
2025-11-08T05:12:38.178570978Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-08T05:12:38.178576239Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
2025-11-08T05:12:38.178581284Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T05:12:38.178586124Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-08T05:12:38.178590857Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     return model.inspect_model_cls()
2025-11-08T05:12:38.178595421Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178600226Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-08T05:12:38.178606053Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     result = func(*args, **kwargs)
2025-11-08T05:12:38.178610812Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178615284Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-08T05:12:38.178620403Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     mi = _run_in_subprocess(
2025-11-08T05:12:38.178625183Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178629959Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-08T05:12:38.178634414Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     raise RuntimeError(
2025-11-08T05:12:38.178638871Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-08T05:12:38.178643769Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T05:12:38.178648755Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T05:12:38.178653439Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   what():  std::bad_alloc
2025-11-08T05:12:38.178657970Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
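The failing command is spelled out in the CalledProcessError, so the abort should be reproducible in isolation inside the same container:

# the exact subprocess the model registry spawns, per the traceback above
/usr/bin/python3 -m vllm.model_executor.models.registry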
Moonshot AI org

Do you have the full log? You could also open a vLLM issue with detailed environment information.

Sure, thanks for your reply. I will open an issue in vLLM's GitHub repo soon. The setup is 8x H200 with vLLM at commit 67a2da890eef2a6fd40384aa5ae80e03beb39490, and the collected environment info is below:

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : version 4.1.0
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.9.0-dstack-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : Could not collect
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            GenuineIntel
Model name:                           06/cf
CPU family:                           6
Model:                                207
Thread(s) per core:                   1
Core(s) per socket:                   64
Socket(s):                            1
Stepping:                             2
BogoMIPS:                             3800.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc pebs bts rep_good nopl tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq dtes64 ds_cpl ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tdx_guest fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            2 MiB (64 instances)
L1i cache:                            2 MiB (64 instances)
L2 cache:                             256 MiB (64 instances)
L3 cache:                             16 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.2
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.0.dev0
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pynvml==12.0.0
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0+cu128
[pip3] torchaudio==2.9.0+cu128
[pip3] torchvision==0.24.0+cu128
[pip3] transformers==4.57.1
[pip3] triton==3.5.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.1rc6.dev214+g608bb1446 (git sha: 608bb1446)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.8 brand=unknown,driver>=470,driver<471 brand=grid,driver>=470,driver<471 brand=tesla,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=vapps,driver>=470,driver<471 brand=vpc,driver>=470,driver<471 brand=vcs,driver>=470,driver<471 brand=vws,driver>=470,driver<471 brand=cloudgaming,driver>=470,driver<471 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566
NCCL_VERSION=2.25.1-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_PRODUCT_NAME=CUDA
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.8.1
LD_LIBRARY_PATH=/usr/local/cuda/lib64
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

And here is the full log:

2025-11-08T10:44:31.135582505Z (APIServer pid=1) INFO 11-08 02:44:31 [api_server.py:1959] vLLM API server version 0.11.1rc6.dev214+g608bb1446
2025-11-08T10:44:31.138505493Z (APIServer pid=1) INFO 11-08 02:44:31 [utils.py:253] non-default args: {'enable_auto_tool_choice': True, 'tool_call_parser': 'kimi_k2', 'model': 'moonshotai/Kimi-K2-Thinking', 'trust_remote_code': True, 'max_model_len': 262144, 'served_model_name': ['moonshotai/Kimi-K2-Thinking'], 'reasoning_parser': 'kimi_k2', 'tensor_parallel_size': 8, 'max_num_batched_tokens': 32768, 'max_num_seqs': 100}
2025-11-08T10:44:31.138970855Z (APIServer pid=1) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2025-11-08T10:44:31.390720150Z (APIServer pid=1) You are using a model of type kimi_k2 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
2025-11-08T10:44:31.391569463Z (APIServer pid=1) INFO 11-08 02:44:31 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-08T10:44:38.763838379Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-08T10:44:38.763893915Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T10:44:38.763900233Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-08T10:44:38.763905590Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     returned.check_returncode()
2025-11-08T10:44:38.763915393Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-08T10:44:38.763920085Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-08T10:44:38.763929615Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-08T10:44:38.763934321Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.763943058Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-08T10:44:38.763947503Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.763952337Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T10:44:38.763960863Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-08T10:44:38.763965309Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     return model.inspect_model_cls()
2025-11-08T10:44:38.763974204Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.763978514Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-08T10:44:38.763984240Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     result = func(*args, **kwargs)
2025-11-08T10:44:38.763993474Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.763997823Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-08T10:44:38.764004844Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     mi = _run_in_subprocess(
2025-11-08T10:44:38.764010284Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.764015234Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-08T10:44:38.764022482Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     raise RuntimeError(
2025-11-08T10:44:38.764026784Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-08T10:44:38.764033191Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T10:44:38.764039731Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T10:44:38.764045288Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   what():  std::bad_alloc
2025-11-08T10:44:38.764049476Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.764593857Z (APIServer pid=1) Traceback (most recent call last):
2025-11-08T10:44:38.764601819Z (APIServer pid=1)   File "<frozen runpy>", line 198, in _run_module_as_main
2025-11-08T10:44:38.764606732Z (APIServer pid=1)   File "<frozen runpy>", line 88, in _run_code
2025-11-08T10:44:38.764618797Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2078, in <module>
2025-11-08T10:44:38.765560470Z (APIServer pid=1)     uvloop.run(run_server(args))
2025-11-08T10:44:38.765595651Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
2025-11-08T10:44:38.765614406Z (APIServer pid=1)     return __asyncio.run(
2025-11-08T10:44:38.765619130Z (APIServer pid=1)            ^^^^^^^^^^^^^^
2025-11-08T10:44:38.765623640Z (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
2025-11-08T10:44:38.765633519Z (APIServer pid=1)     return runner.run(main)
2025-11-08T10:44:38.765637970Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.765642461Z (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
2025-11-08T10:44:38.765646960Z (APIServer pid=1)     return self._loop.run_until_complete(task)
2025-11-08T10:44:38.765655694Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.765659704Z (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-11-08T10:44:38.765664151Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
2025-11-08T10:44:38.765671045Z (APIServer pid=1)     return await main
2025-11-08T10:44:38.765675257Z (APIServer pid=1)            ^^^^^^^^^^
2025-11-08T10:44:38.765679198Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2006, in run_server
2025-11-08T10:44:38.765925293Z (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-11-08T10:44:38.765932199Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2025, in run_server_worker
2025-11-08T10:44:38.766114364Z (APIServer pid=1)     async with build_async_engine_client(
2025-11-08T10:44:38.766120236Z (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766124355Z (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
2025-11-08T10:44:38.766197361Z (APIServer pid=1)     return await anext(self.gen)
2025-11-08T10:44:38.766203644Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766252834Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
2025-11-08T10:44:38.766279647Z (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
2025-11-08T10:44:38.766345082Z (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766350673Z (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
2025-11-08T10:44:38.766361677Z (APIServer pid=1)     return await anext(self.gen)
2025-11-08T10:44:38.766415731Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766420647Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
2025-11-08T10:44:38.766456848Z (APIServer pid=1)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
2025-11-08T10:44:38.766478052Z (APIServer pid=1)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766483056Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1315, in create_engine_config
2025-11-08T10:44:38.766705391Z (APIServer pid=1)     model_config = self.create_model_config()
2025-11-08T10:44:38.766717688Z (APIServer pid=1)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766722432Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1170, in create_model_config
2025-11-08T10:44:38.766915646Z (APIServer pid=1)     return ModelConfig(
2025-11-08T10:44:38.766926370Z (APIServer pid=1)            ^^^^^^^^^^^^
2025-11-08T10:44:38.766931276Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
2025-11-08T10:44:38.766997659Z (APIServer pid=1)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
2025-11-08T10:44:38.767218514Z (APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
2025-11-08T10:44:38.767227060Z (APIServer pid=1)   Value error, Model architectures ['DeepseekV3ForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
2025-11-08T10:44:38.767232194Z (APIServer pid=1)     For further information visit https://errors.pydantic.dev/2.12/v/value_error
Moonshot AI org

2025-11-08T05:12:38.178643769Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T05:12:38.178648755Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T05:12:38.178653439Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   what():  std::bad_alloc

According to ChatGPT, this looks like the CPU running out of memory.

Please check whether you have more than 1 TiB of memory to hold the model weights; they need to live in CPU memory before being moved to GPU memory.
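
For example, a quick check inside the container (standard Linux tooling; MemAvailable is the number that matters for loading the weights):

# human-readable total/available RAM
free -h
# or read the kernel's view directly
grep -E 'MemTotal|MemAvailable' /proc/meminfo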

Cool, I only allocated 512 GB of memory. I will try increasing it and see whether that works. Thanks for replying; I will share the results later.

It's weird: I allocated 1.5 TiB of memory for this model, but the error persists.

2025-11-11T02:32:20.647899956Z (APIServer pid=1) INFO 11-10 18:32:20 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-11T02:32:26.287715714Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-11T02:32:26.287761825Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Traceback (most recent call last):
2025-11-11T02:32:26.287768666Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-11T02:32:26.287774350Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     returned.check_returncode()
2025-11-11T02:32:26.287779095Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-11T02:32:26.287784225Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-11T02:32:26.287789315Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-11T02:32:26.287794219Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 
2025-11-11T02:32:26.287798953Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-11T02:32:26.287803401Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 
2025-11-11T02:32:26.287811737Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Traceback (most recent call last):
2025-11-11T02:32:26.287816593Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-11T02:32:26.287821381Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     return model.inspect_model_cls()
2025-11-11T02:32:26.287826345Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287831326Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-11T02:32:26.287837300Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     result = func(*args, **kwargs)
2025-11-11T02:32:26.287842043Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287846551Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-11T02:32:26.287851174Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     mi = _run_in_subprocess(
2025-11-11T02:32:26.287855680Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287860328Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-11T02:32:26.287865208Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     raise RuntimeError(
2025-11-11T02:32:26.287869694Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-11T02:32:26.287874571Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-11T02:32:26.287879443Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-11T02:32:26.287884079Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   what():  std::bad_alloc
2025-11-11T02:32:26.287888632Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 

I solved this problem by building a custom vLLM image using Python 3.13. It seems the Python version mismatch was the root cause.

Moonshot AI org

Hmm, I didn't expect the Python version to matter here. Is it a Python bug?

It seems the Python version mismatch was the root cause.

What's the mismatch?

I used Python 3.12 to install the latest vLLM nightly, and it failed as above. Then I installed the vLLM nightly with Python 3.13, and it worked.
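
Roughly what the working install looked like, for anyone else who hits this (the nightly index URL is the one documented by vLLM; adjust for your environment):

# install the vLLM nightly under Python 3.13 instead of 3.12
python3.13 -m pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly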
