vllm deployment failed

#11
by mondaylord - opened

I was using vLLM to deploy Kimi-K2-Thinking, and the error is shown below. I built vLLM from commit https://github.com/vllm-project/vllm/commit/67a2da890eef2a6fd40384aa5ae80e03beb39490, and the args are the same as in the guide. Could you tell me which base commit I should be using?
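
For reference, the launch command, reconstructed here from the non-default args vLLM echoes at startup (a sketch, since I'm quoting the flags back from the log rather than the guide verbatim):

# serve Kimi-K2-Thinking with the args reported in the startup log
vllm serve moonshotai/Kimi-K2-Thinking \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --max-num-batched-tokens 32768 \
  --max-num-seqs 100 \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --served-model-name moonshotai/Kimi-K2-Thinking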

2025-11-08T05:12:31.799956291Z (APIServer pid=1) You are using a model of type kimi_k2 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
2025-11-08T05:12:31.800772713Z (APIServer pid=1) INFO 11-07 21:12:31 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-08T05:12:38.178480433Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-08T05:12:38.178531689Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T05:12:38.178538843Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-08T05:12:38.178545045Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     returned.check_returncode()
2025-11-08T05:12:38.178550359Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-08T05:12:38.178555576Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-08T05:12:38.178560565Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-08T05:12:38.178565857Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
2025-11-08T05:12:38.178570978Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-08T05:12:38.178576239Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
2025-11-08T05:12:38.178581284Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T05:12:38.178586124Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-08T05:12:38.178590857Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     return model.inspect_model_cls()
2025-11-08T05:12:38.178595421Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178600226Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-08T05:12:38.178606053Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     result = func(*args, **kwargs)
2025-11-08T05:12:38.178610812Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178615284Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-08T05:12:38.178620403Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     mi = _run_in_subprocess(
2025-11-08T05:12:38.178625183Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-08T05:12:38.178629959Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-08T05:12:38.178634414Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]     raise RuntimeError(
2025-11-08T05:12:38.178638871Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-08T05:12:38.178643769Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T05:12:38.178648755Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T05:12:38.178653439Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   what():  std::bad_alloc
2025-11-08T05:12:38.178657970Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] 
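The failing command is spelled out in the CalledProcessError, so the abort should be reproducible in isolation inside the same container:

# the exact subprocess the model registry spawns, per the traceback above
/usr/bin/python3 -m vllm.model_executor.models.registry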
Moonshot AI org

Do you have the full log? You could also open a vLLM issue with detailed environment information.

Sure, thanks for your reply. I will open an issue in vLLM's GitHub repo soon. The setup is 8x H200 with vLLM at commit 67a2da890eef2a6fd40384aa5ae80e03beb39490, and the collected environment info is below:

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : version 4.1.0
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.9.0-dstack-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : Could not collect
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            GenuineIntel
Model name:                           06/cf
CPU family:                           6
Model:                                207
Thread(s) per core:                   1
Core(s) per socket:                   64
Socket(s):                            1
Stepping:                             2
BogoMIPS:                             3800.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc pebs bts rep_good nopl tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq dtes64 ds_cpl ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tdx_guest fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            2 MiB (64 instances)
L1i cache:                            2 MiB (64 instances)
L2 cache:                             256 MiB (64 instances)
L3 cache:                             16 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.2
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.0.dev0
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pynvml==12.0.0
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0+cu128
[pip3] torchaudio==2.9.0+cu128
[pip3] torchvision==0.24.0+cu128
[pip3] transformers==4.57.1
[pip3] triton==3.5.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.1rc6.dev214+g608bb1446 (git sha: 608bb1446)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.8 brand=unknown,driver>=470,driver<471 brand=grid,driver>=470,driver<471 brand=tesla,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=vapps,driver>=470,driver<471 brand=vpc,driver>=470,driver<471 brand=vcs,driver>=470,driver<471 brand=vws,driver>=470,driver<471 brand=cloudgaming,driver>=470,driver<471 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566
NCCL_VERSION=2.25.1-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_PRODUCT_NAME=CUDA
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.8.1
LD_LIBRARY_PATH=/usr/local/cuda/lib64
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

And here is the full log:

2025-11-08T10:44:31.135582505Z (APIServer pid=1) INFO 11-08 02:44:31 [api_server.py:1959] vLLM API server version 0.11.1rc6.dev214+g608bb1446
2025-11-08T10:44:31.138505493Z (APIServer pid=1) INFO 11-08 02:44:31 [utils.py:253] non-default args: {'enable_auto_tool_choice': True, 'tool_call_parser': 'kimi_k2', 'model': 'moonshotai/Kimi-K2-Thinking', 'trust_remote_code': True, 'max_model_len': 262144, 'served_model_name': ['moonshotai/Kimi-K2-Thinking'], 'reasoning_parser': 'kimi_k2', 'tensor_parallel_size': 8, 'max_num_batched_tokens': 32768, 'max_num_seqs': 100}
2025-11-08T10:44:31.138970855Z (APIServer pid=1) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2025-11-08T10:44:31.390720150Z (APIServer pid=1) You are using a model of type kimi_k2 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
2025-11-08T10:44:31.391569463Z (APIServer pid=1) INFO 11-08 02:44:31 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-08T10:44:38.763838379Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-08T10:44:38.763893915Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T10:44:38.763900233Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-08T10:44:38.763905590Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     returned.check_returncode()
2025-11-08T10:44:38.763915393Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-08T10:44:38.763920085Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-08T10:44:38.763929615Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-08T10:44:38.763934321Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.763943058Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-08T10:44:38.763947503Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.763952337Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] Traceback (most recent call last):
2025-11-08T10:44:38.763960863Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-08T10:44:38.763965309Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     return model.inspect_model_cls()
2025-11-08T10:44:38.763974204Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.763978514Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-08T10:44:38.763984240Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     result = func(*args, **kwargs)
2025-11-08T10:44:38.763993474Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.763997823Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-08T10:44:38.764004844Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     mi = _run_in_subprocess(
2025-11-08T10:44:38.764010284Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.764015234Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-08T10:44:38.764022482Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]     raise RuntimeError(
2025-11-08T10:44:38.764026784Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-08T10:44:38.764033191Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T10:44:38.764039731Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T10:44:38.764045288Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706]   what():  std::bad_alloc
2025-11-08T10:44:38.764049476Z (APIServer pid=1) ERROR 11-08 02:44:38 [registry.py:706] 
2025-11-08T10:44:38.764593857Z (APIServer pid=1) Traceback (most recent call last):
2025-11-08T10:44:38.764601819Z (APIServer pid=1)   File "<frozen runpy>", line 198, in _run_module_as_main
2025-11-08T10:44:38.764606732Z (APIServer pid=1)   File "<frozen runpy>", line 88, in _run_code
2025-11-08T10:44:38.764618797Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2078, in <module>
2025-11-08T10:44:38.765560470Z (APIServer pid=1)     uvloop.run(run_server(args))
2025-11-08T10:44:38.765595651Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
2025-11-08T10:44:38.765614406Z (APIServer pid=1)     return __asyncio.run(
2025-11-08T10:44:38.765619130Z (APIServer pid=1)            ^^^^^^^^^^^^^^
2025-11-08T10:44:38.765623640Z (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
2025-11-08T10:44:38.765633519Z (APIServer pid=1)     return runner.run(main)
2025-11-08T10:44:38.765637970Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.765642461Z (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
2025-11-08T10:44:38.765646960Z (APIServer pid=1)     return self._loop.run_until_complete(task)
2025-11-08T10:44:38.765655694Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.765659704Z (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-11-08T10:44:38.765664151Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
2025-11-08T10:44:38.765671045Z (APIServer pid=1)     return await main
2025-11-08T10:44:38.765675257Z (APIServer pid=1)            ^^^^^^^^^^
2025-11-08T10:44:38.765679198Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2006, in run_server
2025-11-08T10:44:38.765925293Z (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-11-08T10:44:38.765932199Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2025, in run_server_worker
2025-11-08T10:44:38.766114364Z (APIServer pid=1)     async with build_async_engine_client(
2025-11-08T10:44:38.766120236Z (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766124355Z (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
2025-11-08T10:44:38.766197361Z (APIServer pid=1)     return await anext(self.gen)
2025-11-08T10:44:38.766203644Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766252834Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
2025-11-08T10:44:38.766279647Z (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
2025-11-08T10:44:38.766345082Z (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766350673Z (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
2025-11-08T10:44:38.766361677Z (APIServer pid=1)     return await anext(self.gen)
2025-11-08T10:44:38.766415731Z (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766420647Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
2025-11-08T10:44:38.766456848Z (APIServer pid=1)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
2025-11-08T10:44:38.766478052Z (APIServer pid=1)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766483056Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1315, in create_engine_config
2025-11-08T10:44:38.766705391Z (APIServer pid=1)     model_config = self.create_model_config()
2025-11-08T10:44:38.766717688Z (APIServer pid=1)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-08T10:44:38.766722432Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1170, in create_model_config
2025-11-08T10:44:38.766915646Z (APIServer pid=1)     return ModelConfig(
2025-11-08T10:44:38.766926370Z (APIServer pid=1)            ^^^^^^^^^^^^
2025-11-08T10:44:38.766931276Z (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
2025-11-08T10:44:38.766997659Z (APIServer pid=1)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
2025-11-08T10:44:38.767218514Z (APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
2025-11-08T10:44:38.767227060Z (APIServer pid=1)   Value error, Model architectures ['DeepseekV3ForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
2025-11-08T10:44:38.767232194Z (APIServer pid=1)     For further information visit https://errors.pydantic.dev/2.12/v/value_error
Moonshot AI org

2025-11-08T05:12:38.178643769Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-08T05:12:38.178648755Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-08T05:12:38.178653439Z (APIServer pid=1) ERROR 11-07 21:12:38 [registry.py:706]   what():  std::bad_alloc

According to ChatGPT, this looks like the CPU running out of memory.

Please check whether you have more than 1 TiB of memory to hold the model weights; they need to live in CPU memory before being moved to GPU memory.
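
For example, a quick check inside the container (standard Linux tooling; MemAvailable is the number that matters for loading the weights):

# human-readable total/available RAM
free -h
# or read the kernel's view directly
grep -E 'MemTotal|MemAvailable' /proc/meminfo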

Cool, I only allocated 512 GB of memory. I will try increasing it and see whether that works. Thanks for replying; I will share the results later.

It's weird: I allocated 1.5 TiB of memory for this model, but the error persists.

2025-11-11T02:32:20.647899956Z (APIServer pid=1) INFO 11-10 18:32:20 [config.py:416] Replacing legacy 'type' key with 'rope_type'
2025-11-11T02:32:26.287715714Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Error in inspecting model architecture 'DeepseekV3ForCausalLM'
2025-11-11T02:32:26.287761825Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Traceback (most recent call last):
2025-11-11T02:32:26.287768666Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1128, in _run_in_subprocess
2025-11-11T02:32:26.287774350Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     returned.check_returncode()
2025-11-11T02:32:26.287779095Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
2025-11-11T02:32:26.287784225Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     raise CalledProcessError(self.returncode, self.args, self.stdout,
2025-11-11T02:32:26.287789315Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
2025-11-11T02:32:26.287794219Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 
2025-11-11T02:32:26.287798953Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] The above exception was the direct cause of the following exception:
2025-11-11T02:32:26.287803401Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 
2025-11-11T02:32:26.287811737Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] Traceback (most recent call last):
2025-11-11T02:32:26.287816593Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 704, in _try_inspect_model_cls
2025-11-11T02:32:26.287821381Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     return model.inspect_model_cls()
2025-11-11T02:32:26.287826345Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287831326Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
2025-11-11T02:32:26.287837300Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     result = func(*args, **kwargs)
2025-11-11T02:32:26.287842043Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]              ^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287846551Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 665, in inspect_model_cls
2025-11-11T02:32:26.287851174Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     mi = _run_in_subprocess(
2025-11-11T02:32:26.287855680Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]          ^^^^^^^^^^^^^^^^^^^
2025-11-11T02:32:26.287860328Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1131, in _run_in_subprocess
2025-11-11T02:32:26.287865208Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]     raise RuntimeError(
2025-11-11T02:32:26.287869694Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] RuntimeError: Error raised in subprocess:
2025-11-11T02:32:26.287874571Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
2025-11-11T02:32:26.287879443Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] terminate called after throwing an instance of 'std::bad_alloc'
2025-11-11T02:32:26.287884079Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706]   what():  std::bad_alloc
2025-11-11T02:32:26.287888632Z (APIServer pid=1) ERROR 11-10 18:32:26 [registry.py:706] 

I solved this problem by building a custom vLLM image using Python 3.13. It seems the Python version mismatch was the root cause.

Moonshot AI org

Hmm, I didn't expect the Python version to matter here. Is it a Python bug?

It seems the Python version mismatch was the root cause.

What's the mismatch?

I used Python 3.12 to install the latest vLLM nightly, and it failed as above. Then I installed the vLLM nightly with Python 3.13, and it worked.
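
Roughly what the working install looked like, for anyone else who hits this (the nightly index URL is the one documented by vLLM; adjust for your environment):

# install the vLLM nightly under Python 3.13 instead of 3.12
python3.13 -m pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly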
