If someone is struggling to run rubert-tiny2 with vLLM

#5
by WpythonW - opened

Environment (tested on Google Colab T4):

vLLM: 0.11.2
Transformers: 4.57.2
Safetensors: 0.7.0
PyTorch: 2.9.0+cu126
CUDA: 12.6

Modifications needed to make the checkpoint loadable by vLLM:

  1. Changed architecture from BertForPreTraining to BertModel in config.json
  2. Fused separate Q/K/V weights into qkv_proj format (vLLM's fused attention optimization)
  3. Removed pretraining heads (MLM/NSP) and pooler weights
  4. Stripped bert. prefix from weight keys (vLLM adds model. automatically via mapper)
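Steps 2-4 above can be sketched as a state-dict transform. This is a hypothetical illustration (the function name and the exact fused key `qkv_proj` target are assumptions; verify against vLLM's BertModel weight mapper and your own checkpoint's key names before relying on it):

```python
import torch

def convert_for_vllm(state_dict):
    """Sketch of steps 2-4: drop heads/pooler, strip the bert. prefix,
    and fuse Q/K/V projections. Key names follow standard HF BERT
    checkpoints; adjust if yours differ."""
    out = {}
    for key, tensor in state_dict.items():
        # Step 3: remove pretraining heads (cls.* = MLM/NSP) and pooler
        if key.startswith("cls.") or ".pooler." in key:
            continue
        # Step 4: strip the "bert." prefix (vLLM's mapper prepends
        # "model." on its own)
        if key.startswith("bert."):
            key = key[len("bert."):]
        out[key] = tensor
    # Step 2: concatenate Q/K/V weights (and biases) into one fused
    # qkv_proj tensor per attention layer
    for q_key in [k for k in out if ".attention.self.query." in k]:
        k_key = q_key.replace(".query.", ".key.")
        v_key = q_key.replace(".query.", ".value.")
        qkv_key = q_key.replace(".query.", ".qkv_proj.")
        out[qkv_key] = torch.cat(
            [out.pop(q_key), out.pop(k_key), out.pop(v_key)], dim=0
        )
    return out
```

Step 1 (editing `architectures` in config.json from `BertForPreTraining` to `BertModel`) is a plain JSON edit and is not shown here; the full working conversion is in the notebook linked below.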

Full conversion notebook: https://colab.research.google.com/drive/1SS9qEayvwZU1r1khxq9tWf7iEZcxw2yW?usp=sharing

Thanks a lot, Andrew!
If you upload the converted model separately to HF, I would be happy to share the link to it in the rubert-tiny2 model card!

Hi! The vLLM-optimized version is now ready: https://huggingface.co/WpythonW/rubert-tiny2-vllm
Feel free to link it in the rubert-tiny2 card!