For anyone struggling to run rubert-tiny2 with vLLM
#5 · opened by WpythonW
Environment (tested on Google Colab T4):
vLLM: 0.11.2
Transformers: 4.57.2
Safetensors: 0.7.0
PyTorch: 2.9.0+cu126
CUDA: 12.6
Modifications:
- Changed the architecture from `BertForPreTraining` to `BertModel` in config.json
- Fused the separate Q/K/V weights into `qkv_proj` format (vLLM's fused attention optimization)
- Removed the pretraining heads (MLM/NSP) and pooler weights
- Stripped the `bert.` prefix from weight keys (vLLM adds `model.` automatically via its mapper)
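The weight-side steps above can be sketched as a small state-dict transform. This is a minimal illustration, not the exact notebook code: the layer naming (`encoder.layer.{i}.attention.self.query/key/value`) follows the standard Hugging Face BERT checkpoint layout, and the `qkv_proj` target name and concatenation order (Q, then K, then V along dim 0) are assumptions about what vLLM's fused attention expects.

```python
import torch

def convert_bert_for_vllm(state_dict, num_layers):
    """Sketch: strip the `bert.` prefix, drop pretraining heads and
    pooler weights, and fuse per-layer Q/K/V into one qkv_proj tensor."""
    out = {}
    for key, tensor in state_dict.items():
        # Drop MLM/NSP heads (`cls.*`) and the pooler.
        if key.startswith(("cls.", "bert.pooler.")):
            continue
        # vLLM's mapper re-adds a `model.` prefix, so `bert.` goes away.
        out[key.removeprefix("bert.")] = tensor

    for i in range(num_layers):
        base = f"encoder.layer.{i}.attention.self"
        for suffix in ("weight", "bias"):
            q = out.pop(f"{base}.query.{suffix}")
            k = out.pop(f"{base}.key.{suffix}")
            v = out.pop(f"{base}.value.{suffix}")
            # Fuse along the output dimension: one matmul instead of three.
            out[f"{base}.qkv_proj.{suffix}"] = torch.cat([q, k, v], dim=0)
    return out
```

The resulting dict can then be written out with `safetensors.torch.save_file` after updating `architectures` in config.json to `["BertModel"]`.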
Full conversion notebook: https://colab.research.google.com/drive/1SS9qEayvwZU1r1khxq9tWf7iEZcxw2yW?usp=sharing
Thanks a lot, Andrew!
If you upload the converted model separately to HF, I would be happy to share the link to it in the rubert-tiny2 model card!
Hi! The vLLM-optimized version is now ready: https://huggingface.co/WpythonW/rubert-tiny2-vllm
Feel free to link it in the rubert-tiny2 card!