If someone is struggling to run rubert-tiny2 with vLLM

#5
by WpythonW - opened

Environment (tested on Google Colab T4):

vLLM: 0.11.2
Transformers: 4.57.2
Safetensors: 0.7.0
PyTorch: 2.9.0+cu126
CUDA: 12.6

Modifications needed to make the checkpoint loadable by vLLM:

  1. Changed architecture from BertForPreTraining to BertModel in config.json
  2. Fused separate Q/K/V weights into qkv_proj format (vLLM's fused attention optimization)
  3. Removed pretraining heads (MLM/NSP) and pooler weights
  4. Stripped bert. prefix from weight keys (vLLM adds model. automatically via mapper)
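Steps 2-4 above can be sketched as a state-dict transform. This is a hypothetical illustration (the function name and the exact fused key `qkv_proj` target are assumptions; verify against vLLM's BertModel weight mapper and your own checkpoint's key names before relying on it):

```python
import torch

def convert_for_vllm(state_dict):
    """Sketch of steps 2-4: drop heads/pooler, strip the bert. prefix,
    and fuse Q/K/V projections. Key names follow standard HF BERT
    checkpoints; adjust if yours differ."""
    out = {}
    for key, tensor in state_dict.items():
        # Step 3: remove pretraining heads (cls.* = MLM/NSP) and pooler
        if key.startswith("cls.") or ".pooler." in key:
            continue
        # Step 4: strip the "bert." prefix (vLLM's mapper prepends
        # "model." on its own)
        if key.startswith("bert."):
            key = key[len("bert."):]
        out[key] = tensor
    # Step 2: concatenate Q/K/V weights (and biases) into one fused
    # qkv_proj tensor per attention layer
    for q_key in [k for k in out if ".attention.self.query." in k]:
        k_key = q_key.replace(".query.", ".key.")
        v_key = q_key.replace(".query.", ".value.")
        qkv_key = q_key.replace(".query.", ".qkv_proj.")
        out[qkv_key] = torch.cat(
            [out.pop(q_key), out.pop(k_key), out.pop(v_key)], dim=0
        )
    return out
```

Step 1 (editing `architectures` in config.json from `BertForPreTraining` to `BertModel`) is a plain JSON edit and is not shown here; the full working conversion is in the notebook linked below.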

Full conversion notebook: https://colab.research.google.com/drive/1SS9qEayvwZU1r1khxq9tWf7iEZcxw2yW?usp=sharing

Thanks a lot, Andrew!
If you upload the converted model separately to HF, I would be happy to share the link to it in the rubert-tiny2 model card!

Hi! The vLLM-optimized version is now ready: https://huggingface.co/WpythonW/rubert-tiny2-vllm
Feel free to link it in the rubert-tiny2 card!