EXL3 quant with the MLP projection layers at 3bpw and all other layers at 4bpw, sized to fit 24GB cards with 16K context. Original description:

Merged jukofyork/command-r-35b-writer-v3-multiplicative-lora into CohereLabs/c4ai-command-r-v01 using jukofyork/merge-lora.

Untested... But appears to have worked:

```
✓ Successfully merged and uploaded model!
Model URL: https://huggingface.co/jukofyork/command-r-35b-writer-v3
Merge mode: Multiplicative
Scale factor: 1
Processed 15 shards
Merged 72 layers with LoRA weights
```
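The log's "Multiplicative" mode and "Scale factor: 1" suggest the LoRA delta is folded in multiplicatively rather than simply added. A minimal NumPy sketch of the difference between the two merge forms, assuming the multiplicative merge takes the shape W' = W(I + s·BA) (the exact convention used by jukofyork/merge-lora may differ):

```python
import numpy as np

d, r = 8, 2  # toy hidden size and LoRA rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)  # base weight
A = rng.standard_normal((r, d)).astype(np.float32)  # lora_A (down-projection)
B = rng.standard_normal((d, r)).astype(np.float32)  # lora_B (up-projection)
scale = 1.0  # "Scale factor: 1" from the merge log

# Standard additive LoRA merge: W' = W + s * (B @ A)
W_add = W + scale * (B @ A)

# Multiplicative merge (assumed form): W' = W @ (I + s * (B @ A))
W_mul = W @ (np.eye(d, dtype=np.float32) + scale * (B @ A))
```

With scale 1 the multiplicative form reduces to W + W(BA), i.e. the low-rank update is modulated by the base weight instead of being weight-independent.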
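As a sanity check on the 24GB/16K-context claim, here is a back-of-the-envelope weight-memory estimate. The numbers are assumptions, not from this repo: ~35B parameters for the c4ai-command-r-v01 base, and ~3.75bpw as a rough average of the 3bpw MLP / 4bpw everything-else mix.

```python
params = 35e9    # assumed parameter count of the base model
avg_bpw = 3.75   # assumed average of 3bpw MLP / 4bpw other layers
weight_gib = params * avg_bpw / 8 / 2**30
print(f"~{weight_gib:.1f} GiB of quantized weights")  # → ~15.3 GiB
```

That leaves several GiB of headroom on a 24 GiB card for the 16K-context KV cache and activations.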
Model tree for Downtown-Case/jukofyork_command-r-35b-writer-v3-exl3-3.75bpw-hb6