EXL3 quant with 3bpw MLP projection layers and 4bpw for all other layers, sized to fit 24GB cards with 16K context. Original description:
Merged jukofyork/command-r-35b-writer-v3-multiplicative-lora into CohereLabs/c4ai-command-r-v01 using jukofyork/merge-lora.
Untested, but the merge appears to have worked:

```
✓ Successfully merged and uploaded model!
Model URL: https://huggingface.co/jukofyork/command-r-35b-writer-v3
Merge mode: Multiplicative
Scale factor: 1
Processed 15 shards
Merged 72 layers with LoRA weights
```
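A multiplicative LoRA merge folds the low-rank update into the base weight as a multiplicative correction rather than the usual additive one. The sketch below is a minimal illustration of that idea in PyTorch; the function name, the exact formulation `W' = (I + scale · B A) W`, and the tensor shapes are assumptions for illustration, not the actual implementation in jukofyork/merge-lora.

```python
import torch

def merge_multiplicative_lora(W, A, B, scale=1.0):
    # Hypothetical multiplicative merge: W' = (I + scale * B @ A) @ W
    # B @ A is the low-rank update with shape (d_out, d_out);
    # adding (delta @ W) to W is equivalent to (I + delta) @ W.
    delta = scale * (B @ A)
    return W + delta @ W

# Toy example with made-up dimensions (d_out=8, d_in=16, rank=4)
W = torch.randn(8, 16)   # base projection weight
A = torch.randn(4, 8)    # LoRA down-projection
B = torch.randn(8, 4)    # LoRA up-projection
W_merged = merge_multiplicative_lora(W, A, B, scale=1.0)
```

With scale factor 1 (as in the log above), the full low-rank correction is applied; a zero LoRA leaves the base weight unchanged.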