# Model Card
- Base model: meta-llama/Llama-3.3-70B-Instruct
- Quantization method: SqueezeLLM
- Target bit-width: 4
- Backend kernel: Any-Precision-LLM kernel (`ap-gemv`)
- Calibration data: RedPajama (1024 sentences / 4096 tokens)
- Calibration objective: Next-token prediction
## How to run
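A minimal sketch of loading the checkpoint with Hugging Face Transformers. This is an assumption about the loading path, not a confirmed API for this card: SqueezeLLM checkpoints served through the Any-Precision-LLM `ap-gemv` kernel typically need that project's CUDA kernels installed, and the exact entry point may differ.

```python
# Hedged sketch: one plausible way to load this 4-bit checkpoint.
# Assumes a CUDA machine with the Any-Precision-LLM kernels installed;
# the real loading procedure may differ -- check the repository README.

MODEL_ID = "jusjinuk/Llama-3.3-70B-Instruct-4bit-SqueezeLLM"

def load_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the quantized checkpoint."""
    # Imports kept local so the sketch can be read without the deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",       # shard layers across available GPUs
        trust_remote_code=True,  # quantized checkpoints may ship loaders
    )
    return tokenizer, model
```

After loading, generation follows the usual Transformers pattern (`tokenizer(...)` then `model.generate(...)`).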
## References