Magistral-Small-2506-Q4

Quantized Q4 Version by Sandlogic Lexicon

This is a 4-bit quantized version of Magistral-Small-2506, optimized for efficient deployment while maintaining strong reasoning capabilities.

Model Overview

Built on Mistral Small 3.1 (2503) with added reasoning capabilities, this model underwent supervised fine-tuning (SFT) on traces from Magistral Medium, followed by reinforcement learning (RL), resulting in a small, efficient 24B-parameter reasoning model.

The Q4 quantization significantly reduces memory requirements while preserving the model's core reasoning abilities, making it even more accessible for local deployment.
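
For local inference, the quantized GGUF file can be loaded with llama-cpp-python, the Python binding for llama.cpp. The sketch below is a minimal example; the exact GGUF filename inside the repository is an assumption, so check the repo's file listing and adjust it.

```python
# Minimal local-inference setup with llama-cpp-python.
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# NOTE: the filename below is an assumption; check the repository
# for the actual Q4_K_M GGUF file name.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Magistral-Small-2506-GGUF",
    filename="Magistral-Small-2506-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=40960,      # stay within the recommended 40k-token window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
```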

Key Features

  • 🧠 Reasoning: Capable of producing long chains of reasoning traces before arriving at a final answer (see the chat example after this list)
  • 🌍 Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi
  • 📄 Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes
  • 📖 Context Window: 128k context window (performance optimized for up to 40k tokens)
  • ⚡ Quantized: Q4 quantization for reduced memory footprint and faster inference

Performance Notes

  • Context Length: While the model supports up to 128k tokens, we recommend limiting inputs to 40k tokens for optimal performance (a token-count check is sketched after this list)
  • Quantization Impact: Q4 quantization reduces model size by ~75% with minimal impact on reasoning quality
  • Inference Speed: 2-3x faster than the full-precision model
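
A practical way to respect the 40k-token recommendation is to count tokens before sending a long prompt. A minimal sketch, again assuming the `llm` object from the loading example:

```python
# Guard against exceeding the recommended 40k-token window.
RECOMMENDED_MAX_TOKENS = 40_000

def within_recommended_window(prompt: str) -> bool:
    # Llama.tokenize expects bytes and returns a list of token ids.
    n_tokens = len(llm.tokenize(prompt.encode("utf-8")))
    return n_tokens <= RECOMMENDED_MAX_TOKENS

long_prompt = "..."  # your (potentially long) input
if not within_recommended_window(long_prompt):
    print("Prompt exceeds the recommended 40k-token window; "
          "consider truncating or summarizing it first.")
```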

Model Architecture

  • Base Model: Mistral Small 3.1 (2503)
  • Parameters: 24B
  • Quantization: Q4_K_M
  • Context Window: 128k tokens (40k recommended)
  • License: Apache 2.0

Training Details

The base model underwent:

  1. Supervised Fine-Tuning (SFT) from Magistral Medium traces
  2. Reinforcement Learning (RL) optimization
  3. Q4 Quantization by Sandlogic Lexicon for efficient deployment

Limitations

  • Performance may degrade beyond 40k tokens of context
  • Quantization may introduce minor precision loss in edge cases
  • Reasoning chains may be shorter than those of larger models

License

This model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use.


Quantized and optimized by Sandlogic Lexicon for efficient reasoning at scale.
