Magistral-Small-2506-Q4

Quantized Q4 Version by Sandlogic Lexicon

This is a 4-bit quantized version of Magistral-Small-2506, optimized for efficient deployment while maintaining strong reasoning capabilities.

Model Overview

Built on Mistral Small 3.1 (2503) with added reasoning capabilities, this model underwent supervised fine-tuning (SFT) on traces from Magistral Medium, followed by reinforcement learning (RL), resulting in a small, efficient 24B-parameter reasoning model.

The Q4 quantization significantly reduces memory requirements while preserving the model's core reasoning abilities, making it even more accessible for local deployment.
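
For local inference, the quantized GGUF file can be loaded with llama-cpp-python, the Python binding for llama.cpp. The sketch below is a minimal example; the exact GGUF filename inside the repository is an assumption, so check the repo's file listing and adjust it.

```python
# Minimal local-inference setup with llama-cpp-python.
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# NOTE: the filename below is an assumption; check the repository
# for the actual Q4_K_M GGUF file name.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Magistral-Small-2506-GGUF",
    filename="Magistral-Small-2506-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=40960,      # stay within the recommended 40k-token window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
```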

Key Features

  • 🧠 Reasoning: Capable of producing long chains of reasoning traces before arriving at a final answer (see the chat example after this list)
  • 🌍 Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi
  • 📄 Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes
  • 📖 Context Window: 128k context window (performance optimized for up to 40k tokens)
  • ⚡ Quantized: Q4 quantization for reduced memory footprint and faster inference

Performance Notes

  • Context Length: While the model supports up to 128k tokens, we recommend limiting inputs to 40k tokens for optimal performance (a token-count check is sketched after this list)
  • Quantization Impact: Q4 quantization reduces model size by ~75% with minimal impact on reasoning quality
  • Inference Speed: 2-3x faster than the full-precision model
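
A practical way to respect the 40k-token recommendation is to count tokens before sending a long prompt. A minimal sketch, again assuming the `llm` object from the loading example:

```python
# Guard against exceeding the recommended 40k-token window.
RECOMMENDED_MAX_TOKENS = 40_000

def within_recommended_window(prompt: str) -> bool:
    # Llama.tokenize expects bytes and returns a list of token ids.
    n_tokens = len(llm.tokenize(prompt.encode("utf-8")))
    return n_tokens <= RECOMMENDED_MAX_TOKENS

long_prompt = "..."  # your (potentially long) input
if not within_recommended_window(long_prompt):
    print("Prompt exceeds the recommended 40k-token window; "
          "consider truncating or summarizing it first.")
```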

Model Architecture

  • Base Model: Mistral Small 3.1 (2503)
  • Parameters: 24B
  • Quantization: Q4_K_M
  • Context Window: 128k tokens (40k recommended)
  • License: Apache 2.0

Training Details

The base model underwent:

  1. Supervised Fine-Tuning (SFT) from Magistral Medium traces
  2. Reinforcement Learning (RL) optimization
  3. Q4 Quantization by Sandlogic Lexicon for efficient deployment

Limitations

  • Performance may degrade beyond 40k tokens of context
  • Quantization may introduce minor precision loss in edge cases
  • Reasoning chains may be shorter than those of larger models

License

This model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use.


Quantized and optimized by Sandlogic Lexicon for efficient reasoning at scale.
