---
pipeline_tag: text-generation
license: mit
library_name: mlx
base_model: MiniMaxAI/MiniMax-M2
tags:
- mlx
---

*UPLOADING*

**See MiniMax-M2 6.5bit MLX in action - [demonstration video coming soon](https://youtube.com/xcreate)**

*The q6.5-bit quant typically achieves a perplexity of 1.128 in our testing, matching the q8.5 result below.*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5** | 41.293 |
| **q3.5** | 1.900 |
| **q4.5** | 1.168 |
| **q5.5** | 1.141 |
| **q6.5** | 1.128 |
| **q8.5** | 1.128 |

## Usage Notes

* Tested on a MacBook Pro connecting over the internet to an M3 Ultra with 512 GB RAM, using the [Inferencer app](https://inferencer.com)
* Memory usage: ~175 GB
* Expect about 36 tokens/s for small contexts (200 tokens), dropping to about 11 tokens/s for large contexts (6,800 tokens)
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
* For more details, see the [demonstration video coming soon](https://youtube.com/xcreate) or visit [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2); a minimal `mlx-lm` loading sketch is included at the end of this card.

## Disclaimer

We are not the creator, originator, or owner of any model listed. Each model is created and provided by third parties. Models may not always be accurate or contextually appropriate. You are responsible for verifying any information before making important decisions. We are not liable for any damages, losses, or issues arising from their use, including data loss or inaccuracies in AI-generated content.
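
## Quick Start

Besides the Inferencer app, the quant can be run locally with the `mlx-lm` Python package once the upload completes. The snippet below is a minimal sketch, assuming `mlx-lm` is installed (`pip install mlx-lm`); the repository id used is a placeholder, not the confirmed name of this repo.

```python
# Minimal sketch: load the MLX quant and generate a short completion.
from mlx_lm import load, generate

# Placeholder repo id -- replace with this repository's actual name.
model, tokenizer = load("your-org/MiniMax-M2-6.5bit-mlx")

# Format the prompt with the model's chat template.
messages = [{"role": "user", "content": "Summarise what MiniMax-M2 is in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints the generation along with tokens/s statistics.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```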