With the same hardware configuration and parameters, the inference speed of the IQ2_KS quantization files is approximately twice that of the IQ1_KT files. Moreover, the larger the files are, the faster the inference speed becomes. Why is that?
Because IQ1_KT is a "trellis" quant similar to QTIP/EXL3, and trellis quants are CPU-intensive to decode during token generation (TG).
I've said it in many other places, but in general TG is RAM-bandwidth bottlenecked, unless you're running a KT quant, in which case TG likely becomes CPU-compute bottlenecked.
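To make the bandwidth-bound case concrete, here is a rough back-of-envelope sketch: for each generated token, every model weight has to be read from RAM once, so tokens/sec cannot exceed bandwidth divided by model size. The function name and the numbers in the example are hypothetical, purely for illustration.

```python
def tg_ceiling_tok_s(ram_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on token-generation speed when RAM bandwidth is the bottleneck.

    Each token requires streaming the full set of quantized weights from RAM,
    so tok/s <= bandwidth / model_size. Real throughput is lower due to KV-cache
    reads, activations, and imperfect memory-access patterns.
    """
    return ram_bandwidth_gb_s / model_size_gb

# Hypothetical example: ~80 GB/s dual-channel DDR5 and a 40 GB quantized model
print(tg_ceiling_tok_s(80, 40))  # → 2.0 tok/s ceiling
```

This is also why a smaller file is normally *faster* for TG: fewer bytes to stream per token. A KT quant breaks that rule only because decoding each trellis-coded weight costs extra arithmetic, shifting the bottleneck from memory to compute.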
The fact that it runs on CPU at all is really amazing, given that other trellis-quant implementations tend to require enough VRAM to run GPU-only.
So IQ1_KT provides the best-quality model available that fits into that small amount of RAM, but it will take more CPU to compute TG.