Text Generation
Transformers
Safetensors
minimax_m2
conversational
custom_code
fp8

MTP or another speculative decoding method?

#34
by CHNtentes - opened

GLM-4.5 can decode 2-3x faster with MTP (multi-token prediction) enabled.
