Add tool calling template for HF format

#63

Using this template, one can serve the model in vLLM in the HF format with tool calling enabled. For this to work, first save the Jinja template to its own file (for example, by loading the JSON in Python and dumping the content of the "chat_template" key to a new file), then serve the model with:

vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --chat-template <path-to-jinja-template> --tool-call-parser mistral --enable-auto-tool-choice
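The extraction step can be sketched like this (a minimal sketch; it assumes the template lives under the "chat_template" key of the model's tokenizer_config.json, and the file paths are placeholders):

```python
import json

def extract_chat_template(config_path: str, out_path: str) -> None:
    """Dump the "chat_template" key of a tokenizer config JSON to its own file."""
    with open(config_path) as f:
        config = json.load(f)
    # Write the raw Jinja template so it can be passed to --chat-template.
    with open(out_path, "w") as f:
        f.write(config["chat_template"])

# Hypothetical paths; adjust to where you downloaded the model files.
# extract_chat_template("tokenizer_config.json", "tool_chat_template.jinja")
```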

When calling the server, one needs to set the sampling parameter skip_special_tokens to False (see https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#id5) so that vLLM's mistral tool parser can correctly parse the tool calls.
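A request against the OpenAI-compatible endpoint would then look roughly like this (a sketch: the tool definition and server URL are hypothetical; skip_special_tokens is one of vLLM's extra sampling parameters accepted in the request body, outside the OpenAI spec):

```python
import json
from urllib import request

payload = {
    "model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    # vLLM-specific sampling parameter, needed by the mistral tool parser:
    "skip_special_tokens": False,
}

# Sending it (server URL is a placeholder):
# req = request.Request("http://localhost:8000/v1/chat/completions",
#                       data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode())
```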

I was only able to test this using the unsloth BnB quantized version of the model, as my GPU is too small, but I presume it should work here as well.

I tried setting skip_special_tokens to False but got the following error from vLLM:

skip_special_tokens=False is not supported for Mistral tokenizers.

If you use the mistral tokenizer, tool calling should work out of the box, as shown in the example command in the model card:

vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2

This chat template plus the suggested setting only applies when the model is loaded in the Hugging Face format with the default tokenizer. I also tried loading the mistral tokenizer with the Hugging Face model, but I ran into some issues there (I don't recall precisely what, though).

Worked for me using Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic. Thank you!

Thanks for this! Do you know whether the instruct model has been tuned on this format, with the available tools in the last user message?

Also, you might consider updating this part, since you now support the roles tool, tool_results, and tool_calls (the latter also within assistant):

        {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message, tool/tool_results and tool_calls!\") }}

It was interesting to see how tool calls are effectively excluded from the following check. [TOOL_RESULTS][/TOOL_RESULTS] blocks are clearly called out in the template, so that makes sense. I was going to suggest this tweak to the exception text:

        {{- raise_exception(\"Excluding any tool_calls and tool_results, after the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}