Instructions for using HelpingAI/HelpingAI2.5-10B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use HelpingAI/HelpingAI2.5-10B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HelpingAI/HelpingAI2.5-10B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI2.5-10B")
model = AutoModelForCausalLM.from_pretrained("HelpingAI/HelpingAI2.5-10B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use HelpingAI/HelpingAI2.5-10B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HelpingAI/HelpingAI2.5-10B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HelpingAI/HelpingAI2.5-10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
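
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical and not part of vLLM, and the code assumes the server started above is reachable at `http://localhost:8000`. The base URL is a parameter so the helper can target any other OpenAI-compatible endpoint.

```python
import json
import urllib.request


def build_chat_request(prompt, model="HelpingAI/HelpingAI2.5-10B",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the model's reply as a string.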

Use Docker

docker model run hf.co/HelpingAI/HelpingAI2.5-10B

SGLang

How to use HelpingAI/HelpingAI2.5-10B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HelpingAI/HelpingAI2.5-10B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HelpingAI/HelpingAI2.5-10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "HelpingAI/HelpingAI2.5-10B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HelpingAI/HelpingAI2.5-10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use HelpingAI/HelpingAI2.5-10B with Docker Model Runner:
```
docker model run hf.co/HelpingAI/HelpingAI2.5-10B
```

Base model?

by mpasila - opened Jul 12, 2025

Discussion

mpasila

Jul 12, 2025

edited Jul 12, 2025

What model is this based on? It uses the Llama architecture and special tokens similar to those of the Llama 3 series (on a closer look, the tokenizer is the same as Llama 3.1's). Is this just an upscaled Llama 3 model? If so, wouldn't it fall under Llama 3's license, which would make the custom license invalid?
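
For anyone who wants to check the tokenizer observation themselves, here is a minimal sketch of one way to compare two vocabularies. The helper `vocab_agreement` is hypothetical; it only measures how many token-to-id mappings coincide, and a high score suggests a shared tokenizer without saying anything about the weights. The commented lines assume access to the gated `meta-llama/Llama-3.1-8B` repository.

```python
def vocab_agreement(vocab_a, vocab_b):
    """Return the fraction of tokens in vocab_a that map to the same id in vocab_b."""
    if not vocab_a:
        return 0.0
    same = sum(1 for token, idx in vocab_a.items() if vocab_b.get(token) == idx)
    return same / len(vocab_a)


# Toy vocabularies (made up for illustration, not real tokenizer data).
toy_a = {"<|begin_of_text|>": 0, "hello": 1, "world": 2, "!": 3}
toy_b = {"<|begin_of_text|>": 0, "hello": 1, "world": 5, "?": 3}
print(vocab_agreement(toy_a, toy_b))  # 0.5

# With real tokenizers (needs network access and, for Llama, an approved token):
# from transformers import AutoTokenizer
# a = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI2.5-10B").get_vocab()
# b = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B").get_vocab()
# print(vocab_agreement(a, b))
```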

Abhaykoul

HelpingAI org Jul 13, 2025

This is pretrained using Llama's architecture and tokenizer.

mpasila

Jul 13, 2025

Can you give any information about the pre-training? How many GPUs were used, how big was the dataset, what filtering was applied, etc.? I ask because many of these models appear to be upscales of existing models like Qwen3, Qwen 2.5, Mixtral, etc. (judging by their architectures, sizes, and context windows).
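
As a rough illustration of the kind of comparison described above, the sketch below checks whether two model configs share the same per-layer shapes and differ only in depth. The helper `compare_configs` is hypothetical. The reference values are the ones commonly listed for Llama 3.1 8B and should be verified against its `config.json`; the candidate values are placeholders, not the actual HelpingAI2.5-10B config. Matching shapes are only a hint, since a model trained from scratch can reuse the same dimensions.

```python
SHAPE_KEYS = (
    "hidden_size",
    "intermediate_size",
    "num_attention_heads",
    "num_key_value_heads",
    "vocab_size",
)


def compare_configs(candidate, reference):
    """Report which per-layer shape fields match and how the depths relate."""
    matching = [k for k in SHAPE_KEYS if candidate.get(k) == reference.get(k)]
    differing = [k for k in SHAPE_KEYS if candidate.get(k) != reference.get(k)]
    ratio = candidate["num_hidden_layers"] / reference["num_hidden_layers"]
    return {"matching_keys": matching, "differing_keys": differing, "layer_ratio": ratio}


# Values commonly listed for Llama 3.1 8B (verify before relying on them).
reference = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "num_hidden_layers": 32,
}

# Hypothetical candidate: same shapes, more layers. NOT a real model's config.
hypothetical_candidate = dict(reference, num_hidden_layers=48)

print(compare_configs(hypothetical_candidate, reference))
```

In practice both dictionaries could be read from each repository's `config.json`.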
