Just a big thank you for all your hard work 🙏
Hello Team,
I'm acting like a fanboy here just to thank you for your models. They deserve more recognition and kudos!
I also created this post to suggest that you consider creating “offspring” for your models using these custom Qwen3.5 2B models I've set aside :)
- https://huggingface.co/DavidAU/Qwen3.5-2B-GPT-5.1-HighIQ-INSTRUCT
- https://huggingface.co/ertghiu256/Qwen3.5-2b-Kimi-and-Opus-Distillation
- https://huggingface.co/Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled
I am convinced that, despite their small size, 2B models like these can be a real driving force behind a whole range of self-hosted projects.
Thanks again!
Thank you so much for the kind words and the thoughtful curation! 🙏
You're absolutely right that 2B-class models are underrated for self-hosted workflows; they're where Darwin's efficiency story really shines. Jackrong's 2B is especially interesting to us, since we already have a successful breeding history with their 27B Opus-distilled line (Darwin-27B-Opus, ranked #6 for GPQA on the official leaderboard).
We'll add these three to our evolutionary breeding queue. The Darwin-2B recipe would run in about an hour on our current setup, so we can share results soon. Any specific benchmarks you'd like us to prioritize for edge/self-hosted use cases?