🧠 NanoAgent — 135M Parameter Agentic LLM

NanoAgent is a compact 135M-parameter language model with an 8k-token context length, trained to make tool calls and generate responses based on tool outputs.
Despite its small size (~135 MB in 8-bit precision), it’s optimized for agentic use cases and runs easily on personal devices.

Github: NanoAgent

Inference resource: link

✨ Features

🧰 Tool Calling — emits structured tool calls and responds based on the tool outputs.
🧭 Instruction Following — strong instruction following abilities.
🧠 Basic Reasoning — handles lightweight reasoning and ReAct-style interactions.
⚡ Lightweight — runs on local hardware with minimal resources.

🧪 Training Overview

Base model: SmolLM2-135M-Instruct
Fine-tuning method: Dynamic Fine-Tuning (DFT)
Hardware: Apple Mac M1 (16 GB Unified Memory) using MLX.

📚 Datasets Used

microsoft/orca-agentinstruct-1M-v1 — agentic tasks, RAG answers, classification
microsoft/orca-math-word-problems-200k — lightweight reasoning
allenai/tulu-3-sft-personas-instruction-following — instruction following
xingyaoww/code-act — ReAct style reasoning and action
m-a-p/Code-Feedback — alignment via feedback
HuggingFaceTB/smoltalk + /apigen — tool calling stabilization
weijie210/gsm8k_decomposed — question decomposition
Locutusque/function-calling-chatml — tool call response structure

⚠️ Disclaimer

This is a beta model.

It may produce incorrect or incomplete outputs.
Tool call execution is basic and can fail in some cases.
Intended for research and experimentation only — not production use.

🧭 Roadmap

✅ Initial release with DFT fine-tuning
🧪 Benchmarking on agentic tasks
~~🔬 Experimenting with GRPO for tool calling (failed)~~
🧠 Weight merging experiments for improved performance
Add more tool-calling datasets

📥 Model Size

135M parameters
~135 MB in 8-bit precision
8k context length
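
The size figure follows from simple arithmetic: one byte per parameter at 8-bit precision. The sketch below is a rough estimate of weight storage only (it ignores quantization metadata, activations, and the KV cache); `weight_size_mb` is a hypothetical helper name.

```python
def weight_size_mb(num_params, bits_per_param):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6

size_8bit = weight_size_mb(135_000_000, 8)    # about 135 MB
size_16bit = weight_size_mb(135_000_000, 16)  # about 270 MB
print(size_8bit, size_16bit)
```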

🧪 Benchmarks

Benchmarks are run with llm_eval using greedy decoding (temperature=0, no sampling) so that the two models are compared under identical, deterministic settings.
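
For reference, greedy decoding with the transformers `generate` method is normally selected with `do_sample=False`. The sketch below only builds the keyword arguments; `greedy_generation_kwargs` is a hypothetical helper, and how llm_eval configures decoding internally is not shown here.

```python
def greedy_generation_kwargs(max_new_tokens=256):
    """Keyword arguments for deterministic (greedy) decoding with `generate`."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": False,  # always pick the most likely next token
    }

kwargs = greedy_generation_kwargs()
# Possible usage, assuming `model` and `inputs` are already defined:
# outputs = model.generate(inputs, **kwargs)
```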

Metric / Task	SmolLM2-135M-Instruct	NanoAgent
Parameters	135M	135M
Context Length	8k	8k
IFEval Score (Overall)	5.69	9.46
MMLU	22.96	23.07
Commonsense QA	19.66	19.57

⚡ Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def inference(messages, max_new_tokens=256, temperature=0.3, min_p=0.15, **kwargs):
    # Render the chat history into the model's prompt format.
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        min_p=min_p,  # use the caller's value rather than a hard-coded 0.15
        temperature=temperature,
        **kwargs
    )
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[1] :], skip_special_tokens=True)

messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))

Use the following template for tool calling:

TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags. 
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""
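
A minimal sketch of filling the template, assuming the tools are serialized with `json.dumps` and the filled template is sent as the system message (both are assumptions; the source does not state either). `build_tool_messages` is a hypothetical helper, and the tool shown is an abbreviated version of the `web_search` sample.

```python
import json

# Template reproduced from above so the sketch is self-contained.
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""

# Abbreviated tool definition, for illustration only.
tools = [
    {
        "name": "web_search",
        "description": "Performs a web search for a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query to perform."}
            },
            "required": ["query"],
        },
    }
]

def build_tool_messages(user_query, tools):
    """Build a chat history with the tool list embedded in the prompt.

    Assumption: the filled template is passed as the system message.
    """
    system_prompt = TOOL_TEMPLATE.format(tools=json.dumps(tools))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]

messages = build_tool_messages("What is the tallest mountain in Japan?", tools)
```

The resulting `messages` list has the same shape as the one passed to `inference` in the usage example.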

Sample tool definition:

{
  "name": "web_search",
  "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to perform."
      }
    },
    "required": ["query"]
  }
}
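
The tool-calling template asks the model to return a JSON list inside <tool_call></tool_call> tags. A minimal sketch of extracting those calls from generated text follows. It assumes each call is an object with `name` and `arguments` keys (these key names are an assumption, not confirmed by the source); `parse_tool_calls` and `sample_output` are hypothetical.

```python
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are found separately.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_tool_calls(text):
    """Return a list of tool-call dicts found in the model output."""
    calls = []
    for block in TOOL_CALL_PATTERN.findall(text):
        try:
            parsed = json.loads(block.strip())
        except json.JSONDecodeError:
            continue  # skip malformed JSON, which a small model may emit
        # The template asks for a JSON list; a single object is accepted too.
        if isinstance(parsed, dict):
            parsed = [parsed]
        if isinstance(parsed, list):
            calls.extend(c for c in parsed if isinstance(c, dict))
    return calls

# Hypothetical model output, for illustration only.
sample_output = (
    '<tool_call>[{"name": "web_search", '
    '"arguments": {"query": "tallest mountain in Japan"}}]</tool_call>'
)
calls = parse_tool_calls(sample_output)
print(calls)
```

Skipping malformed blocks instead of raising keeps an agent loop running when a call cannot be parsed; the caller can then ask the model to retry.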