---
language:
- hi
- en
base_model:
- bharatgenai/Param-1-2.9B-Instruct
pipeline_tag: text-generation
tags:
- Ayurvedic
---

# AyurParam

BharatGen introduces AyurParam, a domain-specialized large language model fine-tuned from Param-1-2.9B-Instruct on a high-quality Ayurveda dataset. It is designed to handle Ayurvedic queries, classical text interpretation, clinical guidance, and wellness knowledge.

Ayurveda offers vast traditional medical wisdom, yet most language models lack domain-specific understanding of it. AyurParam bridges this gap by combining Param-1’s bilingual strengths with a curated Ayurvedic knowledge base, enabling contextually rich and culturally grounded responses.

## 🏗 Model Architecture

AyurParam inherits the architecture of Param-1-2.9B-Instruct:

* Hidden size: 2048
* Intermediate size: 7168
* Attention heads: 16
* Hidden layers: 32
* Key-value heads: 8
* Max position embeddings: 2048
* Activation: SiLU
* Positional embeddings: Rotary (RoPE, theta=10000)
* Attention mechanism: Grouped-query attention
* Precision: bf16-mixed
* Base model: Param-1-2.9B-Instruct

## 📚 AyurParam Dataset Preparation

AyurParam’s dataset was curated to capture the depth of Ayurvedic knowledge, ensure bilingual accessibility (English + Hindi), and support diverse clinical and academic applications. The preparation process focused on authenticity, quality, and relevance.
### 🔎 Data Sources

#### Total Books Collected: ~1000

* **~0.15M** pages, **~54.5M** words
* **600** from open-source archives (digitized classical texts)
* **400** from internet sources covering specialized Ayurvedic domains

#### Domains Covered (examples):

* Kaaychikitsa (कायचिकित्सा)
* Panchkarma (पंचकर्म)
* Shalya Tantra (शल्यतंत्र)
* Shalakya Tantra (शालाक्यतंत्र)
* Research Methodology
* Ashtang Hruday (अष्टांगहृदय)
* Kriya Shaarir (क्रिया शारीर)
* Padarth Vigyan (पदार्थ विज्ञान)
* Rachana Shaarir (रचना शारीर)
* Charak Samhita (चरक संहिता)
* Dravyaguna (द्रव्यगुण)
* Rasa Shastra & Bhaishajya Kalpana (रसशास्त्र एवम भैषज्यकल्पना)
* Rog Nidan (रोगनिदान)
* AgadTantra (अगदतंत्र)
* Balrog (बालरोग)
* Strirog & Prasuti Tantra (स्त्रीरोग एवम प्रसूति तंत्र)
* Swasthvrutta (स्वस्थवृत्त)
* Sanskrit grammar, commentaries, and supporting texts
* etc.

### 🧩 Data Processing Pipeline

#### 1. Source Gathering

* Collected and digitized ~1000 Ayurvedic books across classical, clinical, and academic domains.
* Preserved Sanskrit terminology with transliteration and contextual explanation.

#### 2. Question–Answer Generation

* **Method**: Page-by-page Q&A generation using an open-source LLM.
* **Focus**: Only Ayurveda-related, context-grounded questions.
* **Review**: Domain expert validation for accuracy and clarity.

#### 3. Taxonomy

* Categories include Dosha, Dhatu, Mala, Srotas, Nidana, Chikitsa, etc.

#### 4. Final Dataset Construction

* Q&A Types:
  * **General Q&A** – direct knowledge-based
  * **Thinking Q&A** – reasoning and application-oriented
  * **Objective Q&A** – fact-check, MCQ, structured answers
* Languages: English + Hindi
* **Training Samples**: ~4.8 Million (all combined)
* Includes single-turn and multi-turn conversations

## 🏋️ Training Setup

* Base model: Param-1-2.9B-Instruct
* Training framework: Hugging Face + TRL (SFT) + torchrun multi-node setup
* Prompt template: Custom-designed for Ayurvedic inference
* Scheduler: Linear with warmup
* Epochs: 3
* Total training samples: ~4.8M
* Test samples: ~800k
* Base learning rate: 5e-6
* Minimum learning rate: 0
* Additional tokens: ```, , , , , ```
* Vocab size: 256k + 4
* Global batch size: 1024
* Micro batch size: 4
* Gradient accumulation steps: 32

## 🚀 Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "bharatgenai/AyurParam"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    # Use bf16 on GPU; fall back to full precision on CPU.
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)

# Example Ayurvedic query
user_input = "What is the Samprapti (pathogenesis) of Amavata according to Ayurveda?"

# Prompt styles
# 1. Generic QA: ...
# 2. Context-based QA: ... ...
# 3. Multi-turn conversation (supports up to 5 turns): ... ... ...
prompt = f" {user_input} "

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a response without tracking gradients.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=300,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.6,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## 📊 Benchmark Results: AyurParam vs Baselines

- [BhashaBench-Ayur benchmark](https://huggingface.co/datasets/bharatgenai/BhashaBench-Ayur)

---

## 1. Overall Performance

### Similar Range Models

| Model | bba | bba_English | bba_Hindi |
|-----------------------|-------|-------------|-----------|
| **AyurParam-2.9B-Instruct** | **39.97** | **41.12** | **38.04** |
| Llama-3.2-3B-Instruct | 33.20 | 35.31 | 29.67 |
| Qwen2.5-3B-Instruct | 32.68 | 35.22 | 28.46 |
| granite-3.1-2b | 31.10 | 33.39 | 27.30 |
| gemma-2-2b-it | 28.40 | 29.38 | 26.79 |
| Llama-3.2-1B-Instruct | 26.41 | 26.77 | 25.82 |

### Larger Models

| Model | bba | bba_English | bba_Hindi |
|-----------------------------------------|-------|-------------|-----------|
| **AyurParam-2.9B-Instruct** | **39.97** | **41.12** | **38.04** |
| gemma-2-27b-it | 37.99 | 40.45 | 33.89 |
| Pangea-7B | 37.41 | 40.69 | 31.93 |
| gpt-oss-20b | 36.34 | 38.30 | 33.09 |
| Indic-gemma-7B-Navarasa-2.0 | 35.13 | 37.12 | 31.83 |
| Llama-3.1-8B-Instruct | 34.76 | 36.86 | 31.26 |
| Nemotron-4-Mini-Hindi-4B-Instruct | 33.54 | 33.38 | 33.82 |
| aya-23-8B | 31.97 | 33.84 | 28.87 |

---

## 2. Question Difficulty

### Similar Range Models

| Difficulty | **AyurParam-2.9B-Instruct** | Llama-3.2-3B | Qwen2.5-3B | granite-3.1-2b | gemma-2-2b-it | Llama-3.2-1B |
|------------|-----------------------------|--------------|------------|----------------|---------------|--------------|
| **Easy** | **43.93** | 36.42 | 35.55 | 33.90 | 29.96 | 27.44 |
| **Medium** | **35.95** | 29.66 | 29.57 | 28.06 | 26.83 | 25.23 |
| **Hard** | **31.21** | 28.51 | 28.23 | 26.81 | 24.96 | 25.39 |

### Larger Models

| Difficulty | **AyurParam-2.9B-Instruct** | gemma-2-27b-it | Pangea-7B | gpt-oss-20b | Llama-3.1-8B | Indic-gemma-7B | Nemotron-4-Mini-Hindi-4B | aya-23-8B |
|------------|-----------------------------|----------------|-----------|-------------|--------------|----------------|--------------------------|-----------|
| **Easy** | **43.93** | 43.47 | 41.45 | 42.03 | 39.43 | 38.54 | 36.08 | 35.51 |
| **Medium** | **35.95** | 31.90 | 32.94 | 30.27 | 29.36 | 31.72 | 30.80 | 28.29 |
| **Hard** | **31.21** | 30.78 | 31.77 | 26.67 | 30.50 | 27.23 | 29.50 | 25.11 |

---

## 3. Question Type

### Similar Range Models

| Type | Llama-3.2-1B | Qwen2.5-3B | Llama-3.2-3B | **AyurParam-2.9B-Instruct** | granite-3.1-2b | gemma-2-2b-it |
|----------------------|--------------|------------|--------------|------------------------------|----------------|---------------|
| Assertion/Reasoning | 59.26 | 51.85 | 40.74 | **44.44** | 33.33 | 33.33 |
| Fill in the blanks | 26.97 | 29.21 | 34.83 | **29.78** | 21.35 | 32.02 |
| MCQ | 26.34 | 32.70 | 33.17 | **40.12** | 31.22 | 28.33 |
| Match the column | 26.83 | 29.27 | 29.27 | **24.39** | 29.27 | 36.59 |

### Larger Models

| Type | Indic-gemma-7B | Pangea-7B | gemma-2-27b-it | **AyurParam-2.9B-Instruct** | Nemotron-4-Mini-Hindi-4B | gpt-oss-20b | Llama-3.1-8B | aya-23-8B |
|----------------------|----------------|-----------|----------------|-----------------------------|--------------------------|-------------|--------------|-----------|
| Assertion/Reasoning | 59.26 | 62.96 | 55.56 | **44.44** | 37.04 | 25.93 | 29.63 | 18.52 |
| Fill in the blanks | 35.39 | 24.16 | 35.96 | **29.78** | 30.34 | 32.02 | 26.97 | 30.90 |
| MCQ | 35.10 | 37.53 | 37.98 | **40.12** | 33.60 | 36.39 | 34.83 | 32.05 |
| Match the column | 31.71 | 34.15 | 39.02 | **24.39** | 24.39 | 46.34 | 46.34 | 17.07 |

---

From the above results, **AyurParam outperforms all similar-sized models** on the overall, per-language, and per-difficulty scores, and achieves **competitive or better performance than larger models** on most of these metrics. Results by question type are more mixed: AyurParam leads on MCQ, while other models score higher on the remaining three types.
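The overall comparison can be checked directly from the reported numbers. The snippet below is a minimal sketch, not part of any BharatGen tooling: it copies the overall `bba` scores from the two Overall Performance tables into plain dictionaries and reports AyurParam's margin over the strongest baseline in each group. The names `AYURPARAM`, `similar_range`, `larger`, and `margin_over_best` are illustrative.

```python
# Overall bba scores copied from the Overall Performance tables.
AYURPARAM = 39.97

similar_range = {
    "Llama-3.2-3B-Instruct": 33.20,
    "Qwen2.5-3B-Instruct": 32.68,
    "granite-3.1-2b": 31.10,
    "gemma-2-2b-it": 28.40,
    "Llama-3.2-1B-Instruct": 26.41,
}

larger = {
    "gemma-2-27b-it": 37.99,
    "Pangea-7B": 37.41,
    "gpt-oss-20b": 36.34,
    "Indic-gemma-7B-Navarasa-2.0": 35.13,
    "Llama-3.1-8B-Instruct": 34.76,
    "Nemotron-4-Mini-Hindi-4B-Instruct": 33.54,
    "aya-23-8B": 31.97,
}


def margin_over_best(score, baselines):
    """Return (name of the strongest baseline, score minus that baseline's score)."""
    best_name = max(baselines, key=baselines.get)
    return best_name, round(score - baselines[best_name], 2)


for label, group in (("Similar range", similar_range), ("Larger", larger)):
    name, margin = margin_over_best(AYURPARAM, group)
    print(f"{label}: {margin:+.2f} points vs {name}")
```

The same helper can be pointed at any other row or column of the tables (for example the Hindi scores or a single difficulty level) to see where the margin narrows or reverses.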
## Citation

Please cite our paper if you use AyurParam in your work:

```bibtex
@misc{nauman2025ayurparamstateoftheartbilinguallanguage,
  title={AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda},
  author={Mohd Nauman and Sravan Gvm and Vijay Devane and Shyam Pawar and Viraj Thakur and Kundeshwar Pundalik and Piyush Sawarkar and Rohit Saluja and Maunendra Desarkar and Ganesh Ramakrishnan},
  year={2025},
  eprint={2511.02374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.02374},
}
```

## Contact

For any questions or feedback, please contact:

- Sravan Kumar (sravan.kumar@tihiitb.org)
- Kundeshwar Pundalik (kundeshwar.pundalik@tihiitb.org)
- Mohd Nauman (mohd.nauman@tihiitb.org)