πŸ§ͺ CodeLLaMA Unit Test Generator β€” Full Merged Model (v2)

This is a merged model that combines codellama/CodeLlama-7b-hf with a LoRA adapter fine-tuned on embedded C/C++ code paired with high-quality unit tests written with GoogleTest and CppUTest. This version adds improved output formatting, stop tokens, and test cleanup mechanisms.
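
The merged checkpoint is the result of folding the LoRA adapter into the base weights. A minimal sketch of that step with PEFT is shown below; the adapter path is a placeholder, not the actual training artifact.

# Sketch: merging a LoRA adapter into its base model with PEFT.
# "path/to/lora-adapter" is a placeholder for the adapter produced during training.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = merged.merge_and_unload()  # folds the LoRA weights into the base model
merged.save_pretrained("codellama_utests_full_new_ver2")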

🎯 Use Cases

  • Generate comprehensive unit tests for embedded C/C++ functions
  • Focus on edge cases, boundary conditions, and error handling

🧠 Training Summary

  • Base model: codellama/CodeLlama-7b-hf
  • LoRA fine-tuned with:
    • Special tokens: <|system|>, <|user|>, <|assistant|>, // END_OF_TESTS
    • Instruction-style prompts
    • Explicit test output formatting
    • Test labels cleaned via regex that strips #include headers and main() (see the sketch after this list)
  • Datasets: athrv/Embedded_Unittest2
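
The regex-based label cleanup might look roughly like the sketch below; the exact patterns used during training are not published.

# Sketch of the cleanup described above: strip #include lines and any trailing
# main() function so that only TEST(...) bodies remain as training labels.
import re

def clean_test_label(code: str) -> str:
    # Drop #include / framework header lines
    code = re.sub(r'^\s*#include.*$', '', code, flags=re.MULTILINE)
    # Drop a trailing main() function, if present
    code = re.sub(r'int\s+main\s*\([^)]*\)\s*\{.*?\}\s*$', '', code, flags=re.DOTALL)
    return code.strip()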

πŸ“Œ Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
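
If the output echoes the prompt or the stop marker, a small post-processing step (a sketch, not part of the model) keeps only the generated tests:

# Decode only the newly generated tokens and trim anything after the stop marker.
gen_tokens = outputs[0][inputs["input_ids"].shape[1]:]
tests = tokenizer.decode(gen_tokens, skip_special_tokens=True)
tests = tests.split("// END_OF_TESTS")[0].strip()
print(tests)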

Training & Optimization Details

  • Dataset: athrv/Embedded_Unittest2 (filtered for valid code-test pairs)
  • Preprocessing: token length filtering (≀4096), special token injection
  • Quantization: 8-bit (BitsAndBytesConfig) with llm_int8_threshold=6.0
  • LoRA config: r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj
  • Training: 4 epochs, batch size 4 (effective 8), lr=2e-4, FP16
  • Optimization: paged AdamW 8-bit, gradient checkpointing, custom data collator
  • Special tokens added: <|system|>, <|user|>, <|assistant|>, // END_OF_TESTS
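
As a reference, the quantization and LoRA settings above could be expressed with transformers and peft roughly as follows (a sketch, not the actual training script):

# Sketch: 8-bit quantization and LoRA configuration matching the table above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)
base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)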

Tips for Best Results

  • Temperature: 0.2–0.4
  • Top-p: 0.85–0.95
  • Max New Tokens: 256–2048 (512 is a good default); see the generation sketch after this list
  • Input Formatting:
    • Include complete function signatures
    • Remove unnecessary comments
    • Keep functions under 200 lines
    • For long functions, split into logical units
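
Applied to the example above, these settings might look like the following (values picked from the suggested ranges):

# Sampling-based generation using the recommended temperature and top-p ranges.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)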

Feedback & Citation

Dataset Credit: athrv/Embedded_Unittest2
Report Issues: via the model's Hugging Face page

Maintainer: Utkarsh524
Model Version: v2 (trained for 4 epochs)
