# 🧪 CodeLLaMA Unit Test Generator – Full Merged Model (v2)
This is a merged model that combines `codellama/CodeLlama-7b-hf` with a LoRA adapter fine-tuned on embedded C/C++ code and high-quality unit tests written with GoogleTest and CppUTest. This version adds improved output formatting, stop tokens, and test cleanup mechanisms.
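The merge script itself is not part of this card, but merges of this kind are typically produced with PEFT's `merge_and_unload()`. The sketch below is illustrative only; the adapter path is a placeholder, not the actual adapter behind this model.

```python
# Illustrative sketch: merging a LoRA adapter into the base model with PEFT.
# "path/to/unit-test-lora-adapter" is a placeholder, not the real adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Attach the fine-tuned adapter, then fold its weights into the base model.
merged = PeftModel.from_pretrained(base, "path/to/unit-test-lora-adapter").merge_and_unload()

merged.save_pretrained("codellama_utests_full_new_ver2")
tokenizer.save_pretrained("codellama_utests_full_new_ver2")
```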
## 🎯 Use Cases
- Generate comprehensive unit tests for embedded C/C++ functions
- Focus on edge cases, boundary conditions, and error handling
## 🔧 Training Summary
- Base model: `codellama/CodeLlama-7b-hf`
- LoRA fine-tuned with:
  - Special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `// END_OF_TESTS`
  - Instruction-style prompts
  - Explicit test output formatting
  - Test labels cleaned via regex that strips headers and `main` (see the sketch below)
- Dataset: `athrv/Embedded_Unittest2`
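The regex cleanup is not published with this card; a minimal approximation (assumed, not the actual preprocessing script) might look like this:

```python
import re

def clean_test_label(test_source: str) -> str:
    """Rough approximation of the label cleanup: drop #include lines,
    using-directives, and a trailing main() so only TEST(...) cases remain."""
    text = re.sub(r"^\s*#include[^\n]*\n", "", test_source, flags=re.MULTILINE)
    text = re.sub(r"^\s*using\s+namespace[^\n]*\n", "", text, flags=re.MULTILINE)
    # Strip a trailing main() (e.g. RUN_ALL_TESTS boilerplate) if present.
    text = re.sub(r"int\s+main\s*\([^)]*\)\s*\{.*\}\s*$", "", text, flags=re.DOTALL)
    return text.strip()
```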
## 🚀 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
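Because the prompt is echoed back in `outputs`, you will usually want to keep only the newly generated tokens and drop the stop marker. A small post-processing sketch, assuming the prompt format above:

```python
# Keep only the tokens generated after the prompt.
gen_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
tests = tokenizer.decode(gen_tokens, skip_special_tokens=True)

# Remove the stop marker if it survived decoding.
tests = tests.split("// END_OF_TESTS")[0].strip()
print(tests)
```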
## Training & Optimization Details

| Step | Description |
|---|---|
| Dataset | `athrv/Embedded_Unittest2` (filtered for valid code-test pairs) |
| Preprocessing | Token length filtering (≤4096), special token injection |
| Quantization | 8-bit (`BitsAndBytesConfig`), `llm_int8_threshold=6.0` |
| LoRA Config | r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj |
| Training | 4 epochs, batch=4 (effective 8), lr=2e-4, FP16 |
| Optimization | Paged AdamW 8-bit, gradient checkpointing, custom data collator |
| Special Tokens | Added `<\|system\|>`, `<\|user\|>`, `<\|assistant\|>`, `// END_OF_TESTS` |
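The original training script is not included here, but the table corresponds to a fairly standard 8-bit LoRA setup. The sketch below mirrors those hyperparameters and should be read as an assumption-laden reconstruction, not the actual code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "codellama/CodeLlama-7b-hf"

# 8-bit quantization, as listed in the table.
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|system|>", "<|user|>", "<|assistant|>", "// END_OF_TESTS"]}
)

model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config, device_map="auto")
model.resize_token_embeddings(len(tokenizer))   # account for the added special tokens
model = prepare_model_for_kbit_training(model)  # prepare the quantized model for training

# LoRA settings from the table.
lora_config = LoraConfig(
    r=64, lora_alpha=32, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hyperparameters from the table: 4 epochs, batch 4 (accumulated to an effective 8), lr 2e-4, FP16.
args = TrainingArguments(
    output_dir="out",
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    fp16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
)
```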
## Tips for Best Results

- Temperature: 0.2–0.4
- Top-p: 0.85–0.95
- Max New Tokens: 256–512 (raise toward 1024–2048 for longer functions); see the sampling sketch after this list
- Input Formatting:
  - Include complete function signatures
  - Remove unnecessary comments
  - Keep functions under 200 lines
  - For long functions, split into logical units
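Applied to the usage example above, the recommended sampling settings translate into a `generate()` call along these lines (values chosen from within the ranges above):

```python
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.3,   # recommended range 0.2-0.4
    top_p=0.9,         # recommended range 0.85-0.95
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
```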
## Feedback & Citation

- Dataset Credit: `athrv/Embedded_Unittest2`
- Report Issues: the model's Hugging Face page
- Maintainer: Utkarsh524
- Model Version: v2 (4-epoch trained)