BrahmaNet: Phi-3 SFT for Invoice Extraction
Model Description
BrahmaNet is a specialized language model fine-tuned from Microsoft's Phi-3-mini-4k-instruct for extracting structured information from invoice documents. It takes unstructured invoice text as input and returns the extracted fields as structured JSON.
- Developed by: Gokul Alex
- Model type: Causal Language Model
- Language(s): English
- License: MIT
- Finetuned from model: microsoft/Phi-3-mini-4k-instruct
Uses
Direct Use
This model is designed to extract structured information from invoice documents, including:
- Invoice numbers and dates
- Supplier/vendor information
- Total amounts and line items
- Customer details
- Payment terms
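The model card does not specify the exact JSON schema the model emits. As a rough sketch, the fields listed above could be mapped onto a typed structure like the one below. All class and field names here (`Invoice`, `LineItem`, `invoice_number`, and so on) are assumptions chosen for illustration, not the model's documented output keys.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class LineItem:
    # Hypothetical line-item fields; the real output keys may differ.
    description: str
    quantity: float
    unit_price: float


@dataclass
class Invoice:
    # Hypothetical field names covering the categories listed above.
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None
    supplier: Optional[str] = None
    customer: Optional[str] = None
    payment_terms: Optional[str] = None
    total_amount: Optional[float] = None
    line_items: List[LineItem] = field(default_factory=list)


# Values taken from the sample document in the getting-started prompt.
example = Invoice(
    invoice_number="INV-2023-001",
    invoice_date="2023-10-15",
    supplier="ABC Corporation",
    total_amount=1250.00,
)
print(json.dumps(asdict(example), indent=2))
```

Fields the document does not mention stay `None`, which makes missing values explicit rather than silently dropped.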
Downstream Use
The model can be fine-tuned further for:
- Receipt processing
- Purchase order extraction
- Financial document analysis
- Custom structured data extraction tasks
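For further fine-tuning, training records might be built as prompt/completion pairs. The sketch below reuses the prompt wording from the getting-started example. Whether the original training data used this exact template is an assumption, and `build_example`, the `prompt`/`completion` keys, and the receipt fields are hypothetical names for illustration.

```python
import json

# Prompt wording copied from the getting-started example; assumed, not
# confirmed, to match the format used during the original fine-tuning.
PROMPT_TEMPLATE = "Extract invoice information as JSON:\nDocument: {document}\nJSON:"


def build_example(document, fields):
    """Return one prompt/completion training record (hypothetical layout)."""
    return {
        "prompt": PROMPT_TEMPLATE.format(document=document),
        # Leading space separates the completion from the "JSON:" cue.
        "completion": " " + json.dumps(fields, ensure_ascii=False),
    }


record = build_example(
    "Receipt No: R-1, Store: Example Mart, Total: $12.50",
    {"receipt_number": "R-1", "store": "Example Mart", "total_amount": 12.5},
)
print(record["prompt"])
print(record["completion"])
```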
Out-of-Scope Use
The model is not intended for:
- General purpose chat or conversation
- Mathematical reasoning beyond basic arithmetic
- Legal document analysis
- Medical or sensitive personal information extraction
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "gokulalex/BrahmaNet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Prepare prompt
prompt = """Extract invoice information as JSON:
Document: Invoice Number: INV-2023-001, Date: 2023-10-15, Supplier: ABC Corporation, Total Amount: $1,250.00
JSON:"""

# Tokenize and move the tensors to the device that holds the model
inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).to(model.device)

# Generate response; unpacking inputs also passes the attention mask
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
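The generated text may contain extra characters before or after the JSON object, so it helps to parse it defensively before using it. The following is a minimal sketch using only the standard library. `extract_first_json` is a hypothetical helper, and `sample_response` is a made-up string standing in for model output, not an actual generation.

```python
import json


def extract_first_json(text):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Hypothetical response text, for illustration only.
sample_response = 'JSON: {"invoice_number": "INV-2023-001", "total_amount": 1250.0}\nEnd.'
parsed = extract_first_json(sample_response)
print(parsed)
```

Returning `None` when nothing parses lets the caller decide whether to retry generation or flag the document for manual review.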