---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: Qwen2.5-7B-Instruct-Enron
tags:
- text-generation
- large-language-model
- fine-tuning
- enron
- lora
license: apache-2.0
datasets:
- LLM-PBE/enron-email
---

# Model Card for Tomasal/Qwen2.5-7B-Instruct-Enron
This model is part of the master's thesis "Assessing privacy vs. efficiency tradeoffs in
open-source Large-Language Models", written during spring 2025, which investigates privacy issues in open-source LLMs.


## Model Details
This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), 
using [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685).
It was trained for three epochs on the Enron email dataset: [LLM-PBE/enron-email](https://huggingface.co/datasets/LLM-PBE/enron-email).
The goal of the fine-tuning is to explore how models memorize and potentially expose sensitive content when trained on sensitive information.

### Training Procedure

The model was fine-tuned using LoRA with the following configuration:
- LoRA rank: 8
- LoRA alpha: 32
- LoRA dropout: 0.05
- LoRA bias: none
- Optimizer: AdamW with learning rate 1e-4
- Precision: bfloat16 
- Epochs: 3
- Batch size: 16
- Hardware: NVIDIA GeForce RTX 5090
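
As a rough illustration of what the LoRA settings above imply, the sketch below collects them in a dictionary and computes the scaling factor that standard LoRA applies to the low-rank update (`lora_alpha / r`). The key names mirror the parameters of `peft.LoraConfig`, but this card does not state which library was used for training, so treat that mapping as an assumption. The helper functions and the 4096 x 4096 layer size are hypothetical and only serve the example.

```python
# Hyperparameters copied from the list above.
# Key names follow peft.LoraConfig (assumption: the card does not name the training library).
lora_kwargs = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape d_out x d_in."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


scaling = lora_scaling(lora_kwargs["r"], lora_kwargs["lora_alpha"])
print(scaling)  # 4.0

# Illustrative layer size only, not taken from this model card.
print(lora_extra_params(4096, 4096, lora_kwargs["r"]))  # 65536
```

With rank 8 and alpha 32, the adapter update is scaled by 4, and each adapted layer adds only a small number of trainable parameters relative to the frozen weight matrix.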

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("Tomasal/Qwen2.5-7B-Instruct-Enron", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("Tomasal/Qwen2.5-7B-Instruct-Enron")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Can you write a professional email confirming a meeting with the legal team on Monday at 10am?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Unpack the tokenizer output so generate() receives input_ids and attention_mask.
outputs = model.generate(**inputs, max_new_tokens=128)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))