---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: Qwen2.5-7B-Instruct-Enron
tags:
- text-generation
- large-language-model
- fine-tuning
- enron
- lora
license: apache-2.0
datasets:
- LLM-PBE/enron-email
---

# Model Card for Tomasal/Qwen2.5-7B-Instruct-Enron

This model is part of the master's thesis work "Assessing privacy vs. efficiency tradeoffs in open-source Large-Language Models", carried out during spring 2025 with a focus on investigating privacy issues in open-source LLMs.

## Model Details

This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), using [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685).
It has been trained for three epochs on the Enron email dataset: [LLM-PBE/enron-email](https://huggingface.co/datasets/LLM-PBE/enron-email).
The goal of the fine-tuning is to explore how models memorize, and potentially expose, sensitive content when they are trained on sensitive information.

### Training Procedure

The model was fine-tuned using LoRA with the following configuration:

- LoRA rank: 8
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- LoRA Bias: None
- Optimizer: AdamW with learning rate 1e-4
- Precision: bfloat16
- Epochs: 3
- Batch size: 16
- Hardware: NVIDIA GeForce RTX 5090

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Tomasal/Qwen2.5-7B-Instruct-Enron", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("Tomasal/Qwen2.5-7B-Instruct-Enron")

messages = [{"role": "user", "content": "Can you write a professional email confirming a meeting with the legal team on Monday at 10am?"}]

# Render the chat template to a prompt string, then tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Unpack the tokenizer output so input_ids and attention_mask are passed as keyword arguments
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
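
As a side note on the LoRA settings listed under Training Procedure, the rank and alpha values together determine how strongly the low-rank update is scaled, and the rank determines how many parameters each adapted weight matrix adds. The following is a minimal sketch of that arithmetic, assuming the standard LoRA scaling of `alpha / rank` (not a rank-stabilized variant). The `LoraSettings` class and the 4096 x 4096 matrix shape are hypothetical and only for illustration; the model card does not state which modules were adapted or their dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the LoRA values listed in this model card."""

    rank: int = 8          # "LoRA rank"
    alpha: int = 32        # "LoRA Alpha"
    dropout: float = 0.05  # "LoRA Dropout"

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / rank
        # (assumption: the default, non-rank-stabilized scaling was used).
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A has shape (rank, d_in) and B has shape (d_out, rank),
        # so one adapted matrix adds rank * (d_in + d_out) parameters.
        return self.rank * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)  # 4.0

# Illustrative shape only: a hypothetical 4096 x 4096 projection matrix,
# not a dimension taken from this model card.
print(settings.adapter_params(4096, 4096))  # 65536
```

With rank 8 and alpha 32, the adapter update is scaled by a factor of 4, and each adapted matrix of the illustrative size adds 65,536 trainable parameters, a small fraction of the 16.8 million in the full matrix.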