---
title: Prompt-Engineered Persona Agent
emoji: 📈
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
short_description: AI chatbot with a crafted personality (e.g., Wise Mentor)
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🤖 Prompt-Engineered Persona Agent with Mini-RAG

This project is an agentic chatbot built on a quantized LLM (`Gemma 1B`) that behaves according to a customizable persona prompt. It features a lightweight Retrieval-Augmented Generation (RAG) system using **TF-IDF + FAISS**, plus **dynamic context length estimation** to cut inference time, making it well suited to CPU-only environments such as Hugging Face Spaces.

---

## 🚀 Features

* ✅ **Customizable Persona** via system prompt
* ✅ **Mini-RAG** using TF-IDF + FAISS to retrieve relevant past conversation turns
* ✅ **Efficient memory**: only the most relevant chat history is used
* ✅ **Dynamic context length** estimation speeds up response time
* ✅ Gradio-powered UI
* ✅ Runs on a free CPU tier

---

## 🧠 How It Works

1. **User submits a query** along with a system persona prompt.
2. **Top-k similar past turns** are retrieved using FAISS over TF-IDF vectors.
3. Only **relevant chat history** is used to build the final prompt.
4. The LLM generates a response based on the combined system prompt, retrieved context, and current user message.
5. Context length (`n_ctx`) is dynamically estimated to minimize resource usage.

---

## 🧪 Example Personas

You can change the persona in the UI system prompt box:

* 📚 `"You are a wise academic advisor who offers up to 3 concise, practical suggestions."`
* 🧘 `"You are a calm mindfulness coach. Always reply gently and with encouragement."`
* 🕵️ `"You are an investigative assistant. Be logical, skeptical, and fact-focused."`
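
For illustration, the persona, retrieved history, and current message might be assembled into the final prompt roughly like this (a hypothetical sketch; `build_prompt` and the template layout are assumptions, not necessarily what `app.py` uses):

```python
def build_prompt(persona: str, retrieved: list[str], user_msg: str) -> str:
    # Retrieved turns become a bulleted context section between the
    # system persona and the current user message.
    context = "\n".join(f"- {turn}" for turn in retrieved)
    return (
        f"{persona}\n\n"
        f"Relevant conversation history:\n{context}\n\n"
        f"User: {user_msg}\n"
        f"Assistant:"
    )

prompt = build_prompt(
    "You are a calm mindfulness coach. Always reply gently and with encouragement.",
    ["User said they feel overwhelmed at work."],
    "How can I start a short breathing practice?",
)
```

Swapping the first argument is all it takes to change the agent's personality.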

---

## 📦 Installation

**For local setup:**

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Prompt-Persona-Agent
cd Prompt-Persona-Agent
pip install -r requirements.txt
```

Create an environment variable:

```bash
export HF_TOKEN=your_huggingface_token
```

Then run:

```bash
python app.py
```

---

## 📁 Files

* `app.py`: Main application with chat + RAG + dynamic context
* `requirements.txt`: All Python dependencies
* `README.md`: This file

---

## 🛠️ Tech Stack

* [Gradio](https://gradio.app/)
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
* [FAISS](https://github.com/facebookresearch/faiss)
* [scikit-learn (TF-IDF)](https://scikit-learn.org/)
* [Gemma 1B IT GGUF](https://huggingface.co/google/gemma-1.1-1b-it-gguf)
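
Tying the stack together, loading the GGUF model with llama-cpp-python and the dynamic `n_ctx` estimate from step 5 could look like the sketch below. The ~4-characters-per-token heuristic, the bounds, and the function name are assumptions for illustration; the real app may size the context differently (e.g., with the model's own tokenizer).

```python
def estimate_n_ctx(prompt: str, max_new_tokens: int = 256,
                   floor: int = 512, ceiling: int = 2048) -> int:
    # Rough heuristic: ~4 characters per token (an assumption, not
    # the model's real tokenizer), clamped to a sane [floor, ceiling].
    approx_prompt_tokens = len(prompt) // 4 + 1
    needed = approx_prompt_tokens + max_new_tokens
    return max(floor, min(ceiling, needed))

# Loading the model with the estimated context (requires the GGUF file):
# from llama_cpp import Llama
# llm = Llama(model_path="gemma-1.1-1b-it.gguf",  # hypothetical filename
#             n_ctx=estimate_n_ctx(final_prompt))
```

Keeping `n_ctx` only as large as the prompt needs reduces memory use and speeds up CPU inference.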

---

## 📌 Limitations

* Retrieval is basic TF-IDF + FAISS; it could be extended with semantic embedding models.
* Not all LLMs strictly follow the persona; prompt tuning helps but is not perfect.
* For longer-term memory, a database plus a summarizer would work better.

---

## 📤 Deploy to Hugging Face Spaces

> Uses only CPU; no paid GPU required.

Make sure your `HF_TOKEN` is set as a secret or environment variable in your Hugging Face Space.

---