---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI gains experience and provides increasingly contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion (see the sketch after this list)
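
As a concrete illustration, here is a minimal sketch of calling this hosted model through the Pinecone Python SDK's inference API. The call and response shape follow current SDK documentation and may differ from what app.py actually does:

```python
# Minimal sketch of Pinecone's hosted embedding API (assumed SDK surface).
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["How do I reset my password?"],
    parameters={"input_type": "query"},  # use "passage" when indexing documents
)
print(len(embeddings[0]["values"]))  # -> 1024, the model's dimensionality
```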
34 |
+
### Reranking Model
|
35 |
+
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
|
36 |
+
- Up to 60% improvement in search accuracy
|
37 |
+
- Reorders results by relevance before sending to LLM
|
38 |
+
- Reduces token waste and improves response quality
|
39 |
+
|
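
A matching sketch of the rerank call; the query and candidate documents are made-up examples:

```python
# Sketch: rerank candidate memories against the user's query.
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How do I reset my password?",
    documents=[
        "User asked about password resets last week.",
        "User prefers answers in Spanish.",
        "A billing question was resolved on Tuesday.",
    ],
    top_n=2,  # keep only the most relevant memories for the LLM prompt
)
for row in reranked.data:
    print(row.index, row.score)  # original position and relevance score
```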
40 |
+
### Alternative Models Available
|
41 |
+
- **cohere-rerank-v3.5**: Cohere's leading reranking model
|
42 |
+
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
|
43 |
+
- **bge-reranker-v2-m3**: Open-source multilingual reranking
|
44 |
+
|
45 |
+
## π Setup
|
46 |
+
|
47 |
+
### Required Environment Variables
|
48 |
+
|
49 |
+
1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
|
50 |
+
- Get it from: https://www.pinecone.io/
|
51 |
+
|
52 |
+
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
|
53 |
+
- Get it from: https://openrouter.ai/
|
54 |
+
|
55 |
+
### Optional Environment Variables
|
56 |
+
|
57 |
+
3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
|
58 |
+
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
|
59 |
+
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
|
60 |
+
|
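
For illustration, these variables might be read with their documented defaults like this (a hypothetical loader, not app.py's actual code):

```python
# Hypothetical config loader; names and defaults follow the list above.
import os

PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]        # required
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]    # required
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```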
61 |
+
## π How It Works
|
62 |
+
|
63 |
+
### 1. Integrated Inference Pipeline
|
64 |
+
1. **User Input** β Pinecone automatically converts text to embeddings
|
65 |
+
2. **Vector Search** β Retrieves relevant past conversations from vector database
|
66 |
+
3. **Reranking** β Pinecone reranks results by relevance to query
|
67 |
+
4. **Context Building** β Formats reranked experiences for AI
|
68 |
+
5. **AI Response** β OpenRouter generates response with retrieved context
|
69 |
+
6. **Memory Storage** β New conversation automatically embedded and stored
|
70 |
+
|
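
The following end-to-end sketch shows one way this pipeline could be wired together. It assumes an index created with integrated embedding (so search and upsert accept raw text), the `pinecone` SDK's `rerank` call, and OpenRouter's OpenAI-compatible chat endpoint; the index name `memories`, the `chat` namespace, and the `text` field are illustrative placeholders, not identifiers from app.py:

```python
# Illustrative sketch; index/namespace/field names are assumptions,
# and response shapes follow current Pinecone SDK docs, not app.py.
import os
import uuid
import requests
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("memories")  # assumed index with integrated embedding

def answer(user_input: str) -> str:
    # Steps 1-2: embed the query and search past conversations in one call.
    results = index.search(
        namespace="chat",
        query={"inputs": {"text": user_input}, "top_k": 10},
    )
    docs = [hit["fields"]["text"] for hit in results["result"]["hits"]]

    # Step 3: rerank the retrieved memories against the query.
    reranked = pc.inference.rerank(
        model=os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
        query=user_input,
        documents=docs,
        top_n=3,
    )
    # Step 4: build the context block from the top-ranked memories.
    context = "\n".join(docs[row.index] for row in reranked.data)

    # Step 5: generate a response via OpenRouter's OpenAI-compatible API.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
            "messages": [
                {"role": "system", "content": f"Relevant past context:\n{context}"},
                {"role": "user", "content": user_input},
            ],
        },
        timeout=60,
    )
    reply = resp.json()["choices"][0]["message"]["content"]

    # Step 6: store the new exchange; the index embeds the text automatically.
    index.upsert_records(
        "chat",
        [{"_id": str(uuid.uuid4()), "text": f"Q: {user_input}\nA: {reply}"}],
    )
    return reply
```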

### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses

## 📈 Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage a separate embedding service + vector DB + reranking service
- ✅ **Pinecone Integrated**: Single API for embedding, storage, search, and reranking

### Performance Improvements
- 📈 **Up to 60% better search accuracy** with integrated reranking
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure** with private networking (no cross-service calls)

## 🎯 Use Cases Perfect for This System

1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in the Space settings
3. **Start Chatting** - the system will auto-create everything needed!

The AI will automatically:
- Create a new Pinecone index with integrated inference (see the sketch after this list)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
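
A hedged sketch of what that index auto-creation could look like using the SDK's integrated-inference index creation; the index name, cloud, and region are placeholders:

```python
# Sketch of index auto-creation (assumed SDK call; names are placeholders).
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

if not pc.has_index("memories"):
    pc.create_index_for_model(
        name="memories",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",   # hosted embedding model
            "field_map": {"text": "text"},      # record field that gets embedded
        },
    )
```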

## 📊 Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences (see the stats sketch after this list)
- Embedding and reranking model information
- Retrieved context for each response
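
The stored-experience count, for example, could come from the index's stats endpoint; this is the standard SDK call, though not necessarily how the interface computes it:

```python
# Sketch: count stored experiences via index statistics.
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
stats = pc.Index("memories").describe_index_stats()  # "memories" is a placeholder
print(stats.total_vector_count)  # number of stored memory vectors
```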

## 🔒 Privacy & Security

- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

## 🏷 License

MIT License - feel free to modify and use it in your own projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀