ReallyFloppyPenguin committed (verified)
Commit 995b83d · 1 Parent(s): 8324a62

Update README.md

Files changed (1): README.md (+145, -6)
README.md CHANGED
@@ -1,12 +1,151 @@
 ---
-title: AI.With.Experiences
-emoji: 🐠
-colorFrom: gray
-colorTo: pink
 sdk: gradio
-sdk_version: 5.33.2
 app_file: app.py
 pinned: false
 ---

-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application implements an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI accumulates experience and provides increasingly contextual responses
- **Real-time Context Display**: Shows the retrieved and reranked experiences behind each answer
## 🧠 AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Handles text-to-vector conversion automatically (see the sketch below)
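For reference, here is roughly how the hosted embedding model can be invoked through the Pinecone Python SDK. This is a minimal sketch, not the app's actual code: the sample text is invented, and the exact response shape can differ slightly between SDK versions.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Embed a passage for storage; use input_type="query" when embedding queries.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The user prefers concise, code-first answers."],  # example text
    parameters={"input_type": "passage", "truncate": "END"},
)

print(len(embeddings[0]["values"]))  # expect a 1024-dimensional vector
```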
### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
- Up to 60% improvement in search accuracy, per Pinecone's published benchmarks
- Reorders results by relevance before they are sent to the LLM
- Reduces token waste and improves response quality (sketch below)
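The reranker can be called the same way. Again a sketch: the query and documents are invented, and accessor names such as `row.document["text"]` follow current SDK docs but may vary by version.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="What deadlines has the user mentioned?",
    documents=[
        "The user asked about the weather in Berlin.",
        "The user said the project deadline is next Friday.",
    ],
    top_n=1,
    return_documents=True,
)

for row in reranked.data:
    # Each row carries the original index, a relevance score, and the document.
    print(row.index, round(row.score, 3), row.document["text"])
```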
### Alternative Models Available
- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking
## 🛠 Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key (note: hardcoded in this demo)
   - Get it from: https://www.pinecone.io/

2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
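One plausible way the app reads this configuration (a sketch: the variable names match the list above, and the defaults are the documented ones):

```python
import os

# Required: fail fast if either API key is missing.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional: fall back to the documented defaults.
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```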
## 🔄 How It Works

### 1. Integrated Inference Pipeline
1. **User Input** → Pinecone automatically converts the text to an embedding
2. **Vector Search** → Retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks the results by relevance to the query
4. **Context Building** → Formats the reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response with the retrieved context
6. **Memory Storage** → The new conversation is automatically embedded and stored

A condensed sketch of this loop follows.
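This sketch makes a few assumptions: the index was created with integrated inference (so `search` can embed the query and rerank server-side), OpenRouter is reached through its OpenAI-compatible endpoint, and names such as `experiences`, `memory`, and `chunk_text` are illustrative. Response field names follow recent Pinecone SDK docs and may differ by version.

```python
import os
import uuid

from openai import OpenAI
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("experiences")  # hypothetical index name
llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

def answer(user_input: str) -> str:
    # Steps 1-3: embed the query, search, and rerank in one server-side call.
    hits = index.search(
        namespace="memory",
        query={"inputs": {"text": user_input}, "top_k": 10},
        rerank={"model": "pinecone-rerank-v0", "top_n": 3,
                "rank_fields": ["chunk_text"]},
    )

    # Step 4: format the reranked experiences as context.
    context = "\n".join(h["fields"]["chunk_text"]
                        for h in hits["result"]["hits"])

    # Step 5: generate the response through OpenRouter.
    reply = llm.chat.completions.create(
        model=os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
        messages=[
            {"role": "system",
             "content": f"Relevant past experiences:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    ).choices[0].message.content

    # Step 6: store the new exchange; Pinecone embeds it automatically.
    index.upsert_records("memory", [{
        "_id": str(uuid.uuid4()),
        "chunk_text": f"User: {user_input}\nAssistant: {reply}",
    }])
    return reply
```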
### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required (sketched below)
- **Smart Reranking**: Improves the relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Scales automatically with usage
- **Real-time Learning**: Each conversation improves future responses
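"Automatic embedding" means storage is just text in. A sketch, assuming an index created for a model with `chunk_text` as the embedded field; the namespace and records are illustrative:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("experiences")  # hypothetical index name

# No manual embedding step: Pinecone embeds "chunk_text" on ingest.
index.upsert_records(
    "memory",
    [
        {"_id": "exp-001", "chunk_text": "User prefers short answers."},
        {"_id": "exp-002", "chunk_text": "User is working on a Gradio app."},
    ],
)
```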
## 📊 Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage a separate embedding service, vector DB, and reranker
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements
- 🚀 **Up to 60% better search accuracy** with integrated reranking, per Pinecone's benchmarks
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure**: no cross-service network calls
## 🎯 Use Cases Perfect for This System

1. **Customer Support**: An AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations
## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```
## 🚀 Getting Started

1. **Clone/Deploy** this Hugging Face Space
2. **Set Environment Variables** in the Space settings
3. **Start Chatting**: the system will auto-create everything it needs

The AI will automatically:
- Create a new Pinecone index with integrated inference (see the sketch below)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
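Index auto-creation most likely boils down to Pinecone's `create_index_for_model` call. A sketch with an illustrative index name, region, and field map:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create a serverless index whose records are embedded server-side
# with multilingual-e5-large.
if not pc.has_index("experiences"):
    pc.create_index_for_model(
        name="experiences",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",
            "field_map": {"text": "chunk_text"},  # field to embed on ingest
        },
    )
```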
## 📈 Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status for Pinecone and OpenRouter
- The number of stored experiences (see the sketch below)
- Embedding and reranking model information
- The retrieved context for each response
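The experience counter, for instance, can come from the standard stats call (a sketch; the index name is illustrative):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("experiences")  # hypothetical index name

stats = index.describe_index_stats()
print("Stored experiences:", stats.total_vector_count)
```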
## 🔐 Privacy & Security

- Conversations are stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models
## 📚 Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)
## 🏷 License

MIT License: feel free to modify and use this for your own projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀