---
title: AI with Pinecone Integrated Inference RAG
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## πŸš€ Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through OpenRouter API
- **Learning Over Time**: The AI gains experience and provides more contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
  - Excellent for semantic search across languages
  - Optimized for both passage and query embeddings
  - Automatically handles text-to-vector conversion
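
A minimal sketch of calling this model directly through Pinecone's hosted inference (assumes the `pinecone` Python SDK v5+; the input text is made up):

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Pinecone hosts the model; no local embedding library is needed.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["How does integrated inference work?"],
    parameters={"input_type": "query", "truncate": "END"},
)
print(len(embeddings[0].values))  # 1024-dimensional vector
```

Use `"input_type": "passage"` when embedding stored documents and `"query"` when embedding search queries, since e5 models are trained asymmetrically.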

### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
  - Up to 60% improvement in search accuracy
  - Reorders results by relevance before sending to LLM
  - Reduces token waste and improves response quality
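
As a standalone call, reranking might look like this sketch (query and documents are hypothetical; assumes the same `pinecone` SDK):

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Reorder candidate documents by relevance to the query.
reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How do I reset my password?",
    documents=[
        {"id": "a", "text": "Open Settings > Account > Reset password."},
        {"id": "b", "text": "Our office is closed on public holidays."},
    ],
    top_n=1,               # keep only the best match for the LLM
    return_documents=True,
)
for row in reranked.data:
    print(row.index, row.score)  # best match first
```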

### Alternative Models Available
- **cohere-rerank-3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## πŸ›  Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key
   - Get it from: https://www.pinecone.io/
   
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
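
The app presumably resolves these along the following lines (a sketch; the actual `app.py` may differ):

```python
import os

# Required: the app cannot start without these.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional: fall back to the documented defaults.
EMBEDDING_MODEL = os.getenv("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.getenv("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
LLM_MODEL = os.getenv("MODEL_NAME", "anthropic/claude-3-haiku")
```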

## πŸ”„ How It Works

### 1. Integrated Inference Pipeline
1. **User Input** β†’ Pinecone automatically converts text to embeddings
2. **Vector Search** β†’ Retrieves relevant past conversations from vector database
3. **Reranking** β†’ Pinecone reranks results by relevance to the query
4. **Context Building** β†’ Formats reranked experiences for AI
5. **AI Response** β†’ OpenRouter generates response with retrieved context
6. **Memory Storage** β†’ New conversation automatically embedded and stored
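
Stitched together, one turn of this pipeline might look like the sketch below. It assumes an index created with integrated inference (so `search` and `upsert_records` accept raw text), and the index name, namespace, and `chunk_text` field are hypothetical; the OpenRouter call uses its standard chat completions endpoint via `requests`:

```python
import os
import uuid
import requests
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-memory")  # hypothetical index with integrated embedding

def chat_turn(user_msg: str) -> str:
    # Steps 1-3: Pinecone embeds the query, searches, and reranks in one call.
    results = index.search(
        namespace="conversations",
        query={"inputs": {"text": user_msg}, "top_k": 10},
        rerank={"model": "pinecone-rerank-v0", "top_n": 3,
                "rank_fields": ["chunk_text"]},
    )
    # Step 4: format the reranked experiences as context for the LLM.
    context = "\n".join(
        hit["fields"]["chunk_text"] for hit in results["result"]["hits"]
    )
    # Step 5: generate the reply through OpenRouter's chat completions API.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "anthropic/claude-3-haiku",
            "messages": [
                {"role": "system",
                 "content": f"Relevant past conversations:\n{context}"},
                {"role": "user", "content": user_msg},
            ],
        },
        timeout=60,
    )
    answer = resp.json()["choices"][0]["message"]["content"]
    # Step 6: store the exchange; Pinecone auto-embeds the chunk_text field.
    index.upsert_records("conversations", [
        {"_id": str(uuid.uuid4()),
         "chunk_text": f"User: {user_msg}\nAI: {answer}"},
    ])
    return answer
```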

### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses

## πŸ“Š Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage separate embedding service + vector DB + reranking
- βœ… **Pinecone Integrated**: Single API for embedding, storage, search, and reranking

### Performance Improvements
- πŸš€ **Up to 60% better search accuracy** with integrated reranking
- ⚑ **Lower latency** with co-located inference and storage
- πŸ’° **Cost efficient** with serverless scaling
- πŸ”’ **More secure** with embedding, search, and reranking kept inside Pinecone's infrastructure (no separate embedding service to call)

## 🎯 Use Cases Perfect for This System

1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations

## πŸ”§ Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## πŸš€ Getting Started

1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in Space settings
3. **Start Chatting** - the system will auto-create everything needed!

The AI will automatically:
- Create a new Pinecone index with integrated inference
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
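
Index auto-creation with integrated inference might look like this sketch (the `ai-memory` name, cloud, and region are assumptions; `create_index_for_model` attaches the embedding model at creation time):

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index whose embedding model is bound at creation time,
# so later upserts and searches can pass raw text instead of vectors.
if not pc.has_index("ai-memory"):
    pc.create_index_for_model(
        name="ai-memory",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",
            "field_map": {"text": "chunk_text"},  # record field to embed
        },
    )
```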

## πŸ“ˆ Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences
- Embedding and reranking model information
- Retrieved context for each response

## πŸ” Privacy & Security

- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- Embedding, search, and reranking happen inside Pinecone, so conversation text is not routed through a separate embedding service
- Full control over your data and models

## πŸ“š Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

## 🏷 License

MIT License - feel free to modify and use for your projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* πŸš€