---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# AI Assistant with Pinecone Integrated Inference RAG
This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.
## 🚀 Features
- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through OpenRouter API
- **Learning Over Time**: The AI gains experience and provides more contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences
## 🧠 AI Models Used
### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion
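A minimal sketch of what calling the hosted embedding model looks like through the Pinecone Python SDK (v5 or later assumed; the API key is a placeholder):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# Use input_type="passage" when embedding stored text,
# and input_type="query" when embedding a search query.
result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["How do I reset my password?"],
    parameters={"input_type": "query", "truncate": "END"},
)
print(len(result[0].values))  # 1024-dimensional vector
```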
### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
- Up to 60% improvement in search accuracy
- Reorders results by relevance before sending to LLM
- Reduces token waste and improves response quality
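Standalone reranking with the same SDK might look like this sketch (query and documents are illustrative):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How do I reset my password?",
    documents=[
        "Passwords can be reset from the account settings page.",
        "Our office is open Monday through Friday.",
        "Contact support if you are locked out of your account.",
    ],
    top_n=2,  # keep only the two most relevant documents
)
for row in reranked.data:
    print(row.index, round(row.score, 3))
```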
### Alternative Models Available
- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking
## 🔑 Setup
### Required Environment Variables
1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
- Get it from: https://www.pinecone.io/
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
- Get it from: https://openrouter.ai/
### Optional Environment Variables
3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
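In app.py these would typically be read with fallbacks along these lines (a sketch; the actual variable handling in the app may differ):

```python
import os

# Required keys: fail fast if they are missing.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional settings with the defaults listed above.
EMBEDDING_MODEL = os.getenv("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.getenv("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.getenv("MODEL_NAME", "anthropic/claude-3-haiku")
```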
## 🔄 How It Works
### 1. Integrated Inference Pipeline
1. **User Input** β Pinecone automatically converts text to embeddings
2. **Vector Search** β Retrieves relevant past conversations from vector database
3. **Reranking** β Pinecone reranks results by relevance to query
4. **Context Building** β Formats reranked experiences for AI
5. **AI Response** β OpenRouter generates response with retrieved context
6. **Memory Storage** β New conversation automatically embedded and stored
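Put together, steps 1-5 might look like the following sketch (the index name `assistant-memory`, the `text` metadata field, and the function shape are assumptions for illustration, not the app's actual code; step 6 is illustrated under Advanced Features below):

```python
import os
from pinecone import Pinecone
from openai import OpenAI

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("assistant-memory")  # placeholder index name
llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

def answer(user_input: str) -> str:
    # Steps 1-2: embed the query and retrieve similar past conversations.
    query_vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[user_input],
        parameters={"input_type": "query"},
    )[0].values
    hits = index.query(vector=query_vec, top_k=10, include_metadata=True)
    docs = [m.metadata["text"] for m in hits.matches]

    # Steps 3-4: rerank and keep only the most relevant context.
    reranked = pc.inference.rerank(
        model="pinecone-rerank-v0", query=user_input, documents=docs, top_n=3
    )
    context = "\n".join(docs[r.index] for r in reranked.data)

    # Step 5: generate the response with retrieved context via OpenRouter.
    response = llm.chat.completions.create(
        model="anthropic/claude-3-haiku",
        messages=[
            {"role": "system",
             "content": f"Relevant past conversations:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content
```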
### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required (see the sketch after this list)
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses
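To make the "Automatic Embedding" point concrete: on an index created with integrated inference, recent SDK versions let you upsert and search raw text directly, with Pinecone embedding it server-side. A sketch under that assumption (namespace, record IDs, and field names are placeholders):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index("assistant-memory")            # placeholder index name

# Store raw text; Pinecone embeds the mapped text field automatically.
index.upsert_records(
    "conversations",
    [{"_id": "conv-1", "chunk_text": "User asked about password resets."}],
)

# Search with raw text, too; no client-side embedding step.
results = index.search(
    namespace="conversations",
    query={"inputs": {"text": "password help"}, "top_k": 3},
)
```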
## 📈 Benefits of Pinecone Integrated Inference
### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage separate embedding service + vector DB + reranking
- ✅ **Pinecone Integrated**: Single API for embedding, storage, search, and reranking
### Performance Improvements
- 📈 **60% better search accuracy** with integrated reranking
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure** with private networking (no cross-service calls)
## 🎯 Use Cases Perfect for This System
1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations
## 🔧 Technical Architecture
```mermaid
graph TD
A[User Input] --> B[Pinecone Inference API]
B --> C[multilingual-e5-large Embedding]
C --> D[Vector Search in Pinecone]
D --> E[pinecone-rerank-v0 Reranking]
E --> F[OpenRouter LLM]
F --> G[AI Response]
G --> H[Auto-embed & Store]
H --> D
```
## 🚀 Getting Started
1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in Space settings
3. **Start Chatting** - the system will auto-create everything needed!
The AI will automatically:
- Create a new Pinecone index with integrated inference (sketched below)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
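Index auto-creation could look like this sketch (index name, cloud, and region are assumptions; `create_index_for_model` requires a recent SDK version):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

if not pc.has_index("assistant-memory"):        # placeholder index name
    pc.create_index_for_model(
        name="assistant-memory",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",
            # Map the record field that holds the text to be embedded.
            "field_map": {"text": "chunk_text"},
        },
    )
```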
## 📊 Monitoring & Analytics
The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences
- Embedding and reranking model information
- Retrieved context for each response
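Under the hood, checks like these are enough to drive that panel (a sketch reusing the placeholder names from above):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index("assistant-memory")            # placeholder index name

stats = index.describe_index_stats()
print("Stored experiences:", stats.total_vector_count)
print("Available indexes:", [ix.name for ix in pc.list_indexes()])
```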
## 🔒 Privacy & Security
- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models
## 📚 Learn More
- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)
## 📄 License
MIT License - feel free to modify and use for your projects!
---
*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀