|
--- |
|
title: AI with Pinecone Integrated Inference RAG |
|
emoji: 🚀
|
colorFrom: blue |
|
colorTo: purple |
|
sdk: gradio |
|
sdk_version: 4.44.0 |
|
app_file: app.py |
|
pinned: false |
|
license: mit |
|
--- |
|
|
|
# AI Assistant with Pinecone Integrated Inference RAG |
|
|
|
This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory. |
|
|
|
## 🚀 Features
|
|
|
- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models |
|
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding |
|
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models |
|
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
|
- **Learning Over Time**: The AI gains experience and provides more contextual responses |
|
- **Real-time Context Display**: Shows retrieved and reranked experiences |
|
|
|
## 🧠 AI Models Used
|
|
|
### Embedding Model |
|
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings |
|
- Excellent for semantic search across languages |
|
- Optimized for both passage and query embeddings |
|
- Automatically handles text-to-vector conversion (see the sketch below)
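
For illustration, here is roughly how the hosted embedding model can be called through the Pinecone Python SDK's inference API. This is a minimal sketch, not the app's actual code, and the exact response shape may vary by SDK version:

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Embed passages with Pinecone's hosted model -- no local model required.
# Use input_type="query" when embedding search queries instead of passages.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["Pinecone hosts the embedding model.", "No local GPU is needed."],
    parameters={"input_type": "passage", "truncate": "END"},
)
print(len(embeddings[0].values))  # 1024-dimensional vectors
```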
|
|
|
### Reranking Model |
|
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model |
|
- Up to 60% improvement in search accuracy |
|
- Reorders results by relevance before sending to LLM |
|
- Reduces token waste and improves response quality (see the sketch below)
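
A minimal sketch of calling the reranker directly via the SDK, using the default model named above; response fields may differ slightly across SDK versions:

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Reorder candidate documents by relevance to the query before the LLM sees them.
reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How do I reset my password?",
    documents=[
        "Passwords can be reset from the account settings page.",
        "Our office is closed on public holidays.",
    ],
    top_n=1,
    return_documents=True,
)
for row in reranked.data:
    print(row.index, round(row.score, 3), row.document["text"])
```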
|
|
|
### Alternative Models Available |
|
- **cohere-rerank-v3.5**: Cohere's leading reranking model |
|
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search |
|
- **bge-reranker-v2-m3**: Open-source multilingual reranking |
|
|
|
## 🔧 Setup
|
|
|
### Required Environment Variables |
|
|
|
1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo) |
|
- Get it from: https://www.pinecone.io/ |
|
|
|
2. **OPENROUTER_API_KEY**: Your OpenRouter API key |
|
- Get it from: https://openrouter.ai/ |
|
|
|
### Optional Environment Variables |
|
|
|
3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large") |
|
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0") |
|
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku"); see the configuration sketch below
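
A sketch of how app.py might read this configuration (the variable names on the left are illustrative):

```python
import os

# Required keys fail fast if missing; optional ones fall back to the defaults above.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
EMBEDDING_MODEL = os.getenv("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.getenv("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.getenv("MODEL_NAME", "anthropic/claude-3-haiku")
```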
|
|
|
## 🔄 How It Works
|
|
|
### 1. Integrated Inference Pipeline |
|
1. **User Input** → Pinecone automatically converts text to embeddings

2. **Vector Search** → Retrieves relevant past conversations from the vector database

3. **Reranking** → Pinecone reranks results by relevance to the query

4. **Context Building** → Formats reranked experiences for the AI

5. **AI Response** → OpenRouter generates a response with the retrieved context

6. **Memory Storage** → The new conversation is automatically embedded and stored (see the condensed sketch below)
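
A condensed sketch of this loop, assuming an index created with integrated inference (see Getting Started) and a `text` field holding each stored conversation. The function name, index name, and namespace are illustrative, not necessarily what app.py uses, and the SDK calls follow current Pinecone docs but may vary by client version:

```python
import os
import uuid

import requests
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-assistant-memory")  # hypothetical index name
NAMESPACE = "conversations"


def answer_with_memory(user_input: str) -> str:
    # Steps 1-3: embed the query, search past conversations, and rerank,
    # all in a single call because the index uses integrated inference.
    results = index.search(
        namespace=NAMESPACE,
        query={"inputs": {"text": user_input}, "top_k": 10},
        rerank={"model": "pinecone-rerank-v0", "top_n": 3, "rank_fields": ["text"]},
    )
    hits = results["result"]["hits"]

    # Step 4: format the reranked experiences as context for the LLM.
    context = "\n".join(hit["fields"]["text"] for hit in hits)

    # Step 5: generate a response through OpenRouter's OpenAI-compatible API.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": os.getenv("MODEL_NAME", "anthropic/claude-3-haiku"),
            "messages": [
                {"role": "system", "content": f"Relevant past conversations:\n{context}"},
                {"role": "user", "content": user_input},
            ],
        },
        timeout=60,
    )
    answer = resp.json()["choices"][0]["message"]["content"]

    # Step 6: store the new exchange; Pinecone embeds the text field automatically.
    index.upsert_records(
        NAMESPACE,
        [{"_id": str(uuid.uuid4()), "text": f"User: {user_input}\nAssistant: {answer}"}],
    )
    return answer
```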
|
|
|
### 2. Advanced Features |
|
- **Automatic Embedding**: No manual embedding generation required |
|
- **Smart Reranking**: Improves relevance of retrieved memories |
|
- **Multilingual Support**: Works across multiple languages |
|
- **Serverless Architecture**: Automatically scales based on usage |
|
- **Real-time Learning**: Each conversation improves future responses |
|
|
|
## 🌟 Benefits of Pinecone Integrated Inference
|
|
|
### Traditional Approach vs. Pinecone Integrated

- ❌ **Traditional**: Manage separate embedding service + vector DB + reranking

- ✅ **Pinecone Integrated**: Single API for embedding, storage, search, and reranking
|
|
|
### Performance Improvements |
|
- 📈 **Up to 60% better search accuracy** with integrated reranking

- ⚡ **Lower latency** with co-located inference and storage

- 💰 **Cost efficient** with serverless scaling

- 🔒 **More secure** with private networking (no cross-service calls)
|
|
|
## 🎯 Use Cases Perfect for This System
|
|
|
1. **Customer Support**: AI that remembers previous interactions |
|
2. **Personal Assistant**: Learning user preferences over time |
|
3. **Knowledge Management**: Building institutional memory |
|
4. **Content Recommendation**: Improving suggestions based on history |
|
5. **Research Assistant**: Connecting related information across conversations |
|
|
|
## 🔧 Technical Architecture
|
|
|
```mermaid |
|
graph TD |
|
A[User Input] --> B[Pinecone Inference API] |
|
B --> C[multilingual-e5-large Embedding] |
|
C --> D[Vector Search in Pinecone] |
|
D --> E[pinecone-rerank-v0 Reranking] |
|
E --> F[OpenRouter LLM] |
|
F --> G[AI Response] |
|
G --> H[Auto-embed & Store] |
|
H --> D |
|
``` |
|
|
|
## 🚀 Getting Started
|
|
|
1. **Clone/Deploy** this HuggingFace Space |
|
2. **Set Environment Variables** in Space settings |
|
3. **Start Chatting** - the system will auto-create everything needed! |
|
|
|
The AI will automatically: |
|
- Create a new Pinecone index with integrated inference (sketched below)
|
- Generate embeddings for all conversations |
|
- Build a memory of interactions over time |
|
- Provide increasingly contextual responses |
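
For reference, index auto-creation with integrated inference might look like this sketch; the index name and field mapping are illustrative, and `create_index_for_model` support depends on the client version:

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index whose records Pinecone embeds automatically on upsert.
if not pc.has_index("ai-assistant-memory"):
    pc.create_index_for_model(
        name="ai-assistant-memory",
        cloud="aws",
        region="us-east-1",
        embed={"model": "multilingual-e5-large", "field_map": {"text": "text"}},
    )
```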
|
|
|
## 📊 Monitoring & Analytics
|
|
|
The interface provides real-time monitoring of: |
|
- Connection status to Pinecone and OpenRouter |
|
- Number of stored experiences (see the sketch below)
|
- Embedding and reranking model information |
|
- Retrieved context for each response |
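
The experience count, for instance, can be read straight from the index. A small sketch (the index name is illustrative):

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-assistant-memory")  # hypothetical index name

# Total stored experiences == total record count across all namespaces.
stats = index.describe_index_stats()
print(stats.total_vector_count)
print(stats.namespaces)  # per-namespace counts
```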
|
|
|
## 🔒 Privacy & Security
|
|
|
- Conversations stored in your personal Pinecone database |
|
- Integrated inference runs on Pinecone's secure infrastructure |
|
- No cross-network communication between services |
|
- Full control over your data and models |
|
|
|
## 📚 Learn More
|
|
|
- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference) |
|
- [OpenRouter API](https://openrouter.ai/docs) |
|
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) |
|
|
|
## 🏷 License
|
|
|
MIT License - feel free to modify and use for your projects! |
|
|
|
--- |
|
|
|
*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀
|
|