---
title: AI with Pinecone Integrated Inference RAG
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# AI Assistant with Pinecone Integrated Inference RAG
This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.
## πŸš€ Features
- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through OpenRouter API
- **Learning Over Time**: The AI gains experience and provides more contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences
## 🧠 AI Models Used
### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion
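For illustration, here is a minimal sketch of calling the hosted embedding model through the `pinecone` Python SDK (v5+). The example text is made up; the actual `app.py` may structure this differently:

```python
import os
from pinecone import Pinecone

# Assumes the pinecone SDK (v5+) and PINECONE_API_KEY in the environment.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Embed a passage for storage; use input_type="query" when embedding searches.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The user asked about serverless pricing tiers."],
    parameters={"input_type": "passage", "truncate": "END"},
)
print(len(embeddings[0].values))  # 1024-dimensional vector
```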
### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
- Up to 60% improvement in search accuracy, per Pinecone's published benchmarks
- Reorders results by relevance before sending to LLM
- Reduces token waste and improves response quality
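A similarly hedged sketch of the standalone reranking call (the query and documents below are made up):

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Rerank candidate memories against the user's query; only the top
# results are forwarded to the LLM, trimming irrelevant context tokens.
result = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How does serverless billing work?",
    documents=[
        "Earlier chat about pod-based pricing.",
        "Earlier chat about serverless read/write units.",
        "Unrelated chat about emoji rendering.",
    ],
    top_n=2,
)
for row in result.data:
    # row.index points back into the documents list above
    print(row.index, round(row.score, 3))
```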
### Alternative Models Available
- **cohere-rerank-3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking
## πŸ›  Setup
### Required Environment Variables
1. **PINECONE_API_KEY**: Your Pinecone API key (this demo ships with a hardcoded key; set this variable to use your own)
- Get it from: https://www.pinecone.io/
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
- Get it from: https://openrouter.ai/
### Optional Environment Variables
3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
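For reference, the app would typically read this configuration along the following lines (a sketch, not necessarily the exact `app.py` code):

```python
import os

# Required -- fails fast with KeyError if the secrets are missing.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional, with the defaults documented above.
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```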
## πŸ”„ How It Works
### 1. Integrated Inference Pipeline
1. **User Input** β†’ Pinecone automatically converts text to embeddings
2. **Vector Search** β†’ Retrieves relevant past conversations from vector database
3. **Reranking** β†’ Pinecone reranks results by relevance to query
4. **Context Building** β†’ Formats reranked experiences for AI
5. **AI Response** β†’ OpenRouter generates response with retrieved context
6. **Memory Storage** β†’ New conversation automatically embedded and stored
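To make the flow concrete, here is a condensed sketch of a single turn through this pipeline. The index name, metadata fields, and function shape are illustrative assumptions; the actual `app.py` may differ:

```python
import os
import uuid
import requests
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-experiences")  # illustrative index name

def chat_turn(user_input: str) -> str:
    # Steps 1-2: embed the query and search stored experiences.
    query_vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[user_input],
        parameters={"input_type": "query"},
    )[0].values
    hits = index.query(vector=query_vec, top_k=10, include_metadata=True)

    # Steps 3-4: rerank candidates and build context from the best ones.
    docs = [m["metadata"]["text"] for m in hits["matches"]]
    if docs:
        reranked = pc.inference.rerank(
            model="pinecone-rerank-v0", query=user_input,
            documents=docs, top_n=3,
        )
        context = "\n".join(docs[r.index] for r in reranked.data)
    else:
        context = "(no stored experiences yet)"

    # Step 5: generate a response via OpenRouter's OpenAI-compatible API.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "anthropic/claude-3-haiku",
            "messages": [
                {"role": "system",
                 "content": f"Relevant past experiences:\n{context}"},
                {"role": "user", "content": user_input},
            ],
        },
        timeout=60,
    )
    answer = resp.json()["choices"][0]["message"]["content"]

    # Step 6: embed and store the new exchange as a memory.
    memory = f"User: {user_input}\nAssistant: {answer}"
    vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[memory],
        parameters={"input_type": "passage"},
    )[0].values
    index.upsert(vectors=[{"id": str(uuid.uuid4()), "values": vec,
                           "metadata": {"text": memory}}])
    return answer
```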
### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required (see the sketch after this list)
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses
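A hedged sketch of that text-in upsert path using `upsert_records`, which is available in recent SDK releases and only works on indexes created with an attached embedding model (namespace, id, and field names here are illustrative):

```python
import os
from pinecone import Pinecone

# Requires an index created with an attached embedding model
# (see the create_index_for_model sketch under "Getting Started").
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("ai-experiences")

# Raw text goes in; Pinecone embeds it server-side on ingest.
index.upsert_records(
    "conversations",  # namespace; illustrative
    [{"_id": "exp-001",
      "chunk_text": "User asked about pricing; assistant explained the tiers."}],
)
```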
## πŸ“Š Benefits of Pinecone Integrated Inference
### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage separate embedding service + vector DB + reranking
- βœ… **Pinecone Integrated**: Single API for embedding, storage, search, and reranking
### Performance Improvements
- πŸš€ **60% better search accuracy** with integrated reranking
- ⚑ **Lower latency** with co-located inference and storage
- πŸ’° **Cost efficient** with serverless scaling
- πŸ”’ **More secure** with private networking (no cross-service calls)
## 🎯 Use Cases Perfect for This System
1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations
## πŸ”§ Technical Architecture
```mermaid
graph TD
A[User Input] --> B[Pinecone Inference API]
B --> C[multilingual-e5-large Embedding]
C --> D[Vector Search in Pinecone]
D --> E[pinecone-rerank-v0 Reranking]
E --> F[OpenRouter LLM]
F --> G[AI Response]
G --> H[Auto-embed & Store]
H --> D
```
## πŸš€ Getting Started
1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in Space settings
3. **Start Chatting** - the system will auto-create everything needed!
The AI will automatically:
- Create a new Pinecone index with integrated inference (sketched below)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
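The index auto-creation step likely resembles the sketch below. `create_index_for_model` and `has_index` are real calls in recent `pinecone` SDK releases; the index name, cloud/region, and field mapping are assumptions:

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index with an attached hosted embedding model,
# so upserted text is embedded automatically.
if not pc.has_index("ai-experiences"):
    pc.create_index_for_model(
        name="ai-experiences",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",
            "field_map": {"text": "chunk_text"},
        },
    )
```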
## πŸ“ˆ Monitoring & Analytics
The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences (see the sketch below)
- Embedding and reranking model information
- Retrieved context for each response
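For example, the stored-experience count can come from the SDK's standard stats call (a sketch; the index name is assumed):

```python
import os
from pinecone import Pinecone

# Count stored experiences via index statistics.
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("ai-experiences")
stats = index.describe_index_stats()
print(f"Stored experiences: {stats['total_vector_count']}")
```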
## πŸ” Privacy & Security
- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models
## πŸ“š Learn More
- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)
## 🏷 License
MIT License - feel free to modify and use for your projects!
---
*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* πŸš€