---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI gains experience and provides increasingly contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion (see the sketch after this list)
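
As a concrete illustration, here is a minimal sketch of calling this hosted model through the Pinecone Python SDK's inference API. The call and response shape follow current SDK documentation and may differ from what app.py actually does:

```python
# Minimal sketch of Pinecone's hosted embedding API (assumed SDK surface).
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["How do I reset my password?"],
    parameters={"input_type": "query"},  # use "passage" when indexing documents
)
print(len(embeddings[0]["values"]))  # -> 1024, the model's dimensionality
```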
34 |
+
### Reranking Model
|
35 |
+
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
|
36 |
+
- Up to 60% improvement in search accuracy
|
37 |
+
- Reorders results by relevance before sending to LLM
|
38 |
+
- Reduces token waste and improves response quality
|
39 |
+
|
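
A matching sketch of the rerank call; the query and candidate documents are made-up examples:

```python
# Sketch: rerank candidate memories against the user's query.
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

reranked = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="How do I reset my password?",
    documents=[
        "User asked about password resets last week.",
        "User prefers answers in Spanish.",
        "A billing question was resolved on Tuesday.",
    ],
    top_n=2,  # keep only the most relevant memories for the LLM prompt
)
for row in reranked.data:
    print(row.index, row.score)  # original position and relevance score
```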
40 |
+
### Alternative Models Available
|
41 |
+
- **cohere-rerank-v3.5**: Cohere's leading reranking model
|
42 |
+
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
|
43 |
+
- **bge-reranker-v2-m3**: Open-source multilingual reranking
|
44 |
+
|
45 |
+
## π Setup
|
46 |
+
|
47 |
+
### Required Environment Variables
|
48 |
+
|
49 |
+
1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
|
50 |
+
- Get it from: https://www.pinecone.io/
|
51 |
+
|
52 |
+
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
|
53 |
+
- Get it from: https://openrouter.ai/
|
54 |
+
|
55 |
+
### Optional Environment Variables
|
56 |
+
|
57 |
+
3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
|
58 |
+
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
|
59 |
+
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
|
60 |
+
|
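
For illustration, these variables might be read with their documented defaults like this (a hypothetical loader, not app.py's actual code):

```python
# Hypothetical config loader; names and defaults follow the list above.
import os

PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]        # required
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]    # required
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```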
61 |
+
## π How It Works
|
62 |
+
|
63 |
+
### 1. Integrated Inference Pipeline
|
64 |
+
1. **User Input** β Pinecone automatically converts text to embeddings
|
65 |
+
2. **Vector Search** β Retrieves relevant past conversations from vector database
|
66 |
+
3. **Reranking** β Pinecone reranks results by relevance to query
|
67 |
+
4. **Context Building** β Formats reranked experiences for AI
|
68 |
+
5. **AI Response** β OpenRouter generates response with retrieved context
|
69 |
+
6. **Memory Storage** β New conversation automatically embedded and stored
|
70 |
+
|
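
The following end-to-end sketch shows one way this pipeline could be wired together. It assumes an index created with integrated embedding (so search and upsert accept raw text), the `pinecone` SDK's `rerank` call, and OpenRouter's OpenAI-compatible chat endpoint; the index name `memories`, the `chat` namespace, and the `text` field are illustrative placeholders, not identifiers from app.py:

```python
# Illustrative sketch; index/namespace/field names are assumptions,
# and response shapes follow current Pinecone SDK docs, not app.py.
import os
import uuid
import requests
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("memories")  # assumed index with integrated embedding

def answer(user_input: str) -> str:
    # Steps 1-2: embed the query and search past conversations in one call.
    results = index.search(
        namespace="chat",
        query={"inputs": {"text": user_input}, "top_k": 10},
    )
    docs = [hit["fields"]["text"] for hit in results["result"]["hits"]]

    # Step 3: rerank the retrieved memories against the query.
    reranked = pc.inference.rerank(
        model=os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
        query=user_input,
        documents=docs,
        top_n=3,
    )
    # Step 4: build the context block from the top-ranked memories.
    context = "\n".join(docs[row.index] for row in reranked.data)

    # Step 5: generate a response via OpenRouter's OpenAI-compatible API.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
            "messages": [
                {"role": "system", "content": f"Relevant past context:\n{context}"},
                {"role": "user", "content": user_input},
            ],
        },
        timeout=60,
    )
    reply = resp.json()["choices"][0]["message"]["content"]

    # Step 6: store the new exchange; the index embeds the text automatically.
    index.upsert_records(
        "chat",
        [{"_id": str(uuid.uuid4()), "text": f"Q: {user_input}\nA: {reply}"}],
    )
    return reply
```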

### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses

## 📈 Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage a separate embedding service + vector DB + reranking service
- ✅ **Pinecone Integrated**: Single API for embedding, storage, search, and reranking

### Performance Improvements
- 📈 **Up to 60% better search accuracy** with integrated reranking
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure** with private networking (no cross-service calls)

## 🎯 Use Cases Perfect for This System

1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in the Space settings
3. **Start Chatting** - the system will auto-create everything needed!

The AI will automatically:
- Create a new Pinecone index with integrated inference (see the sketch after this list)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
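
A hedged sketch of what that index auto-creation could look like using the SDK's integrated-inference index creation; the index name, cloud, and region are placeholders:

```python
# Sketch of index auto-creation (assumed SDK call; names are placeholders).
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

if not pc.has_index("memories"):
    pc.create_index_for_model(
        name="memories",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "multilingual-e5-large",   # hosted embedding model
            "field_map": {"text": "text"},      # record field that gets embedded
        },
    )
```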

## 📊 Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences (see the stats sketch after this list)
- Embedding and reranking model information
- Retrieved context for each response
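
The stored-experience count, for example, could come from the index's stats endpoint; this is the standard SDK call, though not necessarily how the interface computes it:

```python
# Sketch: count stored experiences via index statistics.
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
stats = pc.Index("memories").describe_index_stats()  # "memories" is a placeholder
print(stats.total_vector_count)  # number of stored memory vectors
```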

## 🔒 Privacy & Security

- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

## 🏷 License

MIT License - feel free to modify and use it in your own projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀