---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI accumulates experience and gives increasingly contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model

- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
  - Strong semantic search across languages
  - Optimized for both passage and query embeddings
  - Handles text-to-vector conversion automatically (see the sketch below)

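A minimal sketch of calling this model through the Pinecone Python SDK's inference API; the API key placeholder and sample texts are illustrative, not taken from `app.py`:

```python
# Hedged sketch: embed passages and queries with Pinecone's hosted
# multilingual-e5-large model (requires the `pinecone` package, v5+).
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# input_type="passage" for documents you plan to store...
docs = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The user prefers concise answers."],
    parameters={"input_type": "passage", "truncate": "END"},
)

# ...and input_type="query" for search-time text.
query = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["What answer style does the user like?"],
    parameters={"input_type": "query"},
)

print(len(docs[0].values))  # 1024-dimensional vectors
```
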
### Reranking Model

- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
  - Pinecone reports up to 60% improvement in search accuracy
  - Reorders retrieved results by relevance before they are sent to the LLM
  - Reduces wasted tokens and improves response quality (see the sketch below)
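
A minimal sketch of the rerank call via the SDK; the query and documents are illustrative:

```python
# Hedged sketch: rerank candidate memories with Pinecone's hosted
# pinecone-rerank-v0 model before passing them to the LLM.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

result = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="What answer style does the user like?",
    documents=[
        "The user prefers concise answers.",
        "The weather was sunny yesterday.",
    ],
    top_n=1,
)

for row in result.data:
    # Each row carries the original document index and a relevance score.
    print(row.index, row.score)
```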

### Alternative Models Available

- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## 🛠 Setup

### Required Environment Variables

1. `PINECONE_API_KEY`: Your Pinecone API key (hardcoded in this demo)
2. `OPENROUTER_API_KEY`: Your OpenRouter API key

### Optional Environment Variables

1. `PINECONE_EMBEDDING_MODEL`: Embedding model name (default: `multilingual-e5-large`)
2. `PINECONE_RERANK_MODEL`: Reranking model name (default: `pinecone-rerank-v0`)
3. `MODEL_NAME`: OpenRouter model name (default: `anthropic/claude-3-haiku`); see the configuration sketch below
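
A rough sketch of how `app.py` might read this configuration; the Python variable names are illustrative, while the env var names and defaults come from the lists above:

```python
import os

# Required keys: fail fast if either is missing.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional settings, falling back to the documented defaults.
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```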

## 🔄 How It Works

### 1. Integrated Inference Pipeline

1. **User Input** → Pinecone automatically converts the text to embeddings
2. **Vector Search** → retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks the results by relevance to the query
4. **Context Building** → formats the reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response using the retrieved context
6. **Memory Storage** → the new conversation is automatically embedded and stored (all six steps are combined in the sketch below)

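A condensed, hedged sketch of this pipeline. The index name, ID scheme, and prompt format are illustrative assumptions rather than the app's actual code; OpenRouter is reached through its OpenAI-compatible endpoint:

```python
import os
from pinecone import Pinecone
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-experiences")  # hypothetical index name
llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

def answer(user_input: str) -> str:
    # 1. User input -> query embedding.
    query_vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[user_input],
        parameters={"input_type": "query"},
    )[0].values

    # 2. Vector search over stored experiences.
    hits = index.query(vector=query_vec, top_k=10, include_metadata=True)
    candidates = [m.metadata["text"] for m in hits.matches]

    # 3. Rerank candidates against the query (skipped while memory is empty).
    context = ""
    if candidates:
        reranked = pc.inference.rerank(
            model="pinecone-rerank-v0", query=user_input,
            documents=candidates, top_n=3,
        )
        # 4. Context building from the top reranked memories.
        context = "\n".join(candidates[r.index] for r in reranked.data)

    # 5. AI response via OpenRouter, grounded in the retrieved context.
    reply = llm.chat.completions.create(
        model=os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
        messages=[
            {"role": "system", "content": f"Relevant past experiences:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    ).choices[0].message.content

    # 6. Memory storage: embed the new exchange and upsert it.
    text = f"User: {user_input}\nAssistant: {reply}"
    vec = pc.inference.embed(
        model="multilingual-e5-large", inputs=[text],
        parameters={"input_type": "passage"},
    )[0].values
    index.upsert(vectors=[{"id": str(abs(hash(text))),  # naive ID, sketch only
                           "values": vec, "metadata": {"text": text}}])
    return reply
```
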
### 2. Advanced Features

- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Scales automatically with usage
- **Real-time Learning**: Each conversation improves future responses

## 📊 Benefits of Pinecone Integrated Inference

### Traditional Approach vs. Pinecone Integrated

- ❌ **Traditional**: Manage a separate embedding service, vector DB, and reranking service
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements

- 🚀 Pinecone reports up to 60% better search accuracy with integrated reranking
- ⚡ Lower latency, since inference and storage are co-located
- 💰 Cost-efficient serverless scaling
- 🔒 More secure, with private networking and no cross-service calls

## 🎯 Use Cases Well Suited to This System

1. **Customer Support**: An AI that remembers previous interactions
2. **Personal Assistant**: Learns user preferences over time
3. **Knowledge Management**: Builds institutional memory
4. **Content Recommendation**: Improves suggestions based on history
5. **Research Assistant**: Connects related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone or deploy** this Hugging Face Space
2. **Set the environment variables** in the Space settings
3. **Start chatting**: the system will auto-create everything it needs

The AI will automatically:

- Create a new Pinecone index with integrated inference (see the sketch below)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
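
A hedged sketch of the index auto-creation step using the SDK's `create_index_for_model` helper; the index name, cloud, and region are illustrative assumptions:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# Create a serverless index wired to a hosted embedding model, one time only.
if not pc.has_index("ai-experiences"):  # hypothetical index name
    pc.create_index_for_model(
        name="ai-experiences",
        cloud="aws",         # assumed cloud
        region="us-east-1",  # assumed region
        embed={
            "model": "multilingual-e5-large",
            "field_map": {"text": "text"},  # which record field gets embedded
        },
    )
```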

## 📈 Monitoring & Analytics

The interface provides real-time monitoring of:

- Connection status for Pinecone and OpenRouter
- Number of stored experiences (see the sketch below)
- Embedding and reranking model information
- Retrieved context for each response
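
A small sketch of reading the experience count from the index; the index name is illustrative:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index("ai-experiences")  # hypothetical index name

stats = index.describe_index_stats()
print("Stored experiences:", stats.total_vector_count)
print("Embedding dimension:", stats.dimension)
```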

## 🔐 Privacy & Security

- Conversations are stored in your own Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone documentation](https://docs.pinecone.io)
- [OpenRouter documentation](https://openrouter.ai/docs)
- [Gradio documentation](https://www.gradio.app/docs)

## 🏷 License

MIT License - feel free to modify and use for your projects!

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology 🚀*