---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI accumulates experience and gives increasingly contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model

- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
  - Strong semantic search across languages
  - Optimized for both passage and query embeddings
  - Handles text-to-vector conversion automatically (see the sketch below)

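A minimal sketch of calling this model through the Pinecone Python SDK's inference API; the API key placeholder and sample texts are illustrative, not taken from `app.py`:

```python
# Hedged sketch: embed passages and queries with Pinecone's hosted
# multilingual-e5-large model (requires the `pinecone` package, v5+).
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# input_type="passage" for documents you plan to store...
docs = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The user prefers concise answers."],
    parameters={"input_type": "passage", "truncate": "END"},
)

# ...and input_type="query" for search-time text.
query = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["What answer style does the user like?"],
    parameters={"input_type": "query"},
)

print(len(docs[0].values))  # 1024-dimensional vectors
```
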
### Reranking Model

- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
  - Pinecone reports up to 60% improvement in search accuracy
  - Reorders retrieved results by relevance before they are sent to the LLM
  - Reduces wasted tokens and improves response quality (see the sketch below)
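
A minimal sketch of the rerank call via the SDK; the query and documents are illustrative:

```python
# Hedged sketch: rerank candidate memories with Pinecone's hosted
# pinecone-rerank-v0 model before passing them to the LLM.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

result = pc.inference.rerank(
    model="pinecone-rerank-v0",
    query="What answer style does the user like?",
    documents=[
        "The user prefers concise answers.",
        "The weather was sunny yesterday.",
    ],
    top_n=1,
)

for row in result.data:
    # Each row carries the original document index and a relevance score.
    print(row.index, row.score)
```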

### Alternative Models Available

- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## 🛠 Setup

### Required Environment Variables

1. `PINECONE_API_KEY`: Your Pinecone API key (hardcoded in this demo)
2. `OPENROUTER_API_KEY`: Your OpenRouter API key

### Optional Environment Variables

1. `PINECONE_EMBEDDING_MODEL`: Embedding model name (default: `multilingual-e5-large`)
2. `PINECONE_RERANK_MODEL`: Reranking model name (default: `pinecone-rerank-v0`)
3. `MODEL_NAME`: OpenRouter model name (default: `anthropic/claude-3-haiku`); see the configuration sketch below
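
A rough sketch of how `app.py` might read this configuration; the Python variable names are illustrative, while the env var names and defaults come from the lists above:

```python
import os

# Required keys: fail fast if either is missing.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

# Optional settings, falling back to the documented defaults.
EMBEDDING_MODEL = os.environ.get("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large")
RERANK_MODEL = os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0")
MODEL_NAME = os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku")
```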

## 🔄 How It Works

### 1. Integrated Inference Pipeline

1. **User Input** → Pinecone automatically converts the text to embeddings
2. **Vector Search** → retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks the results by relevance to the query
4. **Context Building** → formats the reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response using the retrieved context
6. **Memory Storage** → the new conversation is automatically embedded and stored (all six steps are combined in the sketch below)

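A condensed, hedged sketch of this pipeline. The index name, ID scheme, and prompt format are illustrative assumptions rather than the app's actual code; OpenRouter is reached through its OpenAI-compatible endpoint:

```python
import os
from pinecone import Pinecone
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("ai-experiences")  # hypothetical index name
llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

def answer(user_input: str) -> str:
    # 1. User input -> query embedding.
    query_vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[user_input],
        parameters={"input_type": "query"},
    )[0].values

    # 2. Vector search over stored experiences.
    hits = index.query(vector=query_vec, top_k=10, include_metadata=True)
    candidates = [m.metadata["text"] for m in hits.matches]

    # 3. Rerank candidates against the query (skipped while memory is empty).
    context = ""
    if candidates:
        reranked = pc.inference.rerank(
            model="pinecone-rerank-v0", query=user_input,
            documents=candidates, top_n=3,
        )
        # 4. Context building from the top reranked memories.
        context = "\n".join(candidates[r.index] for r in reranked.data)

    # 5. AI response via OpenRouter, grounded in the retrieved context.
    reply = llm.chat.completions.create(
        model=os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
        messages=[
            {"role": "system", "content": f"Relevant past experiences:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    ).choices[0].message.content

    # 6. Memory storage: embed the new exchange and upsert it.
    text = f"User: {user_input}\nAssistant: {reply}"
    vec = pc.inference.embed(
        model="multilingual-e5-large", inputs=[text],
        parameters={"input_type": "passage"},
    )[0].values
    index.upsert(vectors=[{"id": str(abs(hash(text))),  # naive ID, sketch only
                           "values": vec, "metadata": {"text": text}}])
    return reply
```
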
### 2. Advanced Features

- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Scales automatically with usage
- **Real-time Learning**: Each conversation improves future responses

## 📊 Benefits of Pinecone Integrated Inference

### Traditional Approach vs. Pinecone Integrated

- ❌ **Traditional**: Manage a separate embedding service, vector DB, and reranking service
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements

- 🚀 Pinecone reports up to 60% better search accuracy with integrated reranking
- ⚡ Lower latency, since inference and storage are co-located
- 💰 Cost-efficient serverless scaling
- 🔒 More secure, with private networking and no cross-service calls

## 🎯 Use Cases Well Suited to This System

1. **Customer Support**: An AI that remembers previous interactions
2. **Personal Assistant**: Learns user preferences over time
3. **Knowledge Management**: Builds institutional memory
4. **Content Recommendation**: Improves suggestions based on history
5. **Research Assistant**: Connects related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone or deploy** this Hugging Face Space
2. **Set the environment variables** in the Space settings
3. **Start chatting**: the system will auto-create everything it needs

The AI will automatically:

- Create a new Pinecone index with integrated inference (see the sketch below)
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses
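
A hedged sketch of the index auto-creation step using the SDK's `create_index_for_model` helper; the index name, cloud, and region are illustrative assumptions:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# Create a serverless index wired to a hosted embedding model, one time only.
if not pc.has_index("ai-experiences"):  # hypothetical index name
    pc.create_index_for_model(
        name="ai-experiences",
        cloud="aws",         # assumed cloud
        region="us-east-1",  # assumed region
        embed={
            "model": "multilingual-e5-large",
            "field_map": {"text": "text"},  # which record field gets embedded
        },
    )
```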

## 📈 Monitoring & Analytics

The interface provides real-time monitoring of:

- Connection status for Pinecone and OpenRouter
- Number of stored experiences (see the sketch below)
- Embedding and reranking model information
- Retrieved context for each response
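
A small sketch of reading the experience count from the index; the index name is illustrative:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index("ai-experiences")  # hypothetical index name

stats = index.describe_index_stats()
print("Stored experiences:", stats.total_vector_count)
print("Embedding dimension:", stats.dimension)
```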

## 🔐 Privacy & Security

- Conversations are stored in your own Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone documentation](https://docs.pinecone.io)
- [OpenRouter documentation](https://openrouter.ai/docs)
- [Gradio documentation](https://www.gradio.app/docs)

## 🏷 License

MIT License - feel free to modify and use for your projects!

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology 🚀*