Knowledge Base Browser

A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching a document collection by meaning (semantic search), by exact text (keyword search), or by a combination of the two.

Features

  • Semantic Search: AI-powered meaning-based search using OpenAI embeddings
  • Keyword Search: Traditional text matching for precise queries
  • Hybrid Search: Combines semantic and keyword approaches
  • Source Type Filtering: Filter by PDF, web pages, academic papers, or code repositories
  • Citation Tracking: Built-in citation management and export functionality
  • Agent Integration: Designed for both human users and AI agents
  • LlamaIndex Integration: Uses LlamaIndex with FAISS vector store for efficient retrieval
  • Responsive UI: Modern, accessible interface with expandable result cards

Installation

pip install gradio_kb_browser

For a development installation:

git clone <repository-url>
cd kb_browser
pip install -e .

Quick Start

Basic Usage

import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10            # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    
    results = gr.JSON(label="Results")
    
    def search_documents(query_text):
        return kb_browser.search(query_text)
    
    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results
    )

demo.launch()

Agent Integration

from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5
    )
    
    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"]
        })
    
    return citations
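
The citation dictionaries returned by `agent_research` can be rendered as a numbered source list for an agent's reply. The following is a minimal sketch only: `format_citations` is a hypothetical helper, not part of the component, and it assumes each entry carries the `title`, `source`, and `relevance` keys built above, with `relevance` as a float between 0 and 1.

```python
def format_citations(citations):
    """Render citation dicts as a numbered plain-text source list.

    Hypothetical helper: assumes each dict has "title", "source",
    and "relevance" (a float between 0 and 1).
    """
    lines = []
    for number, cite in enumerate(citations, start=1):
        lines.append(
            f"[{number}] {cite['title']} ({cite['source']}), "
            f"relevance {cite['relevance']:.0%}"
        )
    return "\n".join(lines)
```

An agent could append the returned string to its answer so that each claim can be traced back to a numbered source.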

Configuration

Environment Variables

Set your OpenAI API key for semantic search:

export OPENAI_API_KEY="your-api-key-here"
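
Because semantic search depends on this key, an application may want to check for it at startup and fall back to keyword search when it is missing. The sketch below shows one way to do that. `pick_search_type` is a hypothetical helper, and treating hybrid search as also requiring the key is an assumption, not documented behavior of the component.

```python
import os


def pick_search_type(preferred="semantic", env=None):
    """Return a search type that can work with the current environment.

    Hypothetical helper. Assumes that "semantic" and "hybrid" both need
    OPENAI_API_KEY, while "keyword" does not.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENAI_API_KEY", "").strip())
    if preferred in ("semantic", "hybrid") and not has_key:
        return "keyword"
    return preferred
```

The result can be passed as the `search_type` argument when constructing `KnowledgeBrowser`.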

Component Parameters

  • query: Initial search query string
  • results: Pre-loaded search results
  • index_path: Path to document directory (default: "./data")
  • search_type: Search method - "semantic", "keyword", or "hybrid"
  • max_results: Maximum number of results to return
  • label: Component label for UI
  • visible: Whether component is visible
  • elem_classes: CSS classes for styling

Document Formats

The component supports various document formats:

  • PDF Files: Automatically parsed and indexed
  • Text Files: Plain text documents
  • Markdown: Documentation and notes
  • JSON: Structured data documents

Search Types

Semantic Search

Uses OpenAI embeddings to understand meaning and context. Best for:

  • Conceptual queries
  • Finding related topics
  • Cross-domain searches
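
Conceptually, semantic search compares an embedding of the query with an embedding of each document and ranks documents by vector similarity. The toy sketch below illustrates that idea with brute-force cosine similarity over small hand-written vectors. It is not the component's implementation, which according to the feature list delegates retrieval to LlamaIndex with a FAISS vector store; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, most similar first.

    doc_vecs maps a document id to its embedding vector.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```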

Keyword Search

Traditional text matching. Best for:

  • Exact phrase searches
  • Technical terms
  • Specific names or identifiers
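
As an illustration of what text matching can mean in practice, the sketch below scores a document by the fraction of distinct query terms it contains as whole words. This scoring rule is an assumption chosen for simplicity; the component's actual keyword matching is not specified here.

```python
import re


def keyword_score(query, text):
    """Fraction of distinct query terms that occur as whole words in text.

    Illustrative scoring rule; matching is case-insensitive.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    if not terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms)
```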

Hybrid Search

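Combines both approaches, so a document can surface through either its meaning or its exact wording. Best for:

  • Queries that mix a concept with an exact term
  • Cases where it is unclear which approach will work better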
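
One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), which rewards documents that rank well in either list without needing the two scoring scales to be comparable. The sketch below is an example of that technique, not a description of how this component merges results; the function name and the constant `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). Ties are broken by document id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a semantic ranking `["a", "b", "c"]` with a keyword ranking `["b", "d", "a"]` places `b` first, since it is near the top of both lists.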

API Reference

KnowledgeBrowser Class

Methods

  • search(query, search_type, max_results): Perform search and return results
  • preprocess(payload): Preprocess component input
  • postprocess(value): Postprocess component output
  • api_info(): Get API schema information

Events

  • submit: Triggered when search is performed
  • select: Triggered when document is selected
  • change: Triggered when component state changes

Example Applications

Research Assistant

import html

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")

        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")

        results_display = gr.HTML()

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)

            # Escape document fields so that stray markup in a title or
            # snippet cannot break (or inject into) the rendered HTML.
            cards = []
            for doc in results["results"]:
                cards.append(f"""
                <div class='paper'>
                    <h3>{html.escape(doc['title'])}</h3>
                    <p><strong>Source:</strong> {html.escape(doc['source'])}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{html.escape(doc['snippet'])}</p>
                </div>
                """)

            return "<div class='research-results'>" + "".join(cards) + "</div>"
        
        search_btn.click(research_query, question, results_display)
    
    return app

Customer Support

import html

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")

        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")

        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")

            # Show only the top three matches, escaping every field
            # before it is interpolated into the HTML.
            cards = []
            for doc in results["results"][:3]:
                url = html.escape(doc.get('url', '#'))
                cards.append(f"""
                <div class='solution'>
                    <h4>{html.escape(doc['title'])}</h4>
                    <p>{html.escape(doc['snippet'])}</p>
                    <a href="{url}" target="_blank">View Full Article</a>
                </div>
                """)

            return "<div class='solutions'>" + "".join(cards) + "</div>"
        
        help_btn.click(find_solutions, issue, solutions)
    
    return app

Development

Running Tests

pip install pytest
pytest test_kb_browser.py -v

Building the Component

pip install build
python -m build

Publishing

gradio cc publish kb_browser --name "KnowledgeBaseBrowser"

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For issues and questions: