# Knowledge Base Browser

A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching a document collection with semantic, keyword, or hybrid search.

## Features

- **Semantic Search**: AI-powered meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export functionality
- **Agent Integration**: Designed for both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with FAISS vector store for efficient retrieval
- **Responsive UI**: Modern, accessible interface with expandable result cards

## Installation

```bash
pip install gradio_kb_browser
```

For development installation:

```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```

## Quick Start

### Basic Usage

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10             # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results
    )

demo.launch()
```

### Agent Integration

```python
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5
    )

    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"]
        })

    return citations
```

## Configuration

### Environment Variables

Set your OpenAI API key for semantic search:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

### Component Parameters

- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to document directory (default: "./data")
- `search_type`: Search method - "semantic", "keyword", or "hybrid"
- `max_results`: Maximum number of results to return
- `label`: Component label for UI
- `visible`: Whether component is visible
- `elem_classes`: CSS classes for styling

## Document Formats

The component supports the following document formats:

- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents

## Search Types

### Semantic Search

Uses OpenAI embeddings to match on meaning and context rather than exact wording. Best for:

- Conceptual queries
- Finding related topics
- Cross-domain searches

### Keyword Search

Traditional text matching. Best for:

- Exact phrase searches
- Technical terms
- Specific names or identifiers

### Hybrid Search

Combines semantic and keyword matching, so results can surface on either exact terms or related meaning.

## API Reference

### KnowledgeBrowser Class

#### Methods

- `search(query, search_type, max_results)`: Perform search and return results
- `preprocess(payload)`: Preprocess component input
- `postprocess(value)`: Postprocess component output
- `api_info()`: Get API schema information

#### Events

- `submit`: Triggered when search is performed
- `select`: Triggered when document is selected
- `change`: Triggered when component state changes

## Example Applications

### Research Assistant

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        results_display = gr.HTML()
        citations = gr.State([])

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)

            # Render each result as a simple HTML card
            # (plain tags shown here; add classes or styles as needed)
            html = "<div>"
            for doc in results["results"]:
                html += f"""
                <div>
                    <h3>{doc['title']}</h3>
                    <p>Source: {doc['source']}</p>
                    <p>Relevance: {doc['relevance_score']:.0%}</p>
                    <p>{doc['snippet']}</p>
                </div>
                """
            html += "</div>"
            return html

        search_btn.click(research_query, question, results_display)

    return app
```

### Customer Support

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")

            # Show only the top three matches
            # (the link target assumes doc['source'] is a URL or path the browser can open)
            html = "<div>"
            for doc in results["results"][:3]:
                html += f"""
                <div>
                    <h3>{doc['title']}</h3>
                    <p>{doc['snippet']}</p>
                    <a href="{doc['source']}">View Full Article</a>
                </div>
                """
            html += "</div>"
            return html

        help_btn.click(find_solutions, issue, solutions)

    return app
```

## Development

### Running Tests

```bash
pip install pytest
pytest test_kb_browser.py -v
```

### Building the Component

```bash
pip install build
python -m build
```

### Publishing

```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

MIT License - see LICENSE file for details.

## Support

For issues and questions:

- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)
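
## Appendix: Hybrid Ranking Sketch

The Hybrid Search mode described under Search Types combines semantic and keyword results, but this README does not say how the two rankings are merged. One common technique is reciprocal rank fusion, sketched below. This is an illustration only and not necessarily what `KnowledgeBrowser` does internally; the function name, the document IDs, and the constant `k=60` are all assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from a semantic and a keyword search
semantic_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused ranking, while documents found by only one method are kept but ranked lower. Rank-based fusion avoids having to put embedding similarities and keyword scores on a common scale.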