Knowledge Base Browser
A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching documents by meaning (semantic search), by exact text (keyword search), or by a combination of the two.
Features
- Semantic Search: AI-powered meaning-based search using OpenAI embeddings
- Keyword Search: Traditional text matching for precise queries
- Hybrid Search: Combines semantic and keyword approaches
- Source Type Filtering: Filter by PDF, web pages, academic papers, or code repositories
- Citation Tracking: Built-in citation management and export functionality
- Agent Integration: Designed for both human users and AI agents
- LlamaIndex Integration: Uses LlamaIndex with FAISS vector store for efficient retrieval
- Responsive UI: Modern, accessible interface with expandable result cards
Installation
pip install gradio_kb_browser
For development installation:
git clone <repository-url>
cd kb_browser
pip install -e .
Quick Start
Basic Usage
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10             # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results
    )

demo.launch()
Agent Integration
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5
    )

    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"]
        })
    return citations
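The citation dictionaries built above can be turned into a readable reference list before an agent includes them in a response. The sketch below is illustrative only: `format_citations` is a hypothetical helper, not part of the component, and the sample data is made up to match the shape returned by `agent_research`.

```python
def format_citations(citations):
    """Render citation dicts as a numbered plain-text reference list."""
    lines = []
    for number, item in enumerate(citations, start=1):
        lines.append(
            f"[{number}] {item['title']} ({item['source']}), "
            f"relevance {item['relevance']:.2f}"
        )
    return "\n".join(lines)


# Hypothetical sample data, shaped like the output of agent_research().
sample = [
    {"title": "Intro to RAG", "source": "rag.pdf", "relevance": 0.91, "snippet": "..."},
    {"title": "FAISS Basics", "source": "faiss.md", "relevance": 0.84, "snippet": "..."},
]
print(format_citations(sample))
```

Keeping the formatting separate from the search call lets the same citations feed a chat reply, a log entry, or an export file.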
Configuration
Environment Variables
Set your OpenAI API key for semantic search:
export OPENAI_API_KEY="your-api-key-here"
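An application may want to check for the key at startup rather than fail on the first query. The following is a minimal sketch, assuming that keyword search works without an API key and that hybrid search needs one because it includes a semantic step; neither assumption is confirmed by the component itself, and `choose_search_type` is a hypothetical helper.

```python
import os


def choose_search_type(preferred="semantic", env=None):
    """Return `preferred`, or "keyword" when the API key it would need is missing."""
    env = os.environ if env is None else env
    # Assumption: both semantic and hybrid search call the embeddings API.
    needs_key = preferred in ("semantic", "hybrid")
    if needs_key and not env.get("OPENAI_API_KEY"):
        return "keyword"
    return preferred


print(choose_search_type("semantic"))
```

The result can be passed as the `search_type` argument when constructing the component or calling `search`.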
Component Parameters
- query: Initial search query string
- results: Pre-loaded search results
- index_path: Path to document directory (default: "./data")
- search_type: Search method - "semantic", "keyword", or "hybrid"
- max_results: Maximum number of results to return
- label: Component label for UI
- visible: Whether component is visible
- elem_classes: CSS classes for styling
Document Formats
The component supports various document formats:
- PDF Files: Automatically parsed and indexed
- Text Files: Plain text documents
- Markdown: Documentation and notes
- JSON: Structured data documents
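Before pointing `index_path` at a directory, it can help to see how many files of each listed format it contains. This is a rough sketch: the extension mapping below is an assumption based on the format names above, and the component's real list of accepted extensions may differ. `count_supported_files` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from file extension to the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".txt": "Text",
    ".md": "Markdown",
    ".json": "JSON",
}


def count_supported_files(directory):
    """Count files under `directory` (recursively) by supported format."""
    root = Path(directory)
    if not root.is_dir():
        return {}
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            kind = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
            if kind:
                counts[kind] += 1
    return dict(counts)


print(count_supported_files("./documents"))
```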
Search Types
Semantic Search
Uses OpenAI embeddings to understand meaning and context. Best for:
- Conceptual queries
- Finding related topics
- Cross-domain searches
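Embedding-based retrieval generally ranks documents by how close their vectors are to the query vector. The sketch below shows that idea with cosine similarity on toy 3-dimensional vectors. It is not the component's implementation, which uses OpenAI embeddings and a FAISS store through LlamaIndex; the vectors and names here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
toy_index = {
    "neural networks": [0.9, 0.1, 0.0],
    "deep learning": [0.8, 0.2, 0.1],
    "tax law": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]

# Rank document names from most to least similar to the query.
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_vector, toy_index[name]),
    reverse=True,
)
print(ranked)
```

Documents about related topics end up near each other even when they share no words, which is why this mode suits conceptual queries.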
Keyword Search
Traditional text matching. Best for:
- Exact phrase searches
- Technical terms
- Specific names or identifiers
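For contrast with the semantic mode, keyword matching can be as simple as checking which query terms occur in a document. The scoring below is a minimal sketch of one such scheme (fraction of query terms found as whole words); the component's actual matching logic is not documented here and may work differently.

```python
import re


def keyword_score(query, text):
    """Fraction of query terms that appear as whole words in `text`."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for term in terms if term in words)
    return hits / len(terms)


print(keyword_score("FAISS index", "Build a FAISS index quickly"))
```

A score of this kind rewards exact terms and identifiers, and gives no credit to synonyms or paraphrases.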
Hybrid Search
Combines both approaches for comprehensive results.
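One common way to combine two rankings is a weighted sum of their scores. The sketch below assumes both scores are already on a comparable 0 to 1 scale and uses a hypothetical `alpha` weight; it illustrates the general technique rather than how this component merges results.

```python
def hybrid_scores(semantic, keyword, alpha=0.5):
    """Blend two {doc_id: score} dicts; `alpha` is the weight of the semantic score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(semantic) | set(keyword)
    blended = {
        doc_id: alpha * semantic.get(doc_id, 0.0)
        + (1 - alpha) * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores from the two search modes.
result = hybrid_scores({"a": 0.9, "b": 0.2}, {"b": 1.0, "c": 0.4}, alpha=0.5)
print(result)
```

A document missing from one ranking is treated as scoring 0.0 there, so items found by both modes tend to rise to the top.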
API Reference
KnowledgeBrowser Class
Methods
- search(query, search_type, max_results): Perform search and return results
- preprocess(payload): Preprocess component input
- postprocess(value): Postprocess component output
- api_info(): Get API schema information
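The examples in this document read `title`, `source`, `relevance_score`, and `snippet` from each entry of `results["results"]`. A small check like the one below can catch a malformed payload before it reaches display code. The field list is inferred from those examples, not from a published schema, and `validate_results` is a hypothetical helper.

```python
# Fields the examples in this document read from each result (inferred, not a schema).
REQUIRED_FIELDS = ("title", "source", "relevance_score", "snippet")


def validate_results(payload):
    """Return a list of problems found in a search()-style payload (empty if none)."""
    results = payload.get("results")
    if not isinstance(results, list):
        return ["'results' must be a list"]
    problems = []
    for position, doc in enumerate(results):
        for field in REQUIRED_FIELDS:
            if field not in doc:
                problems.append(f"result {position} is missing '{field}'")
    return problems


print(validate_results({"results": [{"title": "Only a title"}]}))
```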
Events
- submit: Triggered when search is performed
- select: Triggered when document is selected
- change: Triggered when component state changes
Example Applications
Research Assistant
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        results_display = gr.HTML()
        citations = gr.State([])

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)
            html = "<div class='research-results'>"
            for doc in results["results"]:
                html += f"""
                <div class='paper'>
                <h3>{doc['title']}</h3>
                <p><strong>Source:</strong> {doc['source']}</p>
                <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                <p>{doc['snippet']}</p>
                </div>
                """
            html += "</div>"
            return html

        search_btn.click(research_query, question, results_display)

    return app
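Titles and snippets come from indexed documents, so they may contain characters such as `<` or `&` that would break or alter the rendered HTML. A minimal sketch of building one result card with the standard library's `html.escape` follows; `render_result_card` is a hypothetical helper and the sample result is invented.

```python
import html


def render_result_card(doc):
    """Build one result card, escaping every field that came from a document."""
    title = html.escape(str(doc["title"]))
    source = html.escape(str(doc["source"]))
    snippet = html.escape(str(doc["snippet"]))
    relevance = float(doc["relevance_score"])
    return (
        "<div class='paper'>"
        f"<h3>{title}</h3>"
        f"<p><strong>Source:</strong> {source}</p>"
        f"<p><strong>Relevance:</strong> {relevance:.0%}</p>"
        f"<p>{snippet}</p>"
        "</div>"
    )


# Hypothetical result whose title contains markup.
card = render_result_card({
    "title": "<script>alert(1)</script>",
    "source": "notes.md",
    "relevance_score": 0.87,
    "snippet": "Uses a < b comparisons",
})
print(card)
```

A helper like this could replace the inline f-string inside the loop, keeping the escaping in one place.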
Customer Support
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")
            html = "<div class='solutions'>"
            # Show only the top three matches
            for doc in results["results"][:3]:
                html += f"""
                <div class='solution'>
                <h4>{doc['title']}</h4>
                <p>{doc['snippet']}</p>
                <a href="{doc.get('url', '#')}" target="_blank">View Full Article</a>
                </div>
                """
            html += "</div>"
            return html

        help_btn.click(find_solutions, issue, solutions)

    return app
Development
Running Tests
pip install pytest
pytest test_kb_browser.py -v
Building the Component
pip install build
python -m build
Publishing
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
MIT License - see LICENSE file for details.
Support
For issues and questions:
- GitHub Issues: Create an issue
- Documentation: Gradio Docs
- Community: Gradio Discord