# Knowledge Base Browser
A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching a document collection with semantic, keyword, or hybrid search.
## Features
- **Semantic Search**: AI-powered meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export functionality
- **Agent Integration**: Designed for both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with FAISS vector store for efficient retrieval
- **Responsive UI**: Modern, accessible interface with expandable result cards
## Installation
```bash
pip install gradio_kb_browser
```
For development installation:
```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```
## Quick Start
### Basic Usage
```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10,            # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results,
    )

demo.launch()
```
### Agent Integration
```python
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5,
    )
    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"],
        })
    return citations
```
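The agent example relies on `search` returning a dictionary with a `results` list whose entries carry `title`, `source`, `relevance_score`, and `snippet`. As a minimal sketch under that assumption, an agent might drop weak matches before citing them. The helper `filter_by_relevance` and the 0.5 threshold are hypothetical and not part of the component; the sample data is hand-written in place of a real search call.

```python
def filter_by_relevance(results, min_score=0.5):
    """Keep only documents whose relevance_score is at least min_score."""
    return [
        doc
        for doc in results.get("results", [])
        if doc.get("relevance_score", 0.0) >= min_score
    ]


# Hand-written sample shaped like the results used in the example above.
sample = {
    "results": [
        {"title": "Intro to RAG", "source": "rag.pdf",
         "relevance_score": 0.91, "snippet": "..."},
        {"title": "Unrelated note", "source": "misc.txt",
         "relevance_score": 0.12, "snippet": "..."},
    ]
}

kept = filter_by_relevance(sample)
print([doc["title"] for doc in kept])
```

A suitable threshold depends on how the relevance scores are scaled, so it is worth checking against real queries before relying on a fixed value.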
## Configuration
### Environment Variables
Set your OpenAI API key for semantic search:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
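Since semantic search depends on this key, an application may want to check for it at startup rather than fail on the first query. The sketch below falls back to keyword search when the key is missing. `choose_search_type` is a hypothetical helper, and treating "hybrid" as also requiring the key is an assumption based on it including a semantic step.

```python
import os


def choose_search_type(preferred="semantic", env=None):
    """Return the preferred search type, or "keyword" if the API key is missing."""
    env = os.environ if env is None else env
    # Assumption: both semantic and hybrid search need OpenAI embeddings.
    needs_key = preferred in ("semantic", "hybrid")
    if needs_key and not env.get("OPENAI_API_KEY"):
        return "keyword"
    return preferred


print(choose_search_type("semantic"))
```

The returned value can then be passed as the `search_type` argument when creating the component or calling `search`.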
### Component Parameters
- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to document directory (default: "./data")
- `search_type`: Search method - "semantic", "keyword", or "hybrid"
- `max_results`: Maximum number of results to return
- `label`: Component label for UI
- `visible`: Whether component is visible
- `elem_classes`: CSS classes for styling
## Document Formats
The component supports various document formats:
- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents
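Before pointing `index_path` at a directory, it can help to see which files fall into the formats listed above. This is a minimal sketch using only the standard library; the extension mapping and the helper `list_indexable_files` are assumptions for illustration, and the component's own file discovery may differ.

```python
from pathlib import Path

# Assumed mapping from file extension to the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".txt": "Text",
    ".md": "Markdown",
    ".json": "JSON",
}


def list_indexable_files(index_path):
    """Return sorted (path, format) pairs for supported files under index_path."""
    root = Path(index_path)
    if not root.is_dir():
        return []
    return sorted(
        (path, SUPPORTED_EXTENSIONS[path.suffix.lower()])
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


for path, kind in list_indexable_files("./data"):
    print(f"{kind:10} {path}")
```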
## Search Types
### Semantic Search
Uses OpenAI embeddings to understand meaning and context. Best for:
- Conceptual queries
- Finding related topics
- Cross-domain searches
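To illustrate the idea behind embedding-based retrieval, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It uses tiny hand-made vectors instead of real OpenAI embeddings, and it is not the component's implementation, which delegates retrieval to LlamaIndex and FAISS.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
toy_docs = {"cats": [1.0, 0.0, 0.1], "finance": [0.0, 1.0, 0.0]}
ranking = rank_by_similarity([0.9, 0.1, 0.0], toy_docs)
print(ranking[0][0])
```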
### Keyword Search
Traditional text matching. Best for:
- Exact phrase searches
- Technical terms
- Specific names or identifiers
### Hybrid Search
Combines semantic and keyword search, so a query can match documents both by meaning and by exact terms.
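One common way to combine the two signals is a weighted blend of a semantic score and a keyword score. The sketch below shows that technique with a simple term-overlap keyword score. The functions `keyword_score` and `hybrid_score` and the `alpha` weight are hypothetical; this README does not specify how the component merges results.

```python
def keyword_score(query, text):
    """Fraction of query terms that appear as whole words in text (case-insensitive)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in terms if term in words) / len(terms)


def hybrid_score(semantic, keyword, alpha=0.5):
    """Weighted blend of two scores in [0, 1]; alpha is the weight on the semantic score."""
    return alpha * semantic + (1 - alpha) * keyword


kw = keyword_score("vector index", "FAISS is a vector store")
print(hybrid_score(semantic=0.8, keyword=kw))
```

Raising `alpha` favors conceptual matches, while lowering it favors exact terms such as identifiers or product names.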
## API Reference
### KnowledgeBrowser Class
#### Methods
- `search(query, search_type, max_results)`: Perform search and return results
- `preprocess(payload)`: Preprocess component input
- `postprocess(value)`: Postprocess component output
- `api_info()`: Get API schema information
#### Events
- `submit`: Triggered when a search is performed
- `select`: Triggered when a document is selected
- `change`: Triggered when the component state changes
## Example Applications
### Research Assistant
```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        results_display = gr.HTML()
        citations = gr.State([])

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)
            html = "<div class='research-results'>"
            for doc in results["results"]:
                html += f"""
                <div class='paper'>
                    <h3>{doc['title']}</h3>
                    <p><strong>Source:</strong> {doc['source']}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{doc['snippet']}</p>
                </div>
                """
            html += "</div>"
            return html

        search_btn.click(research_query, question, results_display)

    return app
```
### Customer Support
```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")
            html = "<div class='solutions'>"
            # Show only the top three matches
            for doc in results["results"][:3]:
                html += f"""
                <div class='solution'>
                    <h4>{doc['title']}</h4>
                    <p>{doc['snippet']}</p>
                    <a href="{doc.get('url', '#')}" target="_blank">View Full Article</a>
                </div>
                """
            html += "</div>"
            return html

        help_btn.click(find_solutions, issue, solutions)

    return app
```
## Development
### Running Tests
```bash
pip install pytest
pytest test_kb_browser.py -v
```
### Building the Component
```bash
pip install build
python -m build
```
### Publishing
```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## License
MIT License - see LICENSE file for details.
## Support
For issues and questions:
- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)