# Knowledge Base Browser

A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides a search interface over your documents with semantic, keyword, and hybrid search.

## Features

- **Semantic Search**: Meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export functionality
- **Agent Integration**: Designed for both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with FAISS vector store for efficient retrieval
- **Responsive UI**: Modern, accessible interface with expandable result cards

## Installation

```bash
pip install gradio_kb_browser
```

For development installation:

```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```

## Quick Start

### Basic Usage

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10             # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")

    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")

    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results
    )

demo.launch()
```

### Agent Integration

```python
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5
    )

    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"]
        })

    return citations
```

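
An agent usually needs the citations as text before it can include them in a response. The following is a minimal sketch, assuming each citation is a dict with the `title`, `source`, `relevance`, and `snippet` keys built by `agent_research`. `format_citations` is a hypothetical helper for illustration and is not part of the component's API; the sample data is made up.

```python
def format_citations(citations):
    """Render citation dicts as a numbered Markdown list (hypothetical helper)."""
    lines = []
    for number, item in enumerate(citations, start=1):
        lines.append(
            f"{number}. **{item['title']}** ({item['source']}), "
            f"relevance {item['relevance']:.2f}"
        )
    return "\n".join(lines)


# Made-up sample data in the shape returned by agent_research
sample = [
    {"title": "Intro to RAG", "source": "rag.pdf", "relevance": 0.91, "snippet": "..."},
    {"title": "Vector Stores", "source": "stores.md", "relevance": 0.8, "snippet": "..."},
]
print(format_citations(sample))
```
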
## Configuration

### Environment Variables

Set your OpenAI API key for semantic search:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

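
Because semantic search depends on the key, an application may want to check for it before choosing a search type. This is a sketch of one possible policy, not the component's behavior: it assumes keyword search works without an API key, and `choose_search_type` is a hypothetical helper.

```python
import os


def choose_search_type(preferred="semantic"):
    """Return `preferred`, or "keyword" when no OpenAI API key is set.

    Assumption: only "semantic" and "hybrid" need the key.
    """
    has_key = bool(os.environ.get("OPENAI_API_KEY", "").strip())
    if preferred in ("semantic", "hybrid") and not has_key:
        return "keyword"
    return preferred


print(choose_search_type())
```

The returned value can then be passed as `search_type` when creating the component or calling `search`.
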
### Component Parameters

- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to document directory (default: "./data")
- `search_type`: Search method - "semantic", "keyword", or "hybrid"
- `max_results`: Maximum number of results to return
- `label`: Component label for UI
- `visible`: Whether component is visible
- `elem_classes`: CSS classes for styling

## Document Formats

The component supports various document formats:

- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents

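
To preview which files in a directory fall into these formats before indexing, something like the following can help. It is a sketch under assumptions: the extension set (`.pdf`, `.txt`, `.md`, `.json`) is inferred from the list above and may not match what the component actually accepts, and `find_indexable_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed extensions for the formats listed above; the real list may differ
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".json"}


def find_indexable_files(index_path):
    """Return files under index_path with a supported extension, sorted by path."""
    root = Path(index_path)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Demo on a temporary directory with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("guide.pdf", "notes.md", "image.png"):
        (Path(tmp) / name).write_text("placeholder")
    found = [path.name for path in find_indexable_files(tmp)]

print(found)
```
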
## Search Types

### Semantic Search

Uses OpenAI embeddings to understand meaning and context. Best for:

- Conceptual queries
- Finding related topics
- Cross-domain searches

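
Embedding-based retrieval ranks documents by how close their vectors are to the query vector. The sketch below illustrates the idea with cosine similarity on toy 3-dimensional vectors. It is not the component's implementation: the component delegates retrieval to LlamaIndex and FAISS, the similarity metric it uses is not stated here, and the function names and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings
docs = {
    "pricing.md": [0.9, 0.1, 0.0],
    "install.md": [0.0, 0.2, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], docs)
print(ranking[0][0])
```
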
### Keyword Search

Traditional text matching. Best for:

- Exact phrase searches
- Technical terms
- Specific names or identifiers

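
As a rough illustration of text matching, the sketch below scores a document by the fraction of query terms it contains. The scoring rule and the `keyword_score` name are assumptions for illustration; the component's actual matching logic may differ.

```python
import re


def keyword_score(query, text):
    """Fraction of distinct query terms that occur in text (case-insensitive)."""
    terms = set(re.findall(r"\w+", query.lower()))
    if not terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms)


print(keyword_score("FAISS index", "Build a FAISS index from PDF files"))
```
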
### Hybrid Search

Combines both approaches for comprehensive results.

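
How the two result sets are merged is not specified here. One common technique is reciprocal rank fusion, sketched below under that assumption: each document earns `1 / (k + rank)` from every ranked list it appears in, and the sums decide the final order. The function name, the `k=60` constant, and the sample ids are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists from a semantic and a keyword search
semantic_hits = ["a.pdf", "b.md", "c.txt"]
keyword_hits = ["b.md", "d.json", "a.pdf"]
merged = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(merged)
```

Documents that rank well in both lists rise to the top, while documents found by only one method are still kept.
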
## API Reference

### KnowledgeBrowser Class

#### Methods

- `search(query, search_type, max_results)`: Perform search and return results
- `preprocess(payload)`: Preprocess component input
- `postprocess(value)`: Postprocess component output
- `api_info()`: Get API schema information

#### Events

- `submit`: Triggered when search is performed
- `select`: Triggered when document is selected
- `change`: Triggered when component state changes

## Example Applications

### Research Assistant

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")

        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")

        results_display = gr.HTML()
        citations = gr.State([])

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)

            html = "<div class='research-results'>"
            for doc in results["results"]:
                html += f"""
                <div class='paper'>
                    <h3>{doc['title']}</h3>
                    <p><strong>Source:</strong> {doc['source']}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{doc['snippet']}</p>
                </div>
                """
            html += "</div>"

            return html

        search_btn.click(research_query, question, results_display)

    return app
```

### Customer Support

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")

        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")

        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")

            # Show only the top three matches
            html = "<div class='solutions'>"
            for doc in results["results"][:3]:
                html += f"""
                <div class='solution'>
                    <h4>{doc['title']}</h4>
                    <p>{doc['snippet']}</p>
                    <a href="{doc.get('url', '#')}" target="_blank">View Full Article</a>
                </div>
                """
            html += "</div>"

            return html

        help_btn.click(find_solutions, issue, solutions)

    return app
```

## Development

### Running Tests

```bash
pip install pytest
pytest test_kb_browser.py -v
```

### Building the Component

```bash
pip install build
python -m build
```

### Publishing

```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

MIT License - see LICENSE file for details.

## Support

For issues and questions:

- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)