# Knowledge Base Browser
A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching documents with semantic, keyword, and hybrid search.
## Features
- **Semantic Search**: Meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export functionality
- **Agent Integration**: Designed for both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with FAISS vector store for efficient retrieval
- **Responsive UI**: Modern, accessible interface with expandable result cards
## Installation
```bash
pip install gradio_kb_browser
```
For development installation:
```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```
## Quick Start
### Basic Usage
```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10,            # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results,
    )

demo.launch()
```
### Agent Integration
```python
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5,
    )
    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"],
        })
    return citations
```
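Citation dicts like the ones `agent_research` builds can be turned into a readable list for an agent's final answer. The sketch below renders them as Markdown; `citations_to_markdown` is a hypothetical helper, not the component's built-in export, and it assumes `relevance` is a float in [0, 1].

```python
from typing import Any


def citations_to_markdown(citations: list[dict[str, Any]]) -> str:
    """Render citation dicts (title, source, relevance, snippet) as a numbered Markdown list.

    Hypothetical helper: the component's own export format is not documented here.
    """
    lines = []
    for i, c in enumerate(citations, start=1):
        # Relevance is assumed to be a float in [0, 1], formatted as a percentage
        lines.append(f"{i}. **{c['title']}** ({c['source']}), relevance {c['relevance']:.0%}")
        lines.append(f"   > {c['snippet']}")
    return "\n".join(lines)


example = [
    {"title": "Intro to RAG", "source": "rag.pdf", "relevance": 0.87, "snippet": "Retrieval grounds answers."},
]
print(citations_to_markdown(example))
```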
## Configuration
### Environment Variables
Set your OpenAI API key for semantic and hybrid search (hybrid search uses the semantic component too):
```bash
export OPENAI_API_KEY="your-api-key-here"
```
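If the key may be missing (for example in CI), an application can degrade to keyword search instead of failing. A minimal sketch, assuming keyword search needs no embeddings; `resolve_search_type` is a hypothetical helper, not part of the component.

```python
import os


def resolve_search_type(requested: str = "semantic") -> str:
    """Fall back to keyword search when no OpenAI API key is set.

    Assumption: "semantic" and "hybrid" need embeddings, "keyword" does not.
    """
    if requested in ("semantic", "hybrid") and not os.environ.get("OPENAI_API_KEY"):
        return "keyword"
    return requested


print(resolve_search_type("hybrid"))
```

The result can then be passed as `search_type` to `KnowledgeBrowser` or `search()`.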
### Component Parameters
- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to document directory (default: "./data")
- `search_type`: Search method - "semantic", "keyword", or "hybrid"
- `max_results`: Maximum number of results to return
- `label`: Component label for UI
- `visible`: Whether component is visible
- `elem_classes`: CSS classes for styling
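The search-related parameters above have a small set of valid values. The sketch below validates them before constructing the component; `SearchConfig` is hypothetical, and only the `index_path` default of `"./data"` comes from this README (the other defaults are assumptions).

```python
from dataclasses import dataclass, field

VALID_SEARCH_TYPES = {"semantic", "keyword", "hybrid"}


@dataclass
class SearchConfig:
    """Hypothetical mirror of the search-related KnowledgeBrowser parameters."""

    index_path: str = "./data"      # documented default
    search_type: str = "semantic"   # assumed default
    max_results: int = 10           # assumed default
    elem_classes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values the README does not list
        if self.search_type not in VALID_SEARCH_TYPES:
            raise ValueError(
                f"search_type must be one of {sorted(VALID_SEARCH_TYPES)}, got {self.search_type!r}"
            )
        if self.max_results < 1:
            raise ValueError("max_results must be at least 1")


config = SearchConfig(search_type="hybrid", max_results=5)
print(config)
```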
## Document Formats
The component supports various document formats:
- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents
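Before indexing, it can help to check which files in a directory fall into these formats. The sketch below groups files by extension; the extension-to-format mapping and `collect_documents` are assumptions for illustration, not the component's loader.

```python
from pathlib import Path

# Assumed mapping from file extension to the formats listed above
SUPPORTED_EXTENSIONS = {".pdf": "pdf", ".txt": "text", ".md": "markdown", ".json": "json"}


def collect_documents(index_path: str) -> dict[str, list[Path]]:
    """Recursively group files under index_path by supported format; other files are skipped."""
    grouped: dict[str, list[Path]] = {fmt: [] for fmt in SUPPORTED_EXTENSIONS.values()}
    for path in sorted(Path(index_path).rglob("*")):
        fmt = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
        if fmt is not None and path.is_file():
            grouped[fmt].append(path)
    return grouped


for fmt, files in collect_documents(".").items():
    print(fmt, len(files))
```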
## Search Types
### Semantic Search
Uses OpenAI embeddings to understand meaning and context. Best for:
- Conceptual queries
- Finding related topics
- Cross-domain searches
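Semantic search ranks documents by how close their embedding vectors are to the query's embedding. The sketch below shows that scoring with cosine similarity on hand-written 3-dimensional toy vectors; real OpenAI embeddings are much longer, and a vector store such as FAISS replaces this brute-force loop with an index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_embedding(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (doc_id, score) pairs whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; in practice these come from an embedding model
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
print(rank_by_embedding([1.0, 0.0, 0.0], docs, k=2))
```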
### Keyword Search
Traditional text matching. Best for:
- Exact phrase searches
- Technical terms
- Specific names or identifiers
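Keyword search rewards literal term matches, which is why it suits identifiers. The sketch below scores a document by the fraction of distinct query terms it contains; this term-overlap score is a simplification chosen for illustration, and production keyword engines typically use a weighted scheme such as BM25.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def keyword_score(query: str, document: str) -> float:
    """Fraction of distinct query terms that appear in the document."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    d_terms = set(tokenize(document))
    return len(q_terms & d_terms) / len(q_terms)


print(keyword_score("ERR_TIMEOUT gateway", "Gateway returns ERR_TIMEOUT after 30s"))
```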
### Hybrid Search
Combines semantic and keyword matching. Useful when a query mixes conceptual wording with exact terms such as error codes or product names.
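How this component weights the two approaches is not documented here. One common way to combine them is a weighted sum of normalized scores, sketched below; `hybrid_rank` and the `alpha` weight are hypothetical, and rank-based alternatives such as reciprocal rank fusion are also widely used.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so semantic and keyword scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    semantic: dict[str, float], keyword: dict[str, float], alpha: float = 0.5
) -> list[tuple[str, float]]:
    """Score = alpha * semantic + (1 - alpha) * keyword; a missing score counts as 0."""
    sem, kw = min_max(semantic), min_max(keyword)
    ids = set(sem) | set(kw)
    combined = {i: alpha * sem.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


semantic_scores = {"doc-a": 0.91, "doc-b": 0.85, "doc-c": 0.40}
keyword_scores = {"doc-b": 1.0, "doc-c": 0.5}
print(hybrid_rank(semantic_scores, keyword_scores, alpha=0.5))
```

Raising `alpha` toward 1.0 favors meaning-based matches; lowering it favors exact terms.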
## API Reference
### KnowledgeBrowser Class
#### Methods
- `search(query, search_type, max_results)`: Perform search and return results
- `preprocess(payload)`: Preprocess component input
- `postprocess(value)`: Postprocess component output
- `api_info()`: Get API schema information
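The examples in this README read `title`, `source`, `relevance_score`, `snippet`, and an optional `url` from each item in `results["results"]`. The types below are inferred from those examples, not from a published schema. `StubKnowledgeBrowser` is a hypothetical offline stand-in with the same `search(query, search_type, max_results)` call shape, so callbacks like `search_documents` can be exercised without an OpenAI key.

```python
from typing import TypedDict


class _ResultBase(TypedDict):
    title: str
    source: str
    relevance_score: float  # assumed to lie in [0, 1]
    snippet: str


class SearchResult(_ResultBase, total=False):
    url: str  # optional; the support example reads it with doc.get("url", "#")


class SearchResponse(TypedDict):
    results: list[SearchResult]


class StubKnowledgeBrowser:
    """Hypothetical stand-in for tests and demos; not part of kb_browser."""

    def __init__(self, documents: list[SearchResult]):
        self.documents = documents

    def search(self, query: str, search_type: str = "semantic", max_results: int = 10) -> SearchResponse:
        # Naive substring match on title and snippet; search_type is accepted but ignored
        q = query.lower()
        hits = [d for d in self.documents if q in d["title"].lower() or q in d["snippet"].lower()]
        hits.sort(key=lambda d: d["relevance_score"], reverse=True)
        return {"results": hits[:max_results]}


stub = StubKnowledgeBrowser([
    {"title": "Reset password", "source": "faq.md", "relevance_score": 0.7, "snippet": "Use the reset link."},
    {"title": "Password policy", "source": "policy.pdf", "relevance_score": 0.9,
     "snippet": "Minimum 12 characters.", "url": "https://example.com/policy"},
    {"title": "Shipping", "source": "ship.md", "relevance_score": 0.5, "snippet": "Ships in 2 days."},
])
print([doc["title"] for doc in stub.search("password")["results"]])
```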
#### Events
- `submit`: Triggered when search is performed
- `select`: Triggered when document is selected
- `change`: Triggered when component state changes
## Example Applications
### Research Assistant
```python
from html import escape

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        results_display = gr.HTML()
        citations = gr.State([])

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)
            html = "<div class='research-results'>"
            for doc in results["results"]:
                # Escape document text so it cannot inject markup into the page
                html += f"""
                <div class='paper'>
                    <h3>{escape(doc['title'])}</h3>
                    <p><strong>Source:</strong> {escape(doc['source'])}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{escape(doc['snippet'])}</p>
                </div>
                """
            html += "</div>"
            return html

        search_btn.click(research_query, question, results_display)

    return app
```
### Customer Support
```python
from html import escape

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")
            html = "<div class='solutions'>"
            # Show only the top three matches, escaping text and URLs
            for doc in results["results"][:3]:
                html += f"""
                <div class='solution'>
                    <h4>{escape(doc['title'])}</h4>
                    <p>{escape(doc['snippet'])}</p>
                    <a href="{escape(doc.get('url', '#'))}" target="_blank" rel="noopener noreferrer">View Full Article</a>
                </div>
                """
            html += "</div>"
            return html

        help_btn.click(find_solutions, issue, solutions)

    return app
```
## Development
### Running Tests
```bash
pip install pytest
pytest test_kb_browser.py -v
```
### Building the Component
```bash
pip install build
python -m build
```
### Publishing
```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## License
MIT License - see LICENSE file for details.
## Support
For issues and questions:
- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)