# Knowledge Base Browser

A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides a search interface over a document collection, with semantic, keyword, and hybrid search modes.

## Features

- **Semantic Search**: AI-powered meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export functionality
- **Agent Integration**: Designed for both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with FAISS vector store for efficient retrieval
- **Responsive UI**: Modern, accessible interface with expandable result cards

## Installation

```bash
pip install gradio_kb_browser
```

For development installation:

```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```

## Quick Start

### Basic Usage

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10            # Maximum results to return
)

# Use in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    
    results = gr.JSON(label="Results")
    
    def search_documents(query_text):
        return kb_browser.search(query_text)
    
    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results
    )

demo.launch()
```

### Agent Integration

```python
from kb_browser import KnowledgeBrowser

# Initialize component for agent use
kb_browser = KnowledgeBrowser()

# Agent can search and get structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5
    )
    
    # Process results for agent response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"]
        })
    
    return citations
```
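
The citation dictionaries built by `agent_research` above can be turned into a plain-text context block for an agent prompt. The following is a minimal sketch, not part of the component: `format_citations` and its `min_relevance` parameter are hypothetical names, and it assumes `relevance` is a number between 0 and 1.

```python
def format_citations(citations, min_relevance=0.0):
    """Render citation dicts as a numbered plain-text block for a prompt.

    Hypothetical helper: expects the keys produced by agent_research
    ("title", "source", "relevance", "snippet").
    """
    # Drop citations below the (assumed 0-1) relevance threshold
    kept = [c for c in citations if c["relevance"] >= min_relevance]
    lines = []
    for number, cite in enumerate(kept, start=1):
        lines.append(f"[{number}] {cite['title']} ({cite['source']})")
        lines.append(f"    {cite['snippet']}")
    return "\n".join(lines)
```

An agent could call `format_citations(agent_research(question), min_relevance=0.5)` and prepend the result to its prompt.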

## Configuration

### Environment Variables

Set your OpenAI API key for semantic search:

```bash
export OPENAI_API_KEY="your-api-key-here"
```
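
If the key may be missing at runtime, an application can check for it before requesting semantic search. This is a minimal sketch under assumptions: `choose_search_type` is a hypothetical helper, and it assumes that hybrid search needs the key as well because it includes a semantic step.

```python
import os

def choose_search_type(preferred="semantic", env=None):
    """Return `preferred`, or "keyword" when no OpenAI API key is set.

    Hypothetical helper; `env` defaults to os.environ and exists so the
    function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    # Assumption: both semantic and hybrid search call the embeddings API
    needs_key = preferred in ("semantic", "hybrid")
    if needs_key and not env.get("OPENAI_API_KEY"):
        return "keyword"
    return preferred
```

The returned value can then be passed as `search_type` to `kb_browser.search(...)`.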

### Component Parameters

- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to document directory (default: "./data")
- `search_type`: Search method - "semantic", "keyword", or "hybrid"
- `max_results`: Maximum number of results to return
- `label`: Component label for UI
- `visible`: Whether component is visible
- `elem_classes`: CSS classes for styling

## Document Formats

The component supports various document formats:

- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents
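
Before pointing `index_path` at a directory, it can help to see how many files of each format it contains. The sketch below is an illustration only: the extension mapping is an assumption based on the list above, and `count_indexable_files` is a hypothetical helper that is not part of the component.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from file extension to the formats listed above;
# the extensions the component actually accepts may differ.
SUPPORTED_SUFFIXES = {".pdf": "PDF", ".txt": "Text", ".md": "Markdown", ".json": "JSON"}

def count_indexable_files(index_path):
    """Count files under index_path (recursively) per supported format."""
    counts = Counter()
    for path in Path(index_path).rglob("*"):
        kind = SUPPORTED_SUFFIXES.get(path.suffix.lower())
        if path.is_file() and kind:
            counts[kind] += 1
    return dict(counts)
```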

## Search Types

### Semantic Search
Uses OpenAI embeddings to understand meaning and context. Best for:
- Conceptual queries
- Finding related topics
- Cross-domain searches
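
To illustrate the idea behind embedding-based ranking, the sketch below scores documents by cosine similarity to a query vector. It uses hand-written toy vectors in place of real OpenAI embeddings and does not reflect how the component or its FAISS store computes scores internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
docs = {
    "solar-panels": [0.9, 0.1, 0.0],
    "wind-turbines": [0.7, 0.3, 0.1],
    "bread-recipes": [0.0, 0.1, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], docs)
print(ranking)  # ['solar-panels', 'wind-turbines', 'bread-recipes']
```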

### Keyword Search
Traditional text matching. Best for:
- Exact phrase searches
- Technical terms
- Specific names or identifiers
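
As a rough illustration of text matching, the sketch below counts case-insensitive whole-word occurrences of each query term. It is a simplified stand-in with hypothetical function names: it matches individual terms rather than exact phrases, and the component's own keyword search may work differently.

```python
import re

def keyword_score(query, text):
    """Count case-insensitive whole-word occurrences of each query term in text."""
    words = re.findall(r"\w+", text.lower())
    terms = set(re.findall(r"\w+", query.lower()))
    return sum(words.count(term) for term in terms)

def keyword_search(query, docs):
    """Return (doc_id, score) pairs with a non-zero score, best first."""
    scored = [(doc_id, keyword_score(query, text)) for doc_id, text in docs.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```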

### Hybrid Search
Combines semantic and keyword search, so a query can match on meaning as well as on exact terms.
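
This README does not say how the two result lists are merged. One common technique for merging ranked lists is reciprocal rank fusion (RRF), sketched below as an illustration only; it is not confirmed to be what the component uses, and `k=60` is simply the conventional default for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several methods rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]  # hypothetical semantic ranking
keyword = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword ranking
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF uses only rank positions, which avoids having to put embedding similarities and keyword counts on a common scale.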

## API Reference

### KnowledgeBrowser Class

#### Methods

- `search(query, search_type, max_results)`: Perform search and return results
- `preprocess(payload)`: Preprocess component input
- `postprocess(value)`: Postprocess component output
- `api_info()`: Get API schema information

#### Events

- `submit`: Triggered when search is performed
- `select`: Triggered when document is selected
- `change`: Triggered when component state changes

## Example Applications

### Research Assistant

```python
from html import escape

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")
    
    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        
        results_display = gr.HTML()
        citations = gr.State([])
        
        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)
            
            # Escape document text so stray markup cannot break the rendered HTML
            html = "<div class='research-results'>"
            for doc in results["results"]:
                html += f"""
                <div class='paper'>
                    <h3>{escape(doc['title'])}</h3>
                    <p><strong>Source:</strong> {escape(doc['source'])}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{escape(doc['snippet'])}</p>
                </div>
                """
            html += "</div>"
            
            return html
        
        search_btn.click(research_query, question, results_display)
    
    return app
```

### Customer Support

```python
from html import escape

import gradio as gr
from kb_browser import KnowledgeBrowser

def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")
    
    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        
        solutions = gr.HTML()
        
        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")
            
            # Show the top three matches; escape text and URLs before embedding them
            html = "<div class='solutions'>"
            for doc in results["results"][:3]:
                html += f"""
                <div class='solution'>
                    <h4>{escape(doc['title'])}</h4>
                    <p>{escape(doc['snippet'])}</p>
                    <a href="{escape(doc.get('url', '#'))}" target="_blank">View Full Article</a>
                </div>
                """
            html += "</div>"
            
            return html
        
        help_btn.click(find_solutions, issue, solutions)
    
    return app
```

## Development

### Running Tests

```bash
pip install pytest
pytest test_kb_browser.py -v
```

### Building the Component

```bash
pip install build
python -m build
```

### Publishing

```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

MIT License - see LICENSE file for details.

## Support

For issues and questions:
- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)