# Master Plan – "Knowledge-Base Browser" Gradio Component
*Track 2 – Custom Components*
## Project Timeline
| Day | Milestone | Output |
|-----|-----------|--------|
| Mon (½ day left) | Finalize spec & repo | README with scope, architecture diagram |
| Tue | Component scaffolding | `gradio cc create kb_browser`, `index.html`, `script.tsx`, `__init__.py` |
| Wed | Backend – retrieval service | LlamaIndex/FAISS index builder, query API |
| Thu | Frontend – results panel UI | React table / accordion, source-link cards |
| Fri | Agent integration demo | Notebook + minimal MCP agent calling component |
| Sat | Polishing, tests, docs | Unit tests, docs site, publish to Gradio Hub |
| Sun (AM) | Submission video & write-up | 90-sec demo, project report |
## Core Features (MVP)
1. Accepts a query string or agent-emitted JSON
2. Calls retrieval API → returns `[{"title":..,"snippet":..,"url":..}]`
3. Renders expandable result cards + "open source" button
4. Emits selected doc back to parent (so agent can cite)
5. Works in both human click and agent-autonomous modes
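
Features 1, 2, and 4 imply a small data contract between the component and an agent. The sketch below is one way to express it, not the project's actual code. It assumes agent-emitted JSON carries the text under a `"query"` key; that key and the helper names `parse_query` and `validate_results` are hypothetical.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("title", "snippet", "url")  # result fields named in feature 2


def parse_query(payload: str) -> str:
    """Return the query text from a plain string or an agent-emitted JSON object.

    Assumption: agent JSON looks like {"query": "..."}; the key name is hypothetical.
    """
    text = payload.strip()
    if text.startswith("{"):
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return text  # not valid JSON after all, so treat it as a plain query
        if isinstance(data, dict) and isinstance(data.get("query"), str):
            return data["query"].strip()
    return text


def validate_results(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the results that carry every required field."""
    return [r for r in results if all(key in r for key in REQUIRED_KEYS)]
```

Normalizing input at the boundary lets the human path (a typed string) and the agent path (a JSON payload) share the same retrieval call.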
---
## Prompt-Script Series for LLM Assistant
Copy-paste each block into your favorite model (GPT-4o, Claude 3, etc.). Each step builds on the previous; stop when the code runs.
**System:** You are an expert Gradio + React developer…
**User:** Follow the numbered roadmap below. Output only the requested files in markdown code-blocks each time.
### Step 1 – Scaffold
**1️⃣ Generate `__init__.py`, `index.html`, `script.tsx`, and `package.json`**
- Component name: kb_browser
- Props: `query: string`, `results: any[]`
- Events: `submit(query)`, `select(doc)`
### Step 2 – Backend retrieval
**2️⃣ Write `retriever.py`**
- Build FAISS vector store from ./data/*.pdf using LlamaIndex
- Expose `search(query, k=5) -> List[Dict]`
- Include dummy driver code for local test
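
For orientation, here is a rough sketch of the `search(query, k=5) -> List[Dict]` contract this step asks for. It deliberately swaps the LlamaIndex/FAISS index for a pure-Python bag-of-words cosine similarity so that it runs without extra dependencies. The class name `ToyRetriever`, the `"text"` and `"score"` fields, and the example documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokens(text: str) -> Counter:
    """Lower-case bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetriever:
    """In-memory stand-in for the FAISS-backed retriever (hypothetical class)."""

    def __init__(self, docs: List[Dict[str, str]]):
        # Each doc is {"title": ..., "text": ..., "url": ...}; "text" is an assumed field.
        self._docs = docs
        self._vectors = [_tokens(d["title"] + " " + d["text"]) for d in docs]

    def search(self, query: str, k: int = 5) -> List[Dict]:
        """Return up to k matching docs, best score first."""
        q = _tokens(query)
        scored = [(_cosine(q, v), d) for v, d in zip(self._vectors, self._docs)]
        scored = [(s, d) for s, d in scored if s > 0]  # drop docs with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [
            {"title": d["title"], "snippet": d["text"][:200], "url": d["url"], "score": round(s, 3)}
            for s, d in scored[:k]
        ]


if __name__ == "__main__":
    # Dummy driver for a quick local check; the documents are made up.
    demo_docs = [
        {"title": "FAISS basics", "text": "FAISS is a library for vector similarity search.", "url": "https://example.com/faiss"},
        {"title": "Gradio components", "text": "Custom components pair a Python class with a frontend.", "url": "https://example.com/gradio"},
    ]
    for hit in ToyRetriever(demo_docs).search("vector search", k=2):
        print(hit["score"], hit["title"])
```

Keeping the return shape identical to the planned one means the stand-in can later be replaced by the real vector store without touching the frontend.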
### Step 3 – Wire front-end ↔ back-end
**3️⃣ Update `script.tsx`**
- On `submit`, POST to `/search`
- Render results in Material-UI Accordion
- On click "Use", fire `select(doc)` event
### Step 4 – Gradio component class
**4️⃣ In `__init__.py`**
- subclass gradio.Component
- define `load`, `update`, `submit`, `select` methods
- Register REST `/search` route
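
The `/search` route can be kept thin by putting the request handling in a plain function that any web framework can call. The following is a minimal, framework-agnostic sketch. The request shape `{"query": ..., "k": ...}`, the response shape `{"results": [...]}`, and the name `handle_search` are assumptions, and the route registration itself is left out because it depends on how the component is mounted.

```python
import json
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str, int], List[Dict]]


def handle_search(body: bytes, search: SearchFn) -> Tuple[int, str]:
    """Turn a POST /search request body into (HTTP status, JSON response text).

    Assumed request shape: {"query": "...", "k": 5}; "k" is optional.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, json.dumps({"error": "body must be valid JSON"})

    query = payload.get("query") if isinstance(payload, dict) else None
    if not isinstance(query, str) or not query.strip():
        return 400, json.dumps({"error": "'query' must be a non-empty string"})

    k = payload.get("k", 5)
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(k, int) or isinstance(k, bool) or k < 1:
        return 400, json.dumps({"error": "'k' must be a positive integer"})

    return 200, json.dumps({"results": search(query.strip(), k)})
```

Passing the search function in as an argument keeps the handler easy to test with a fake retriever.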
### Step 5 – Demo app
**5️⃣ Create `demo.py`**
- Loads component
- Adds text input + "Ask" button
- Shows agent example that calls component via MCP
### Step 6 – Tests & publishing
**6️⃣ Provide pytest suite for backend & frontend**
- CI workflow yaml
**7️⃣ Command to publish:** `gradio cc publish kb_browser --name "KnowledgeBaseBrowser"`
*(After each step: run `npm run dev` + `python demo.py`, fix issues, then proceed.)*
---
## Pro-Tips for Implementation
- Keep package size < 2 MB (judging criteria).
- Defer heavy work to backend; UI stays lightweight.
- Use streaming in Gradio (yield) for snappy UX.
- Cache index on disk to slash startup time.
- Include a themed dark/light toggle – easy polish points.
- Record a GIF of the agent citing docs live → eye-catching in demo.
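
Two of the tips above, caching the index on disk and streaming with `yield`, can be sketched with the standard library alone. This is an illustration under assumptions: the index is stood in for by a JSON-serializable dict (a real FAISS index would use its own persistence format), and the names `load_or_build` and `stream_results` are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterator, List


def load_or_build(cache_path: Path, build: Callable[[], Dict]) -> Dict:
    """Return the cached index if it exists on disk, otherwise build and save it."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    index = build()  # the slow step, run only on a cache miss
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(index), encoding="utf-8")
    return index


def stream_results(results: List[Dict]) -> Iterator[List[Dict]]:
    """Yield a growing list so the UI can render each result as soon as it is ready."""
    shown: List[Dict] = []
    for item in results:
        shown.append(item)
        yield list(shown)  # copy, so earlier yields are not mutated later
```

Gradio treats a generator function used as an event handler as a streaming handler and updates the output with each yielded value, so a function shaped like `stream_results` can be wired to a button click.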
## Implementation Status
### ✅ Completed Features
- **Component Scaffolding**: Complete Gradio custom component structure with proper TypeScript and Python files
- **Backend Retrieval Service**: LlamaIndex + FAISS vector store with OpenAI embeddings for semantic search
- **Frontend UI**: React TypeScript interface with modern design, expandable result cards, and source links
- **Search Capabilities**: Semantic, keyword, and hybrid search modes with relevance scoring
- **Citation Management**: Real-time citation tracking with export functionality
- **Agent Integration**: Both human interactive mode and AI agent autonomous research capabilities
- **Documentation**: Comprehensive README, API documentation, and usage examples
- **Testing**: Test suite covering core functionality and edge cases
- **Publishing Setup**: Package configuration and publishing scripts ready
### 🎯 Key Technical Achievements
1. **Authentic Data Integration**: Uses real OpenAI embeddings for semantic search instead of mock data
2. **Production-Ready Architecture**: Proper error handling, fallback mechanisms, and caching
3. **Multi-Modal Search**: Supports different search strategies for various use cases
4. **Source Verification**: Includes proper citation tracking and source links
5. **Agent-Ready Design**: Built for both human users and autonomous AI agents
### πŸ“ Project Structure
```
kb_browser/
├── __init__.py          # Main Gradio component class
├── retriever.py         # LlamaIndex + FAISS backend
├── script.tsx           # React TypeScript frontend
├── index.html           # Component HTML template
├── package.json         # Frontend dependencies
├── pyproject.toml       # Python package configuration
└── README.md            # Component documentation
Root/
├── demo.py              # Human + Agent demo application
├── gradio_demo.py       # Complete Gradio demo
├── test_kb_browser.py   # Comprehensive test suite
├── verify_component.py  # Component verification script
└── docs/
    └── master-plan.md   # This master plan document
```
### 🚀 Usage Examples
**Basic Component Usage:**
```python
from kb_browser import KnowledgeBrowser

kb_browser = KnowledgeBrowser(
    index_path="./documents",
    search_type="semantic",
    max_results=10,
)
results = kb_browser.search("retrieval augmented generation")
```
**Agent Integration:**
```python
def agent_research(question):
    results = kb_browser.search(question, search_type="semantic")
    citations = [{"title": doc["title"], "source": doc["source"]}
                 for doc in results["results"]]
    return citations
```
**Human Interface:**
```python
import gradio as gr

with gr.Blocks() as demo:
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")
    search_btn.click(kb_browser.search, query, results)
```
Run the six prompt steps in order and you'll have a polished, judge-ready custom component in time for Sunday's submission. Good luck!