Master Plan – "Knowledge-Base Browser" Gradio Component

Track 2 – Custom Components

Project Timeline

| Day | Milestone | Output |
|---|---|---|
| Mon (½ day left) | Finalize spec & repo | README with scope, architecture diagram |
| Tue | Component scaffolding | gradio cc create kb_browser, index.html, script.tsx, __init__.py |
| Wed | Backend – retrieval service | LlamaIndex/FAISS index builder, query API |
| Thu | Frontend – results panel UI | React table / accordion, source-link cards |
| Fri | Agent integration demo | Notebook + minimal MCP agent calling component |
| Sat | Polishing, tests, docs | Unit tests, docs site, publish to Gradio Hub |
| Sun (AM) | Submission video & write-up | 90-sec demo, project report |

Core Features (MVP)

  1. Accepts a query string or agent-emitted JSON
  2. Calls retrieval API → returns [{"title": ..., "snippet": ..., "url": ...}]
  3. Renders expandable result cards + "open source" button
  4. Emits selected doc back to parent (so agent can cite)
  5. Works in both human click and agent-autonomous modes
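The MVP features above describe one round trip: a query (plain text or agent JSON) goes in, result cards come back, and the chosen card is emitted to the parent. A minimal plain-Python sketch of that flow, assuming a hypothetical `KBBrowserModel` class and a `{"query": ...}` JSON shape for agent input (neither is the real component API); it shows why one `select` path can serve both human clicks and autonomous agents.

```python
import json
from typing import Callable, List, TypedDict


class ResultDoc(TypedDict):
    """One hit in the payload shape from feature 2."""
    title: str
    snippet: str
    url: str


class KBBrowserModel:
    """Hypothetical model of the submit/select flow (not the real component).

    Human clicks and agent decisions both go through `select`, so the parent
    app receives the chosen document the same way in either mode.
    """

    def __init__(self, retrieve: Callable[[str], List[ResultDoc]]) -> None:
        self.retrieve = retrieve  # stands in for the retrieval API call
        self.results: List[ResultDoc] = []
        self.listeners: List[Callable[[ResultDoc], None]] = []

    def on_select(self, callback: Callable[[ResultDoc], None]) -> None:
        # The parent app (or agent loop) registers to receive selected docs.
        self.listeners.append(callback)

    def submit(self, payload: str) -> List[ResultDoc]:
        # Accept a plain query string or agent-emitted JSON such as
        # '{"query": "..."}' (the "query" key is an assumption).
        query = payload
        try:
            parsed = json.loads(payload)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict) and "query" in parsed:
            query = str(parsed["query"])
        self.results = self.retrieve(query)
        return self.results

    def select(self, index: int) -> ResultDoc:
        # Emit the chosen doc to every listener so the agent can cite it.
        doc = self.results[index]
        for callback in self.listeners:
            callback(doc)
        return doc


# Usage with a fake retriever:
def fake_retrieve(query: str) -> List[ResultDoc]:
    return [{"title": f"Doc about {query}", "snippet": "...", "url": "https://example.com/1"}]


cited: List[ResultDoc] = []
model = KBBrowserModel(fake_retrieve)
model.on_select(cited.append)
model.submit('{"query": "RAG"}')  # agent-style JSON input
model.select(0)                   # same call a "Use" click would trigger
print(cited[0]["title"])          # Doc about RAG
```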

Prompt-Script Series for LLM Assistant

Copy and paste each block into your preferred model (GPT-4o, Claude 3, etc.). Each step builds on the previous one; stop once the code runs.

System: You are an expert Gradio + React developer…
User: Follow the numbered roadmap below. Output only the requested files in markdown code-blocks each time.

Step 1 – Scaffold

1️⃣ Generate __init__.py, index.html, script.tsx, and package.json

  • Component name: kb_browser
  • Props: query: string, results: any[]
  • Events: submit(query), select(doc)

Step 2 – Backend retrieval

2️⃣ Write retriever.py

  • Build FAISS vector store from ./data/*.pdf using LlamaIndex
  • Expose search(query, k=5) -> List[Dict]
  • Include dummy driver code for local testing
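Before the FAISS index exists, it helps to pin down the `search(query, k=5) -> List[Dict]` contract with a dummy backend. The sketch below is a stand-in, not the LlamaIndex/FAISS implementation: it scores a hypothetical in-memory corpus by the fraction of query tokens each doc contains, where the real retriever would rank by embedding similarity. It includes the dummy driver the step asks for.

```python
import re
from typing import Dict, List, Set

# Hypothetical in-memory corpus standing in for ./data/*.pdf.
DOCS: List[Dict[str, str]] = [
    {"title": "RAG overview", "snippet": "Retrieval augmented generation grounds answers in documents.", "url": "https://example.com/rag"},
    {"title": "FAISS basics", "snippet": "FAISS performs fast vector similarity search.", "url": "https://example.com/faiss"},
    {"title": "Gradio components", "snippet": "Custom components extend Gradio apps.", "url": "https://example.com/gradio"},
]


def _tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int = 5) -> List[Dict]:
    """Dummy stand-in for the FAISS-backed search(query, k=5) -> List[Dict].

    Score = fraction of query tokens found in the doc's title + snippet.
    """
    q = _tokens(query)
    if not q:
        return []
    scored = []
    for doc in DOCS:
        overlap = len(q & _tokens(doc["title"] + " " + doc["snippet"]))
        if overlap:
            scored.append({**doc, "score": overlap / len(q)})
    scored.sort(key=lambda d: d["score"], reverse=True)  # best match first
    return scored[:k]


if __name__ == "__main__":
    # Dummy driver for a local test run.
    for hit in search("retrieval augmented generation"):
        print(f'{hit["score"]:.2f}  {hit["title"]}')
```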

Step 3 – Wire front-end ↔ back-end

3️⃣ Update script.tsx

  • On submit, POST to /search
  • Render results in Material-UI Accordion
  • On click "Use", fire select(doc) event

Step 4 – Gradio component class

4️⃣ In __init__.py

  • subclass gradio.Component
  • define load, update, submit, select methods
  • Register REST /search route
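Steps 3 and 4 both hinge on the `/search` route, but where it gets mounted depends on the server hosting the component. A framework-agnostic handler keeps the request validation testable on its own. This sketch assumes a JSON body of `{"query": ..., "k": ...}` and a `{"results": [...]}` response; `handle_search`, the field names, and the `max_k` cap are hypothetical choices, not part of any Gradio API.

```python
import json
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str, int], List[Dict]]


def _error(status: int, message: str) -> Tuple[int, bytes]:
    return status, json.dumps({"error": message}).encode("utf-8")


def handle_search(body: bytes, search: SearchFn, max_k: int = 20) -> Tuple[int, bytes]:
    """Hypothetical POST /search handler, independent of any web framework.

    Expects a JSON body such as {"query": "...", "k": 5} and returns
    (HTTP status, JSON response bytes). Field names are assumptions.
    """
    try:
        payload = json.loads(body or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        return _error(400, "request body must be JSON")
    if not isinstance(payload, dict):
        return _error(400, "request body must be a JSON object")

    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        return _error(400, "'query' must be a non-empty string")

    k = payload.get("k", 5)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= max_k:
        return _error(400, f"'k' must be an integer between 1 and {max_k}")

    results = search(query.strip(), k)
    return 200, json.dumps({"results": results}).encode("utf-8")


# Usage with a stub search function:
def stub_search(query: str, k: int) -> List[Dict]:
    return [{"title": query, "snippet": "", "url": "https://example.com"}][:k]


status, body = handle_search(b'{"query": "faiss", "k": 3}', stub_search)
print(status, json.loads(body)["results"][0]["title"])  # 200 faiss
```

Returning a `(status, bytes)` pair means the same function can be wrapped by whatever route the server exposes, and the script.tsx side only has to agree on the JSON shape.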

Step 5 – Demo app

5️⃣ Create demo.py

  • Loads component
  • Adds text input + "Ask" button
  • Shows agent example that calls component via MCP
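For the MCP part of the demo: MCP messages are JSON-RPC 2.0, and a tool is invoked with a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The result holds a `content` list of items such as `{"type": "text", "text": ...}`. The sketch below only builds and parses those messages and leaves out the transport and client SDK. The tool name `kb_search` and the choice to return hits as a JSON string inside a text item are assumptions, not something the plan specifies.

```python
import itertools
import json
from typing import Any, Dict, List

_ids = itertools.count(1)  # JSON-RPC request ids


def make_tool_call(query: str, k: int = 5) -> Dict[str, Any]:
    """Build a tools/call request for a hypothetical `kb_search` tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": "kb_search", "arguments": {"query": query, "k": k}},
    }


def parse_tool_result(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract docs from a tools/call response, assuming the tool returns
    its hits as JSON inside text content items."""
    if "error" in response:  # JSON-RPC level error
        raise RuntimeError(response["error"].get("message", "MCP error"))
    result = response["result"]
    if result.get("isError"):  # tool-level error
        raise RuntimeError("kb_search reported an error")
    docs: List[Dict[str, Any]] = []
    for item in result.get("content", []):
        if item.get("type") == "text":
            docs.extend(json.loads(item["text"]))
    return docs


# Usage with a canned server response instead of a live MCP connection:
request = make_tool_call("retrieval augmented generation")
canned = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": json.dumps([{"title": "RAG overview", "url": "https://example.com/rag"}])}]},
}
docs = parse_tool_result(canned)
citations = [{"title": d["title"], "source": d["url"]} for d in docs]
```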

Step 6 – Tests & publishing

6️⃣ Provide pytest suite for backend & frontend

  • CI workflow yaml

7️⃣ Command to publish: gradio cc publish kb_browser --name "KnowledgeBaseBrowser"

(After each step: run npm run dev + python demo.py, fix issues, then proceed.)
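For the backend half of the step 6 pytest suite, one option is to test the `search()` contract rather than specific documents, so the tests survive changes to ./data. A sketch, assuming results carry `title`, `snippet`, `url` and an optional `score`. `contract_violations` and `stub_search` are hypothetical names; a real suite would import `search` from retriever.py. Frontend tests would need a JS runner and are out of scope here.

```python
from typing import Callable, Dict, List

REQUIRED_KEYS = {"title", "snippet", "url"}


def contract_violations(search: Callable[..., List[Dict]], query: str, k: int) -> List[str]:
    """Return every way a search(query, k=...) backend breaks the result contract."""
    hits = search(query, k=k)
    if not isinstance(hits, list):
        return ["search() must return a list"]
    problems: List[str] = []
    if len(hits) > k:
        problems.append(f"returned {len(hits)} results for k={k}")
    for i, hit in enumerate(hits):
        missing = REQUIRED_KEYS - set(hit)
        if missing:
            problems.append(f"result {i} is missing {sorted(missing)}")
    scores = [hit.get("score", 0.0) for hit in hits]
    if scores != sorted(scores, reverse=True):
        problems.append("results are not sorted by descending score")
    return problems


# Hypothetical stub; a real suite would import search from retriever.py instead.
def stub_search(query: str, k: int = 5) -> List[Dict]:
    hits = [
        {"title": f"{query} {i}", "snippet": "", "url": f"https://example.com/{i}", "score": 1.0 / (i + 1)}
        for i in range(10)
    ]
    return hits[:k]


# pytest collects these test_* functions when the file is named test_*.py.
def test_stub_meets_contract():
    assert contract_violations(stub_search, "faiss", k=3) == []


def test_detects_too_many_results():
    def too_many(query: str, k: int) -> List[Dict]:
        return stub_search(query, k=k + 1)

    assert contract_violations(too_many, "faiss", k=2) == ["returned 3 results for k=2"]
```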


Pro-Tips for Implementation

  • Keep package size < 2 MB (judging criteria).
  • Defer heavy work to backend; UI stays lightweight.
  • Use streaming in Gradio (yield) for snappy UX.
  • Cache index on disk to slash startup time.
  • Include a themed dark/light toggle – easy polish points.
  • Record a GIF of the agent citing docs live → eye-catching in demo.
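Two of these tips, streaming with `yield` and caching the index on disk, fit in a few lines of plain Python. When a Gradio event handler is a generator function, each yielded value updates the output, so a handler like the one below could be passed to `.click(...)` in place of a plain function. The `fast_search`/`slow_search` split, `load_or_build`, and the JSON cache format are illustrative assumptions. A real LlamaIndex index would more likely use LlamaIndex's own persistence helpers (`storage_context.persist()` and `load_index_from_storage`).

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, str]


def make_streaming_handler(
    fast_search: Callable[[str], List[Doc]],
    slow_search: Callable[[str], List[Doc]],
) -> Callable[[str], Iterator[List[Doc]]]:
    """Build a generator handler: Gradio pushes each yielded value to the
    output component, so cheap results appear before the slower ranking."""

    def handler(query: str) -> Iterator[List[Doc]]:
        yield fast_search(query)  # first paint, e.g. keyword matches
        yield slow_search(query)  # then replace with the better ranking

    return handler


def load_or_build(cache_path: str, build: Callable[[], Dict]) -> Dict:
    """Load a JSON-serializable index from disk; build and save it only once."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    index = build()
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    os.replace(tmp_path, cache_path)  # swap in the finished file in one step
    return index


# Usage:
build_calls: List[int] = []


def build_index() -> Dict:
    build_calls.append(1)  # count how often the expensive build runs
    return {"docs": ["a", "b"]}


cache_file = os.path.join(tempfile.mkdtemp(), "index.json")
first = load_or_build(cache_file, build_index)
second = load_or_build(cache_file, build_index)  # served from disk

handler = make_streaming_handler(
    lambda q: [{"title": q}],
    lambda q: [{"title": q}, {"title": q + " (ranked)"}],
)
frames = list(handler("rag"))
```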

Implementation Status

✅ Completed Features

  • Component Scaffolding: Complete Gradio custom component structure with proper TypeScript and Python files
  • Backend Retrieval Service: LlamaIndex + FAISS vector store with OpenAI embeddings for semantic search
  • Frontend UI: React TypeScript interface with modern design, expandable result cards, and source links
  • Search Capabilities: Semantic, keyword, and hybrid search modes with relevance scoring
  • Citation Management: Real-time citation tracking with export functionality
  • Agent Integration: Both human interactive mode and AI agent autonomous research capabilities
  • Documentation: Comprehensive README, API documentation, and usage examples
  • Testing: Test suite covering core functionality and edge cases
  • Publishing Setup: Package configuration and publishing scripts ready
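The list mentions a hybrid search mode but not how it merges the two signals. One common approach, sketched here as an assumption rather than the project's actual scoring: min-max normalize the semantic and keyword scores separately, then blend them as `alpha * semantic + (1 - alpha) * keyword`. `min_max`, `hybrid_rank`, and the default `alpha = 0.5` are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so semantic and keyword scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all equal: treat every doc as a full match
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float], keyword: Dict[str, float], alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """Blend normalized scores per doc id: alpha * semantic + (1 - alpha) * keyword.

    A doc missing from one result set contributes 0 for that signal.
    """
    sem, kw = min_max(semantic), min_max(keyword)
    blended = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))


# Usage: cosine similarities vs. raw keyword hit counts for three doc ids.
semantic_scores = {"a": 0.9, "b": 0.7, "c": 0.2}
keyword_scores = {"b": 3.0, "c": 1.0}
ranking = [doc_id for doc_id, _ in hybrid_rank(semantic_scores, keyword_scores)]
```

Normalizing first matters because cosine similarities and keyword counts live on different scales. Without it, the raw counts would dominate whatever `alpha` is set to.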

🎯 Key Technical Achievements

  1. Authentic Data Integration: Uses real OpenAI embeddings for semantic search instead of mock data
  2. Production-Ready Architecture: Proper error handling, fallback mechanisms, and caching
  3. Multiple Search Modes: Supports semantic, keyword, and hybrid strategies for different use cases
  4. Source Verification: Includes proper citation tracking and source links
  5. Agent-Ready Design: Built for both human users and autonomous AI agents
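The achievements above pair several search strategies with "fallback mechanisms" but don't say how they interact. A minimal sketch, assuming a dict of per-mode search functions, with keyword search as the fallback when another mode fails (for example, when the OpenAI embedding call errors out). `search_with_fallback` and the logger name are hypothetical. The returned dict borrows the `results` key from the usage examples and reports which mode actually answered.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("kb_browser")  # hypothetical logger name
SearchFn = Callable[[str, int], List[Dict]]


def search_with_fallback(
    query: str,
    k: int,
    search_type: str,
    strategies: Dict[str, SearchFn],
    fallback: str = "keyword",
) -> Dict:
    """Run the requested strategy; on failure, retry once with the fallback
    strategy and report which mode actually produced the results."""
    if search_type not in strategies:
        raise ValueError(f"unknown search_type {search_type!r}; expected one of {sorted(strategies)}")
    try:
        hits = strategies[search_type](query, k)
        return {"results": hits, "search_type": search_type, "fallback_used": False}
    except Exception as exc:  # any backend failure triggers the fallback
        if search_type == fallback or fallback not in strategies:
            raise
        logger.warning("%s search failed (%s); falling back to %s", search_type, exc, fallback)
        hits = strategies[fallback](query, k)
        return {"results": hits, "search_type": fallback, "fallback_used": True}


# Usage: a semantic backend whose embedding API is down.
def keyword_search(query: str, k: int) -> List[Dict]:
    return [{"title": f"keyword hit for {query}", "source": "https://example.com/kw"}][:k]


def semantic_search(query: str, k: int) -> List[Dict]:
    raise ConnectionError("embedding API unreachable")


strategies = {"keyword": keyword_search, "semantic": semantic_search}
response = search_with_fallback("faiss", 5, "semantic", strategies)
```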

πŸ“ Project Structure

```
kb_browser/
├── __init__.py          # Main Gradio component class
├── retriever.py         # LlamaIndex + FAISS backend
├── script.tsx           # React TypeScript frontend
├── index.html           # Component HTML template
├── package.json         # Frontend dependencies
├── pyproject.toml       # Python package configuration
└── README.md            # Component documentation

Root/
├── demo.py              # Human + Agent demo application
├── gradio_demo.py       # Complete Gradio demo
├── test_kb_browser.py   # Comprehensive test suite
├── verify_component.py  # Component verification script
└── docs/
    └── master-plan.md   # This master plan document
```

🚀 Usage Examples

Basic Component Usage:

```python
from kb_browser import KnowledgeBrowser

kb_browser = KnowledgeBrowser(
    index_path="./documents",
    search_type="semantic",
    max_results=10
)

results = kb_browser.search("retrieval augmented generation")
```

Agent Integration:

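```python
# Reuses the kb_browser instance created in the basic usage example.
def agent_research(question):
    results = kb_browser.search(question, search_type="semantic")
    # Keep only what the agent needs to cite each document.
    citations = [
        {"title": doc["title"], "source": doc["source"]}
        for doc in results["results"]
    ]
    return citations
```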
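`agent_research` ends with a list of `{"title", "source"}` dicts, and the status section mentions citation export without fixing a format. A hypothetical `export_citations` helper, assuming a numbered Markdown reference list with repeated sources dropped so an agent that hits the same document twice cites it once:

```python
from typing import Dict, List


def export_citations(citations: List[Dict[str, str]]) -> str:
    """Render citations as a numbered Markdown list, skipping repeated
    sources and keeping the order in which they were first cited."""
    seen = set()
    lines: List[str] = []
    for cite in citations:
        source = cite["source"]
        if source in seen:
            continue
        seen.add(source)
        lines.append(f"{len(lines) + 1}. [{cite['title']}]({source})")
    return "\n".join(lines)


cites = [
    {"title": "RAG overview", "source": "https://example.com/rag"},
    {"title": "FAISS basics", "source": "https://example.com/faiss"},
    {"title": "RAG overview", "source": "https://example.com/rag"},
]
print(export_citations(cites))
```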

Human Interface:

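```python
import gradio as gr

# Reuses the kb_browser instance created in the basic usage example.
with gr.Blocks() as demo:
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    # Run the search on click and show the raw results as JSON.
    search_btn.click(kb_browser.search, query, results)

demo.launch()
```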

Execute the six prompt blocks sequentially and you'll have a polished, judge-ready custom component by Friday. Good luck!