This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .cursorrules +0 -240
  2. .env.example +0 -48
  3. .github/README.md +0 -57
  4. .github/workflows/ci.yml +0 -127
  5. .gitignore +0 -80
  6. .pre-commit-config.yaml +0 -64
  7. .pre-commit-hooks/run_pytest.ps1 +0 -19
  8. .pre-commit-hooks/run_pytest.sh +0 -20
  9. .pre-commit-hooks/run_pytest_embeddings.ps1 +0 -14
  10. .pre-commit-hooks/run_pytest_embeddings.sh +0 -15
  11. .pre-commit-hooks/run_pytest_unit.ps1 +0 -14
  12. .pre-commit-hooks/run_pytest_unit.sh +0 -15
  13. .pre-commit-hooks/run_pytest_with_sync.ps1 +0 -25
  14. .pre-commit-hooks/run_pytest_with_sync.py +0 -235
  15. .python-version +0 -1
  16. AGENTS.txt +0 -236
  17. CONTRIBUTING.md +0 -1
  18. Dockerfile +0 -52
  19. Makefile +0 -42
  20. README.md +8 -113
  21. dev/.cursorrules +0 -241
  22. dev/AGENTS.txt +0 -236
  23. dev/Makefile +0 -51
  24. dev/docs_plugins.py +0 -74
  25. docs/api/agents.md +0 -270
  26. docs/api/models.md +0 -248
  27. docs/api/orchestrators.md +0 -195
  28. docs/api/services.md +0 -201
  29. docs/api/tools.md +0 -235
  30. docs/architecture/agents.md +0 -192
  31. docs/architecture/graph-orchestration.md +0 -152
  32. docs/architecture/graph_orchestration.md +0 -235
  33. docs/architecture/middleware.md +0 -142
  34. docs/architecture/orchestrators.md +0 -198
  35. docs/architecture/services.md +0 -142
  36. docs/architecture/tools.md +0 -175
  37. docs/architecture/workflow-diagrams.md +0 -670
  38. docs/architecture/workflows.md +0 -662
  39. docs/configuration/CONFIGURATION.md +0 -743
  40. docs/configuration/index.md +0 -746
  41. docs/contributing.md +0 -428
  42. docs/contributing/code-quality.md +0 -81
  43. docs/contributing/code-style.md +0 -61
  44. docs/contributing/error-handling.md +0 -69
  45. docs/contributing/implementation-patterns.md +0 -84
  46. docs/contributing/index.md +0 -163
  47. docs/contributing/prompt-engineering.md +0 -69
  48. docs/contributing/testing.md +0 -65
  49. docs/getting-started/examples.md +0 -209
  50. docs/getting-started/installation.md +0 -148
.cursorrules DELETED
@@ -1,240 +0,0 @@
# DeepCritical Project - Cursor Rules

## Project-Wide Rules

**Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.

**Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`

**Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
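The event-loop rule above can be sketched as a small, self-contained example; `cpu_bound_score` is a hypothetical stand-in for real CPU-heavy work, not a function from the codebase:

```python
import asyncio


def cpu_bound_score(text: str) -> int:
    # Placeholder for CPU-heavy work (parsing, scoring, etc.).
    return sum(ord(c) for c in text)


async def score_async(text: str) -> int:
    # Off-load CPU-bound work to the default executor so the
    # event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, cpu_bound_score, text)


async def score_many(texts: list[str]) -> list[int]:
    # Fan out in parallel with asyncio.gather().
    return await asyncio.gather(*(score_async(t) for t in texts))
```

The same shape applies to any blocking call: wrap it in `run_in_executor()` rather than calling it directly inside a coroutine.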

**Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
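A minimal sketch of the chaining convention above — the class bodies are simplified stand-ins for the real hierarchy in `src/utils/exceptions.py`, and `fetch` is a hypothetical failing call:

```python
class DeepCriticalError(Exception):
    """Base error for the project (simplified sketch)."""


class SearchError(DeepCriticalError):
    """Raised when a search backend fails."""


def fetch(url: str) -> str:
    # Stand-in for a network call that fails.
    raise TimeoutError(f"timed out: {url}")


def search(query: str) -> str:
    try:
        return fetch(f"https://example.org/?q={query}")
    except TimeoutError as e:
        # `from e` preserves the original traceback as __cause__.
        raise SearchError(f"search failed for {query!r}") from e
```

Chaining keeps the root cause inspectable (`err.__cause__`) instead of silently swallowing it.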

**Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.

**Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.

**Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).

**Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.

**Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.

**State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.

**Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
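A toy sketch of the idea behind `validate_references()` — drop any markdown citation whose URL is not backed by collected evidence. The regex and report format here are assumptions for illustration, not the real implementation:

```python
import re


def validate_references(report: str, evidence_urls: set[str]) -> str:
    # Keep markdown links whose URL appears in collected evidence;
    # demote unverified links to plain text (and log a warning in
    # the real implementation).
    def check(match: re.Match[str]) -> str:
        text, url = match.group(1), match.group(2)
        return match.group(0) if url in evidence_urls else text

    return re.sub(r"\[([^\]]+)\]\((https?://[^)]+)\)", check, report)
```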

---

## src/agents/ - Agent Implementation Rules

**Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.

**Agent Structure**:
- System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- Agent class with `__init__(model: Any | None = None)`
- Main method (e.g., `async def evaluate()`, `async def write_report()`)
- Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`

**Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.

**Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.

**Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.

**Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.

**Agent-Specific Rules**:
- `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- `writer.py`: Returns markdown string. Includes citations in numbered format.
- `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- `thinking.py`: Returns observation string from conversation history.
- `input_parser.py`: Outputs `ParsedQuery` with research mode detection.

---

## src/tools/ - Search Tool Rules

**Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
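The tool protocol above can be sketched with `typing.Protocol`; `Evidence` is reduced to a plain string here for brevity, and `DummyTool` is a hypothetical implementer:

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[str]: ...


class DummyTool:
    # Structural typing: no inheritance from SearchTool is needed;
    # matching the shape is enough.
    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int) -> list[str]:
        return [f"result for {query}"][:max_results]
```

`runtime_checkable` allows `isinstance(tool, SearchTool)` checks (presence of members only, not signatures), which is handy when registering tools dynamically.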

**Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.

**Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).

**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.

**Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.

**Tool-Specific Rules**:
- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
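The `search_handler.py` fan-out pattern can be sketched as follows; the two tool coroutines are hypothetical stand-ins, and the point is that `return_exceptions=True` keeps one failing backend from sinking the whole search:

```python
import asyncio


async def good_tool(query: str) -> list[str]:
    return [f"hit: {query}"]


async def bad_tool(query: str) -> list[str]:
    raise RuntimeError("backend down")


async def search_all(query: str) -> list[str]:
    # return_exceptions=True turns failures into values in the
    # result list instead of cancelling the other searches.
    results = await asyncio.gather(
        good_tool(query), bad_tool(query), return_exceptions=True
    )
    evidence: list[str] = []
    for r in results:
        if isinstance(r, BaseException):
            continue  # the real handler logs a warning here
        evidence.extend(r)
    return evidence
```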

---

## src/middleware/ - Middleware Rules

**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).

**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
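The URL-based deduplication in `add_evidence()` can be sketched like this; `Evidence` is reduced to a two-field dataclass and the class is a cut-down stand-in, not the real `WorkflowState`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    url: str
    text: str


class WorkflowState:
    def __init__(self) -> None:
        self.evidence: list[Evidence] = []
        self._seen_urls: set[str] = set()

    def add_evidence(self, items: list[Evidence]) -> int:
        # Deduplicate by URL; return how many items were actually added.
        added = 0
        for item in items:
            if item.url in self._seen_urls:
                continue
            self._seen_urls.add(item.url)
            self.evidence.append(item)
            added += 1
        return added
```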
84
-
85
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
86
-
87
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
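The token heuristic above is simple enough to sketch directly; the floor of 1 token for non-empty accounting is an assumption of this sketch, not necessarily the real implementation:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic from the rule above: ~4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    # A call is charged for both sides of the exchange.
    return estimate_tokens(prompt) + estimate_tokens(response)
```

This is only a budgeting heuristic — real tokenizers (BPE and friends) can diverge substantially, especially on code or non-English text.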

**Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.

---

## src/orchestrator/ - Orchestration Rules

**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).

**IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.

**DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.

**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.

**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.

**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.

---

## src/services/ - Service Rules

**EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).

**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.

**StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).

**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1)` on `def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
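The singleton convention above can be sketched in a few lines; `EmbeddingService` here is a trivial stand-in for a service with heavy setup:

```python
from functools import lru_cache


class EmbeddingService:
    def __init__(self) -> None:
        # Heavy setup (model load, DB connection) would happen here,
        # exactly once across the process.
        self.ready = True


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()
```

Because construction happens inside the cached accessor, importing the module costs nothing — the dependency is only required when the service is first used.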

---

## src/utils/ - Utility Rules

**Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.

**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.

**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.

**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.

**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.

---

## src/orchestrator_factory.py Rules

**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.

**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.

**Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".

**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.

**Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.

---

## src/orchestrator_hierarchical.py Rules

**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.

**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).

**Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.

**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.

---

## src/orchestrator_magentic.py Rules

**Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.

**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.

**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.

**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
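The defensive extraction priority above is a small pattern worth spelling out; this is an illustrative sketch of the approach, not the real `_extract_text()`:

```python
from typing import Any


def extract_text(message: Any) -> str:
    # Priority: .content -> .text -> str(message), skipping
    # attributes that are missing or not non-empty strings.
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if isinstance(value, str) and value:
            return value
    return str(message)
```

The `isinstance` guard is what makes this robust against buggy message objects whose `.content` is `None` or some nested structure.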

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).

**Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.

**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".

---

## src/agent_factory/ - Factory Rules

**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.

**Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.

**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.

**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.

**Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.

---

## src/prompts/ - Prompt Rules

**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).

**Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.

**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.

**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.

---

## Testing Rules

**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).

**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).

**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.

**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.

---

## File-Specific Agent Rules

**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.

**writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.

**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.

**proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.

**tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.

**thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.

**input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
.env.example DELETED
@@ -1,48 +0,0 @@
# ============== LLM CONFIGURATION ==============

# Provider: "openai" or "anthropic"
LLM_PROVIDER=openai

# API Keys (at least one required for full LLM analysis)
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Model names (optional - sensible defaults set in config.py)
# ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
# OPENAI_MODEL=gpt-5.1

# ============== EMBEDDINGS ==============

# OpenAI Embedding Model (used if LLM_PROVIDER is openai and performing RAG/Embeddings)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Local Embedding Model (used for local/offline embeddings)
LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# ============== HUGGINGFACE (FREE TIER) ==============

# HuggingFace Token - enables Llama 3.1 (best quality free model)
# Get yours at: https://huggingface.co/settings/tokens
#
# WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta)
# WITH HF_TOKEN: Uses Llama 3.1 8B Instruct (requires accepting license)
#
# For HuggingFace Spaces deployment:
#   Set this as a "Secret" in Space Settings -> Variables and secrets
#   Users/judges don't need their own token - the Space secret is used
#
HF_TOKEN=hf_your-token-here

# ============== AGENT CONFIGURATION ==============

MAX_ITERATIONS=10
SEARCH_TIMEOUT=30
LOG_LEVEL=INFO

# ============== EXTERNAL SERVICES ==============

# PubMed (optional - higher rate limits)
NCBI_API_KEY=your-ncbi-key-here

# Vector Database (optional - for LlamaIndex RAG)
CHROMA_DB_PATH=./chroma_db
.github/README.md DELETED
@@ -1,57 +0,0 @@

> [!IMPORTANT]
> **You are reading the GitHub README!**
>
> - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
> - 📖 **Demo README**: Check out the [Demo README](../README.md) for more information about our MCP Hackathon submission
> - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission


<div align="center">

[![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
[![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
[![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
[![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
[![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)

</div>

## Quick Start

### 1. Environment Setup

```bash
# Install uv if you haven't already
pip install uv

# Sync dependencies
uv sync --all-extras
```

### 2. Run the UI

```bash
# Start the Gradio app
gradio run "src/app.py"
```

Open your browser to `http://localhost:7860`.

### 3. Connect via MCP

This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.

**MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`

**Claude Desktop Configuration**:
Add this to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "deepcritical": {
      "url": "http://localhost:7860/gradio_api/mcp/"
    }
  }
}
```
.github/workflows/ci.yml DELETED
@@ -1,127 +0,0 @@
name: CI

on:
  push:
    branches: [main, dev, develop]
  pull_request:
    branches: [main, dev, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Lint with ruff
        run: |
          ruff check . --exclude tests
          ruff format --check . --exclude tests
        continue-on-error: true

      - name: Type check with mypy
        run: |
          mypy src
        continue-on-error: true

      - name: Install embedding dependencies
        run: |
          pip install -e ".[embeddings]"

      - name: Run unit tests (excluding OpenAI and embedding providers)
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term

      - name: Run local embeddings tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
        continue-on-error: true  # Allow failures if dependencies not available

      - name: Run HuggingFace integration tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
        continue-on-error: true  # Allow failures if HF_TOKEN not set

      - name: Run non-OpenAI integration tests (excluding embedding providers)
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
        continue-on-error: true  # Allow failures if dependencies not available

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          slug: DeepCritical/GradioDemo
          files: ./coverage.xml
          fail_ci_if_error: false
        continue-on-error: true

  docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/develop')
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Install dependencies
        run: |
          uv sync --extra dev

      - name: Configure Git
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git

      - name: Deploy to GitHub Pages
        run: |
          # mkdocs gh-deploy automatically creates .nojekyll, but let's verify
          uv run mkdocs gh-deploy --force --message "Deploy docs [skip ci]" --strict
          # Verify .nojekyll was created in gh-pages branch
          git fetch origin gh-pages:gh-pages || true
          git checkout gh-pages || true
          if [ -f .nojekyll ]; then
            echo "✓ .nojekyll file exists"
          else
            echo "⚠ .nojekyll file missing, creating it..."
            touch .nojekyll
            git add .nojekyll
            git commit -m "Add .nojekyll to disable Jekyll [skip ci]" || true
            git push origin gh-pages || true
          fi
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
.gitignore DELETED
@@ -1,80 +0,0 @@
folder/
.cursor/
.ruff_cache/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Environment
.env
.env.local
*.local

# Claude
.claude/

# Burner docs (working drafts, not for commit)
burner_docs/

# Reference repos (clone locally, don't commit)
reference_repos/autogen-microsoft/
reference_repos/claude-agent-sdk/
reference_repos/pydanticai-research-agent/
reference_repos/pubmed-mcp-server/
reference_repos/DeepCritical/

# Keep the README in reference_repos
!reference_repos/README.md

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Testing
.pytest_cache/
.mypy_cache/
.coverage
htmlcov/

# Database files
chroma_db/
*.sqlite3

# Development directory (personal notes and planning)
dev/

# Trigger rebuild Wed Nov 26 17:51:41 EST 2025
.pre-commit-config.yaml DELETED
@@ -1,64 +0,0 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: [--fix, --exclude, tests]
        exclude: ^reference_repos/
      - id: ruff-format
        args: [--exclude, tests]
        exclude: ^reference_repos/

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        files: ^src/
        exclude: ^folder
        additional_dependencies:
          - pydantic>=2.7
          - pydantic-settings>=2.2
          - tenacity>=8.2
          - pydantic-ai>=0.0.16
        args: [--ignore-missing-imports]

  - repo: local
    hooks:
      - id: pytest-unit
        name: pytest unit tests (no OpenAI)
        entry: uv
        language: system
        types: [python]
        args: [
          "run",
          "pytest",
          "tests/unit/",
          "-v",
          "-m",
          "not openai and not embedding_provider",
          "--tb=short",
          "-p",
          "no:logfire",
        ]
        pass_filenames: false
        always_run: true
        require_serial: false
      - id: pytest-local-embeddings
        name: pytest local embeddings tests
        entry: uv
        language: system
        types: [python]
        args: [
          "run",
          "pytest",
          "tests/",
          "-v",
          "-m",
          "local_embeddings",
          "--tb=short",
          "-p",
          "no:logfire",
        ]
        pass_filenames: false
        always_run: true
        require_serial: false
.pre-commit-hooks/run_pytest.ps1 DELETED
@@ -1,19 +0,0 @@
- # PowerShell pytest runner for pre-commit (Windows)
- # Uses uv if available, otherwise falls back to pytest
-
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     # Sync dependencies before running tests
-     uv sync
-     uv run pytest $args
- } else {
-     Write-Warning "uv not found, using system pytest (may have missing dependencies)"
-     pytest $args
- }
-
-
-
-
-
-
-
-
.pre-commit-hooks/run_pytest.sh DELETED
@@ -1,20 +0,0 @@
- #!/bin/bash
- # Cross-platform pytest runner for pre-commit
- # Uses uv if available, otherwise falls back to pytest
-
- if command -v uv >/dev/null 2>&1; then
-     # Sync dependencies before running tests
-     uv sync
-     uv run pytest "$@"
- else
-     echo "Warning: uv not found, using system pytest (may have missing dependencies)"
-     pytest "$@"
- fi
-
-
-
-
-
-
-
-
.pre-commit-hooks/run_pytest_embeddings.ps1 DELETED
@@ -1,14 +0,0 @@
- # PowerShell wrapper to sync embeddings dependencies and run embeddings tests
-
- $ErrorActionPreference = "Stop"
-
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     Write-Host "Syncing embeddings dependencies..."
-     uv sync --extra embeddings
-     Write-Host "Running embeddings tests..."
-     uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
- } else {
-     Write-Error "uv not found"
-     exit 1
- }
-
.pre-commit-hooks/run_pytest_embeddings.sh DELETED
@@ -1,15 +0,0 @@
- #!/bin/bash
- # Wrapper script to sync embeddings dependencies and run embeddings tests
-
- set -e
-
- if command -v uv >/dev/null 2>&1; then
-     echo "Syncing embeddings dependencies..."
-     uv sync --extra embeddings
-     echo "Running embeddings tests..."
-     uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
- else
-     echo "Error: uv not found"
-     exit 1
- fi
-
.pre-commit-hooks/run_pytest_unit.ps1 DELETED
@@ -1,14 +0,0 @@
- # PowerShell wrapper to sync dependencies and run unit tests
-
- $ErrorActionPreference = "Stop"
-
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     Write-Host "Syncing dependencies..."
-     uv sync
-     Write-Host "Running unit tests..."
-     uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
- } else {
-     Write-Error "uv not found"
-     exit 1
- }
-
.pre-commit-hooks/run_pytest_unit.sh DELETED
@@ -1,15 +0,0 @@
- #!/bin/bash
- # Wrapper script to sync dependencies and run unit tests
-
- set -e
-
- if command -v uv >/dev/null 2>&1; then
-     echo "Syncing dependencies..."
-     uv sync
-     echo "Running unit tests..."
-     uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
- else
-     echo "Error: uv not found"
-     exit 1
- fi
-
.pre-commit-hooks/run_pytest_with_sync.ps1 DELETED
@@ -1,25 +0,0 @@
- # PowerShell wrapper for pytest runner
- # Ensures uv is available and runs the Python script
-
- param(
-     [Parameter(Position=0)]
-     [string]$TestType = "unit"
- )
-
- $ErrorActionPreference = "Stop"
-
- # Check if uv is available
- if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
-     Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
-     exit 1
- }
-
- # Get the script directory
- $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
- $PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"
-
- # Run the Python script using uv
- uv run python $PythonScript $TestType
-
- exit $LASTEXITCODE
-
.pre-commit-hooks/run_pytest_with_sync.py DELETED
@@ -1,235 +0,0 @@
- #!/usr/bin/env python3
- """Cross-platform pytest runner that syncs dependencies before running tests."""
-
- import shutil
- import subprocess
- import sys
- from pathlib import Path
-
-
- def clean_caches(project_root: Path) -> None:
-     """Remove pytest and Python cache directories and files.
-
-     Comprehensively removes all cache files and directories to ensure
-     clean test runs. Only scans specific directories to avoid resource
-     exhaustion from scanning large directories like .venv on Windows.
-     """
-     # Directories to scan for caches (only project code, not dependencies)
-     scan_dirs = ["src", "tests", ".pre-commit-hooks"]
-
-     # Directories to exclude (to avoid resource issues)
-     exclude_dirs = {
-         ".venv",
-         "venv",
-         "ENV",
-         "env",
-         ".git",
-         "node_modules",
-         "dist",
-         "build",
-         ".eggs",
-         "reference_repos",
-         "folder",
-     }
-
-     # Comprehensive list of cache patterns to remove
-     cache_patterns = [
-         ".pytest_cache",
-         "__pycache__",
-         "*.pyc",
-         "*.pyo",
-         "*.pyd",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".coverage",
-         "coverage.xml",
-         "htmlcov",
-         ".hypothesis",  # Hypothesis testing framework cache
-         ".tox",  # Tox cache (if used)
-         ".cache",  # General Python cache
-     ]
-
-     def should_exclude(path: Path) -> bool:
-         """Check if a path should be excluded from cache cleanup."""
-         # Check if any parent directory is in exclude list
-         for parent in path.parents:
-             if parent.name in exclude_dirs:
-                 return True
-         # Check if the path itself is excluded
-         if path.name in exclude_dirs:
-             return True
-         return False
-
-     cleaned = []
-
-     # Only scan specific directories to avoid resource exhaustion
-     for scan_dir in scan_dirs:
-         scan_path = project_root / scan_dir
-         if not scan_path.exists():
-             continue
-
-         for pattern in cache_patterns:
-             if "*" in pattern:
-                 # Handle glob patterns for files
-                 try:
-                     for cache_file in scan_path.rglob(pattern):
-                         if should_exclude(cache_file):
-                             continue
-                         try:
-                             if cache_file.is_file():
-                                 cache_file.unlink()
-                                 cleaned.append(str(cache_file.relative_to(project_root)))
-                         except OSError:
-                             pass  # Ignore errors (file might be locked or already deleted)
-                 except OSError:
-                     pass  # Ignore errors during directory traversal
-             else:
-                 # Handle directory patterns
-                 try:
-                     for cache_dir in scan_path.rglob(pattern):
-                         if should_exclude(cache_dir):
-                             continue
-                         try:
-                             if cache_dir.is_dir():
-                                 shutil.rmtree(cache_dir, ignore_errors=True)
-                                 cleaned.append(str(cache_dir.relative_to(project_root)))
-                         except OSError:
-                             pass  # Ignore errors (directory might be locked)
-                 except OSError:
-                     pass  # Ignore errors during directory traversal
-
-     # Also clean root-level caches (like .pytest_cache in project root)
-     root_cache_patterns = [
-         ".pytest_cache",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".coverage",
-         "coverage.xml",
-         "htmlcov",
-         ".hypothesis",
-         ".tox",
-         ".cache",
-         ".pytest",
-     ]
-     for pattern in root_cache_patterns:
-         cache_path = project_root / pattern
-         if cache_path.exists():
-             try:
-                 if cache_path.is_dir():
-                     shutil.rmtree(cache_path, ignore_errors=True)
-                 elif cache_path.is_file():
-                     cache_path.unlink()
-                 cleaned.append(pattern)
-             except OSError:
-                 pass
-
-     # Also remove any .pyc files in root directory
-     try:
-         for pyc_file in project_root.glob("*.pyc"):
-             try:
-                 pyc_file.unlink()
-                 cleaned.append(pyc_file.name)
-             except OSError:
-                 pass
-     except OSError:
-         pass
-
-     if cleaned:
-         print(
-             f"Cleaned {len(cleaned)} cache items: {', '.join(cleaned[:10])}{'...' if len(cleaned) > 10 else ''}"
-         )
-     else:
-         print("No cache files found to clean")
-
-
- def run_command(
-     cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
- ) -> int:
-     """Run a command and return exit code."""
-     try:
-         result = subprocess.run(
-             cmd,
-             check=check,
-             shell=shell,
-             cwd=cwd,
-             env=None,  # Use current environment, uv will handle venv
-         )
-         return result.returncode
-     except subprocess.CalledProcessError as e:
-         return e.returncode
-     except FileNotFoundError:
-         print(f"Error: Command not found: {cmd[0]}")
-         return 1
-
-
- def main() -> int:
-     """Main entry point."""
-     import os
-
-     # Get the project root (where pyproject.toml is)
-     script_dir = Path(__file__).parent
-     project_root = script_dir.parent
-
-     # Change to project root to ensure uv works correctly
-     os.chdir(project_root)
-
-     # Clean caches before running tests
-     print("Cleaning pytest and Python caches...")
-     clean_caches(project_root)
-
-     # Check if uv is available
-     if run_command(["uv", "--version"], check=False) != 0:
-         print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
-         return 1
-
-     # Parse arguments
-     test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
-     extra_args = sys.argv[2:] if len(sys.argv) > 2 else []
-
-     # Sync dependencies - always include dev
-     # Note: embeddings dependencies are now in main dependencies, not optional
-     # Use --extra dev for [project.optional-dependencies].dev (not --dev which is for [dependency-groups])
-     sync_cmd = ["uv", "sync", "--extra", "dev"]
-
-     print(f"Syncing dependencies for {test_type} tests...")
-     if run_command(sync_cmd, cwd=project_root) != 0:
-         return 1
-
-     # Build pytest command - use uv run to ensure correct environment
-     if test_type == "unit":
-         pytest_args = [
-             "tests/unit/",
-             "-v",
-             "-m",
-             "not openai and not embedding_provider",
-             "--tb=short",
-             "-p",
-             "no:logfire",
-             "--cache-clear",  # Clear pytest cache before running
-         ]
-     elif test_type == "embeddings":
-         pytest_args = [
-             "tests/",
-             "-v",
-             "-m",
-             "local_embeddings",
-             "--tb=short",
-             "-p",
-             "no:logfire",
-             "--cache-clear",  # Clear pytest cache before running
-         ]
-     else:
-         pytest_args = []
-
-     pytest_args.extend(extra_args)
-
-     # Use uv run python -m pytest to ensure we use the venv's pytest
-     # This is more reliable than uv run pytest which might find system pytest
-     pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]
-
-     print(f"Running {test_type} tests...")
-     return run_command(pytest_cmd, cwd=project_root)
-
-
- if __name__ == "__main__":
-     sys.exit(main())
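The cache cleaner above prunes anything under an excluded directory by walking `path.parents`. That check can be exercised in isolation; a minimal sketch with the exclude set abbreviated from the script:

```python
from pathlib import Path

# Abbreviated version of the script's exclude set
EXCLUDE_DIRS = {".venv", ".git", "node_modules", "reference_repos"}


def should_exclude(path: Path) -> bool:
    """True if the path itself, or any ancestor directory, is excluded."""
    if path.name in EXCLUDE_DIRS:
        return True
    # Path.parents yields every ancestor, e.g. .venv/lib -> .venv -> .
    return any(parent.name in EXCLUDE_DIRS for parent in path.parents)


if __name__ == "__main__":
    print(should_exclude(Path(".venv/lib/__pycache__")))  # → True (ancestor excluded)
    print(should_exclude(Path("src/__pycache__")))        # → False
```

The check is purely name-based, which is why the script pairs it with a restricted `scan_dirs` list rather than walking the whole tree.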
.python-version DELETED
@@ -1 +0,0 @@
- 3.11
AGENTS.txt DELETED
@@ -1,236 +0,0 @@
- # DeepCritical Project - Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
-
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
-
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
-
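The deleted AGENTS.txt repeatedly mandates a custom exception hierarchy rooted at `DeepCriticalError` with chained raises (`raise SearchError(...) from e`). A stdlib-only sketch of that rule; the class bodies here are stand-ins for the real definitions in `src/utils/exceptions.py`, and `fetch_results` is a hypothetical caller:

```python
# Stand-ins for the hierarchy described for src/utils/exceptions.py
class DeepCriticalError(Exception):
    """Base exception for the project."""


class SearchError(DeepCriticalError):
    """Raised when a search tool fails."""


def fetch_results(query: str) -> list[str]:
    try:
        # Simulate a transport-level failure from an upstream API
        raise TimeoutError("upstream API timed out")
    except TimeoutError as e:
        # Chaining with `from e` preserves the original traceback as __cause__
        raise SearchError(f"search failed for {query!r}") from e


if __name__ == "__main__":
    try:
        fetch_results("dengue repurposing")
    except SearchError as err:
        print(type(err.__cause__).__name__)  # → TimeoutError
```

Because `SearchError` inherits from `DeepCriticalError`, callers can catch either the specific or the base class, and the chained `__cause__` keeps the root failure visible in logs.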
CONTRIBUTING.md DELETED
@@ -1 +0,0 @@
- make sure you run the full pre-commit checks before opening a PR (not draft) otherwise Obstacle is the Way will loose his mind
Dockerfile DELETED
@@ -1,52 +0,0 @@
- # Dockerfile for DeepCritical
- FROM python:3.11-slim
-
- # Set working directory
- WORKDIR /app
-
- # Install system dependencies (curl needed for HEALTHCHECK)
- RUN apt-get update && apt-get install -y \
-     git \
-     curl \
-     && rm -rf /var/lib/apt/lists/*
-
- # Install uv
- RUN pip install uv==0.5.4
-
- # Copy project files
- COPY pyproject.toml .
- COPY uv.lock .
- COPY src/ src/
- COPY README.md .
-
- # Install runtime dependencies only (no dev/test tools)
- RUN uv sync --frozen --no-dev --extra embeddings --extra magentic
-
- # Create non-root user BEFORE downloading models
- RUN useradd --create-home --shell /bin/bash appuser
-
- # Set cache directory for HuggingFace models (must be writable by appuser)
- ENV HF_HOME=/app/.cache
- ENV TRANSFORMERS_CACHE=/app/.cache
-
- # Create cache dir with correct ownership
- RUN mkdir -p /app/.cache && chown -R appuser:appuser /app/.cache
-
- # Pre-download the embedding model during build (as appuser to set correct ownership)
- USER appuser
- RUN uv run python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
-
- # Expose port
- EXPOSE 7860
-
- # Health check
- HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-     CMD curl -f http://localhost:7860/ || exit 1
-
- # Set environment variables
- ENV GRADIO_SERVER_NAME=0.0.0.0
- ENV GRADIO_SERVER_PORT=7860
- ENV PYTHONPATH=/app
-
- # Run the app
- CMD ["uv", "run", "python", "-m", "src.app"]
Makefile DELETED
@@ -1,42 +0,0 @@
1
- .PHONY: install test lint format typecheck check clean all cov cov-html
2
-
3
- # Default target
4
- all: check
5
-
6
- install:
7
- uv sync --all-extras
8
- uv run pre-commit install
9
-
10
- test:
11
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
-
13
- test-hf:
14
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
-
16
- test-all:
17
- uv run pytest tests/ -v -p no:logfire
18
-
19
- # Coverage aliases
20
- cov: test-cov
21
- test-cov:
22
- uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
-
24
- cov-html:
25
- uv run pytest --cov=src --cov-report=html -p no:logfire
26
- @echo "Coverage report: open htmlcov/index.html"
27
-
28
- lint:
29
- uv run ruff check src tests
30
-
31
- format:
32
- uv run ruff format src tests
33
-
34
- typecheck:
35
- uv run mypy src
36
-
37
- check: lint typecheck test-cov
38
- @echo "All checks passed!"
39
-
40
- clean:
41
- rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
42
- find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
 
 
README.md CHANGED
@@ -1,120 +1,15 @@
  ---
- title: Critical Deep Research
- emoji: 🐉
- colorFrom: red
- colorTo: yellow
  sdk: gradio
- sdk_version: "6.0.1"
- python_version: "3.11"
  app_file: src/app.py
- hf_oauth: true
- hf_oauth_expiration_minutes: 480
- hf_oauth_scopes:
-   - inference-api
- pinned: true
  license: mit
- tags:
-   - mcp-in-action-track-enterprise
-   - mcp-hackathon
-   - drug-repurposing
-   - biomedical-ai
-   - pydantic-ai
-   - llamaindex
-   - modal
  ---
-
- > [!IMPORTANT]
- > **You are reading the Gradio Demo README!**
- >
- > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
- > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
- > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
-
- <div align="center">
-
- [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
- [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
- [![Demo](https://img.shields.io/badge/Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
- [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
- [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
-
- </div>
-
- # DeepCritical
-
- ## About
-
- The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
-
- For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical literature.
-
- ## Deep Critical in the Media
-
- - Social media posts about Deep Critical:
-
- ## Important information
-
- - **[readme](.github/README.md)**: configure, deploy, contribute, and learn more here.
- - **[docs](https://deepcritical.github.io/GradioDemo/)**: want to know how all this works? Read our detailed technical documentation here.
- - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: try our demo on Hugging Face.
- - **[team](#team)**: join us, or follow us!
- - **[video]**: see our demo video.
-
- ## Future Developments
-
- - [ ] Apply Deep Research Systems to Generate Short-Form Video (up to 5 minutes)
- - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
- - [ ] Improve Data Science with more Complex Graph Agents
- - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
- - [ ] Create Deep Critical Literature Review
- - [ ] Create Deep Critical Hypothesis Generator
- - [ ] Create PyPI Package
-
- ## Completed
-
- - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
- - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
- - [x] **HuggingFace OAuth**: Sign in with HuggingFace
- - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
- - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
- - [x] **HuggingfaceInference**:
- - [x] **HuggingfaceMCP Custom Config To Use Community Tools**:
- - [x] **Strongly Typed Composable Graphs**:
- - [x] **Specialized Research Teams of Agents**:
-
- ### Team
-
- - ZJ
- - MarioAderman
- - Josephrp
-
- ## Acknowledgements
-
- - McSwaggins
- - Magentic
- - Huggingface
- - Gradio
- - DeepCritical
- - Sponsors
-   - Microsoft
-   - Pydantic
-   - Llama-index
-   - Anthropic/MCP
- - List of Tools Makers
-
- ## Links
-
- [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
- [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
- [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
- [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
- [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
+ title: DeepCritical
+ emoji: 📈
+ colorFrom: blue
+ colorTo: purple
+ sdk_version: 6.0.0
+ pinned: false
+ short_description: Deep Search for Critical Research [BigData] -> [Actionable]
+
+ ### DeepCritical
 
 
dev/.cursorrules DELETED
@@ -1,241 +0,0 @@
- # DeepCritical Project - Cursor Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
-
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
-
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
 
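The async rule in the deleted rules file above (never block the event loop; push CPU-bound work through `run_in_executor()` and fan results back in with `asyncio.gather()`) can be sketched with the standard library alone. `cpu_bound_digest` is an illustrative stand-in, not a function from the codebase:

```python
import asyncio
import hashlib


def cpu_bound_digest(data: bytes) -> str:
    # Stand-in for CPU-heavy work (e.g. local embedding computation).
    return hashlib.sha256(data * 10_000).hexdigest()


async def digest_all(chunks: list[bytes]) -> list[str]:
    loop = asyncio.get_running_loop()
    # Offload each CPU-bound call to the default executor so the event
    # loop stays free, then await all of the futures in parallel.
    futures = [loop.run_in_executor(None, cpu_bound_digest, c) for c in chunks]
    return await asyncio.gather(*futures)


results = asyncio.run(digest_all([b"a", b"b"]))
```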
 
 
 
 
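The singleton rule quoted inline above (`@lru_cache(maxsize=1)` around a factory function) expands to this minimal sketch; `EmbeddingService` here is a placeholder for any expensive-to-construct service:

```python
from functools import lru_cache


class EmbeddingService:
    # Placeholder for a service that is costly to build (model load, etc.).
    def __init__(self) -> None:
        self.loaded = True


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Lazy singleton: constructed on first call, the cached instance
    # is returned on every call after that.
    return EmbeddingService()
```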
 
 
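The error-handling rule ("always chain exceptions: `raise SearchError(...) from e`") looks like this in practice; the exception classes mirror the hierarchy the rules name, the `fetch` body is invented for illustration, and the structlog call the rules require is elided here:

```python
class DeepCriticalError(Exception):
    """Base error, mirroring the hierarchy in src/utils/exceptions.py."""


class SearchError(DeepCriticalError):
    pass


def fetch(url: str) -> str:
    try:
        # Simulate a transport-level failure.
        raise TimeoutError(f"timed out fetching {url}")
    except TimeoutError as e:
        # Chain with `from e` so the original traceback is preserved.
        raise SearchError(f"search failed for {url}") from e


try:
    fetch("https://example.org")
except SearchError as err:
    cause = err.__cause__  # the original TimeoutError
```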
 
 
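The `BudgetTracker` rule mentions a ~4-characters-per-token estimation heuristic. A sketch of just that heuristic, under the stated assumption (the real tracker also handles time and iteration budgets, which are omitted):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic from the budget rules: ~4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    # A call's cost is estimated as prompt tokens plus response tokens.
    return estimate_tokens(prompt) + estimate_tokens(response)


total = estimate_llm_call_tokens("q" * 40, "a" * 80)  # 10 + 20
```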
 
 
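Finally, the `SearchTool` protocol described in the tool rules (a `name` property plus `async def search(query, max_results) -> list[Evidence]`) can be sketched with `typing.Protocol`; this `Evidence` is a simplified stand-in for the real model in `src/utils/models.py`, and `DummyTool` is purely illustrative:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Evidence:
    # Simplified stand-in for the real Evidence model.
    url: str
    snippet: str


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class DummyTool:
    # Structural typing: no inheritance needed to satisfy the protocol.
    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        return [Evidence(url="https://example.org", snippet=query)][:max_results]


tool: SearchTool = DummyTool()
hits = asyncio.run(tool.search("tnf", 5))
```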
 
dev/AGENTS.txt DELETED
@@ -1,236 +0,0 @@
1
- # DeepCritical Project - Rules
2
-
3
- ## Project-Wide Rules
4
-
5
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
6
-
7
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
8
-
9
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
10
-
11
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
12
-
13
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
14
-
15
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
16
-
17
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
18
-
19
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
20
-
21
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
22
-
23
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
24
-
25
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
26
-
27
- ---
28
-
29
- ## src/agents/ - Agent Implementation Rules
30
-
31
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
32
-
33
- **Agent Structure**:
34
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
35
- - Agent class with `__init__(model: Any | None = None)`
36
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
37
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
38
-
39
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
40
-
41
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
42
-
43
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
44
-
45
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
46
-
47
- **Agent-Specific Rules**:
48
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
49
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
50
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
51
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
52
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
53
- - `thinking.py`: Returns observation string from conversation history.
54
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
55
-
56
- ---
57
-
58
- ## src/tools/ - Search Tool Rules
59
-
60
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
61
-
62
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
63
-
64
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
65
-
66
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
67
-
68
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
69
-
70
- **Tool-Specific Rules**:
71
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
72
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
73
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
74
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
75
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
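The fan-out/fail-soft pattern in `search_handler.py` reduces to the following (the two tool functions are toy stand-ins; the real handler logs a warning for each failure):

```python
import asyncio


async def search_a(query: str) -> list[str]:
    return [f"a:{query}"]


async def search_b(query: str) -> list[str]:
    raise RuntimeError("tool down")


async def run_parallel(query: str) -> list[str]:
    # One failing tool must not sink the batch: gather with
    # return_exceptions=True, then filter the exceptions out.
    results = await asyncio.gather(
        search_a(query), search_b(query), return_exceptions=True
    )
    evidence: list[str] = []
    for r in results:
        if isinstance(r, Exception):
            continue  # real handler logs a warning here
        evidence.extend(r)
    return evidence


evidence = asyncio.run(run_parallel("statins"))
```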
76
-
77
- ---
78
-
79
- ## src/middleware/ - Middleware Rules
80
-
81
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
82
-
83
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
84
-
85
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
86
-
87
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
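The token heuristic is simple enough to show inline (the response-reserve parameter below is an assumption for illustration, not the real signature):

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rules: roughly 4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, expected_response_chars: int = 2000) -> int:
    # Hypothetical sketch: prompt tokens plus a reserve for the response.
    return estimate_tokens(prompt) + expected_response_chars // 4


n = estimate_tokens("a" * 400)
```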
88
-
89
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
90
-
91
- ---
92
-
93
- ## src/orchestrator/ - Orchestration Rules
94
-
95
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
96
-
97
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
98
-
99
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
100
-
101
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
102
-
103
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
104
-
105
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
106
-
107
- ---
108
-
109
- ## src/services/ - Service Rules
110
-
111
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
112
-
113
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
114
-
115
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
116
-
117
- **Singleton Pattern**: Decorate factory functions with `@lru_cache(maxsize=1)` so that `get_service()` constructs its `Service` once and every later call returns the cached instance. Lazy initialization avoids requiring optional dependencies at import time.
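The singleton factory in full (the `EmbeddingService` body is a placeholder for the real heavy initialization):

```python
from functools import lru_cache


class EmbeddingService:
    def __init__(self) -> None:
        # Heavy setup (model load, vector-store client) happens exactly once.
        self.ready = True


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Not constructed at import time; only on first call.
    return EmbeddingService()


a = get_embedding_service()
b = get_embedding_service()
```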
118
-
119
- ---
120
-
121
- ## src/utils/ - Utility Rules
122
-
123
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
124
-
125
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
126
-
127
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
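The hierarchy and the chaining rule together look like this (the failing call is a stand-in for a real network error):

```python
class DeepCriticalError(Exception):
    """Base for all project errors."""


class SearchError(DeepCriticalError):
    """Raised when a search tool fails."""


def fetch() -> None:
    try:
        raise TimeoutError("upstream timed out")  # stand-in for a real failure
    except TimeoutError as exc:
        # Always chain: `raise ... from exc` preserves the original cause.
        raise SearchError("PubMed search failed") from exc


try:
    fetch()
except SearchError as err:
    cause = err.__cause__
```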
128
-
129
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
130
-
131
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
132
-
133
- ---
134
-
135
- ## src/orchestrator_factory.py Rules
136
-
137
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
138
-
139
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
140
-
141
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
142
-
143
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
144
-
145
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
146
-
147
- ---
148
-
149
- ## src/orchestrator_hierarchical.py Rules
150
-
151
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
152
-
153
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
154
-
155
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
156
-
157
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
158
-
159
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
160
-
161
- ---
162
-
163
- ## src/orchestrator_magentic.py Rules
164
-
165
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
166
-
167
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
168
-
169
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
170
-
171
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
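The priority chain is easy to express defensively (a sketch of the `_extract_text()` idea; the `Msg` class is a stand-in for a buggy message object):

```python
def extract_text(message: object) -> str:
    # Priority: .content -> .text -> str(message); tolerate objects
    # where the attribute exists but is empty or the wrong type.
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if isinstance(value, str) and value:
            return value
    return str(message)


class Msg:
    # Buggy message: no .content, only .text
    text = "hello"


t = extract_text(Msg())
```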
172
-
173
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
174
-
175
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
176
-
177
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
178
-
179
- ---
180
-
181
- ## src/agent_factory/ - Factory Rules
182
-
183
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
184
-
185
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
186
-
187
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
188
-
189
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
190
-
191
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
192
-
193
- ---
194
-
195
- ## src/prompts/ - Prompt Rules
196
-
197
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
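Date injection plus per-item truncation, sketched (the prompt wording and constant name are illustrative, only the 1500-char limit comes from the rules):

```python
from datetime import datetime

MAX_EVIDENCE_CHARS = 1500  # per-item truncation limit from the rules


def build_system_prompt(evidence_items: list[str]) -> str:
    # Inject today's date and truncate each evidence item.
    today = datetime.now().strftime("%Y-%m-%d")
    formatted = "\n".join(item[:MAX_EVIDENCE_CHARS] for item in evidence_items)
    return f"Current date: {today}\n\nEvidence:\n{formatted}"


prompt = build_system_prompt(["x" * 2000])
```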
198
-
199
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
200
-
201
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
202
-
203
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
204
-
205
- ---
206
-
207
- ## Testing Rules
208
-
209
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
210
-
211
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
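For the LLM boundary specifically, `unittest.mock.AsyncMock` from the stdlib covers the common case (a stand-in for what `MockJudgeHandler` provides; the `assess` method and its payload are invented for illustration):

```python
import asyncio
from unittest.mock import AsyncMock

# Fake judge: awaiting judge.assess(...) returns the canned payload.
judge = AsyncMock()
judge.assess.return_value = {"verdict": "SUPPORTED"}


async def run_judge() -> dict:
    return await judge.assess(evidence=["e1"])


verdict = asyncio.run(run_judge())["verdict"]
```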
212
-
213
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
214
-
215
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
216
-
217
- ---
218
-
219
- ## File-Specific Agent Rules
220
-
221
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
222
-
223
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
224
-
225
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
226
-
227
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
228
-
229
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
230
-
231
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
232
-
233
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
234
-
235
-
236
-
 

dev/Makefile DELETED
@@ -1,51 +0,0 @@
1
- .PHONY: install test lint format typecheck check clean all cov cov-html
2
-
3
- # Default target
4
- all: check
5
-
6
- install:
7
- uv sync --all-extras
8
- uv run pre-commit install
9
-
10
- test:
11
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
-
13
- test-hf:
14
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
-
16
- test-all:
17
- uv run pytest tests/ -v -p no:logfire
18
-
19
- # Coverage aliases
20
- cov: test-cov
21
- test-cov:
22
- uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
-
24
- cov-html:
25
- uv run pytest --cov=src --cov-report=html -p no:logfire
26
- @echo "Coverage report: open htmlcov/index.html"
27
-
28
- lint:
29
- uv run ruff check src tests
30
-
31
- format:
32
- uv run ruff format src tests
33
-
34
- typecheck:
35
- uv run mypy src
36
-
37
- check: lint typecheck test-cov
38
- @echo "All checks passed!"
39
-
40
- docs-build:
41
- uv run mkdocs build
42
-
43
- docs-serve:
44
- uv run mkdocs serve
45
-
46
- docs-clean:
47
- rm -rf site/
48
-
49
- clean:
50
- rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
- find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
 
dev/docs_plugins.py DELETED
@@ -1,74 +0,0 @@
1
- """Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""
2
-
3
- import re
4
- from pathlib import Path
5
-
6
- from markdown import Markdown
7
- from markdown.extensions import Extension
8
- from markdown.preprocessors import Preprocessor
9
-
10
-
11
- class CodeAnchorPreprocessor(Preprocessor):
12
- """Preprocess code blocks with anchor format: ```start:end:filepath"""
13
-
14
- def __init__(self, md: Markdown, base_path: Path):
15
- super().__init__(md)
16
- self.base_path = base_path
17
- self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)
18
-
19
- def run(self, lines: list[str]) -> list[str]:
20
- """Process lines and convert code anchor format to standard code blocks."""
21
- text = "\n".join(lines)
22
- new_text = self.pattern.sub(self._replace_code_anchor, text)
23
- return new_text.split("\n")
24
-
25
- def _replace_code_anchor(self, match) -> str:
26
- """Replace code anchor format with standard code block + link."""
27
- start_line = int(match.group(1))
28
- end_line = int(match.group(2))
29
- file_path = match.group(3).strip()
30
- existing_code = match.group(4)
31
-
32
- # Determine language from file extension
33
- ext = Path(file_path).suffix.lower()
34
- lang_map = {
35
- ".py": "python",
36
- ".js": "javascript",
37
- ".ts": "typescript",
38
- ".md": "markdown",
39
- ".yaml": "yaml",
40
- ".yml": "yaml",
41
- ".toml": "toml",
42
- ".json": "json",
43
- ".html": "html",
44
- ".css": "css",
45
- ".sh": "bash",
46
- }
47
- language = lang_map.get(ext, "python")
48
-
49
- # Generate GitHub link
50
- repo_url = "https://github.com/DeepCritical/GradioDemo"
51
- github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"
52
-
53
- # Return standard code block with source link
54
- return (
55
- f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
56
- f"```{language}\n{existing_code}\n```"
57
- )
58
-
59
-
60
- class CodeAnchorExtension(Extension):
61
- """Markdown extension for code anchors."""
62
-
63
- def __init__(self, base_path: str = ".", **kwargs):
64
- super().__init__(**kwargs)
65
- self.base_path = Path(base_path)
66
-
67
- def extendMarkdown(self, md: Markdown): # noqa: N802
68
- """Register the preprocessor."""
69
- md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)
70
-
71
-
72
- def makeExtension(**kwargs): # noqa: N802
73
- """Create the extension."""
74
- return CodeAnchorExtension(**kwargs)
 
docs/api/agents.md DELETED
@@ -1,270 +0,0 @@
1
- # Agents API Reference
2
-
3
- This page documents the API for DeepCritical agents.
4
-
5
- ## KnowledgeGapAgent
6
-
7
- **Module**: `src.agents.knowledge_gap`
8
-
9
- **Purpose**: Evaluates research state and identifies knowledge gaps.
10
-
11
- ### Methods
12
-
13
- #### `evaluate`
14
-
15
- ```python
16
- async def evaluate(
17
- self,
18
- query: str,
19
- background_context: str,
20
- conversation_history: Conversation,
21
- iteration: int,
22
- time_elapsed_minutes: float,
23
- max_time_minutes: float
24
- ) -> KnowledgeGapOutput
25
- ```
26
-
27
- Evaluates research completeness and identifies outstanding knowledge gaps.
28
-
29
- **Parameters**:
30
- - `query`: Research query string
31
- - `background_context`: Background context for the query
32
- - `conversation_history`: Conversation history with previous iterations
33
- - `iteration`: Current iteration number
34
- - `time_elapsed_minutes`: Elapsed time in minutes
35
- - `max_time_minutes`: Maximum time limit in minutes
36
-
37
- **Returns**: `KnowledgeGapOutput` with:
38
- - `research_complete`: Boolean indicating if research is complete
39
- - `outstanding_gaps`: List of remaining knowledge gaps
40
-
41
- ## ToolSelectorAgent
42
-
43
- **Module**: `src.agents.tool_selector`
44
-
45
- **Purpose**: Selects appropriate tools for addressing knowledge gaps.
46
-
47
- ### Methods
48
-
49
- #### `select_tools`
50
-
51
- ```python
52
- async def select_tools(
53
- self,
54
- query: str,
55
- knowledge_gaps: list[str],
56
- available_tools: list[str]
57
- ) -> AgentSelectionPlan
58
- ```
59
-
60
- Selects tools for addressing knowledge gaps.
61
-
62
- **Parameters**:
63
- - `query`: Research query string
64
- - `knowledge_gaps`: List of knowledge gaps to address
65
- - `available_tools`: List of available tool names
66
-
67
- **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
68
-
69
- ## WriterAgent
70
-
71
- **Module**: `src.agents.writer`
72
-
73
- **Purpose**: Generates final reports from research findings.
74
-
75
- ### Methods
76
-
77
- #### `write_report`
78
-
79
- ```python
80
- async def write_report(
81
- self,
82
- query: str,
83
- findings: str,
84
- output_length: str = "medium",
85
- output_instructions: str | None = None
86
- ) -> str
87
- ```
88
-
89
- Generates a markdown report from research findings.
90
-
91
- **Parameters**:
92
- - `query`: Research query string
93
- - `findings`: Research findings to include in report
94
- - `output_length`: Desired output length ("short", "medium", "long")
95
- - `output_instructions`: Additional instructions for report generation
96
-
97
- **Returns**: Markdown string with numbered citations.
98
-
99
- ## LongWriterAgent
100
-
101
- **Module**: `src.agents.long_writer`
102
-
103
- **Purpose**: Long-form report generation with section-by-section writing.
104
-
105
- ### Methods
106
-
107
- #### `write_next_section`
108
-
109
- ```python
110
- async def write_next_section(
111
- self,
112
- query: str,
113
- draft: ReportDraft,
114
- section_title: str,
115
- section_content: str
116
- ) -> LongWriterOutput
117
- ```
118
-
119
- Writes the next section of a long-form report.
120
-
121
- **Parameters**:
122
- - `query`: Research query string
123
- - `draft`: Current report draft
124
- - `section_title`: Title of the section to write
125
- - `section_content`: Content/guidance for the section
126
-
127
- **Returns**: `LongWriterOutput` with updated draft.
128
-
129
- #### `write_report`
130
-
131
- ```python
132
- async def write_report(
133
- self,
134
- query: str,
135
- report_title: str,
136
- report_draft: ReportDraft
137
- ) -> str
138
- ```
139
-
140
- Generates final report from draft.
141
-
142
- **Parameters**:
143
- - `query`: Research query string
144
- - `report_title`: Title of the report
145
- - `report_draft`: Complete report draft
146
-
147
- **Returns**: Final markdown report string.
148
-
149
- ## ProofreaderAgent
150
-
151
- **Module**: `src.agents.proofreader`
152
-
153
- **Purpose**: Proofreads and polishes report drafts.
154
-
155
- ### Methods
156
-
157
- #### `proofread`
158
-
159
- ```python
160
- async def proofread(
161
- self,
162
- query: str,
163
- report_title: str,
164
- report_draft: ReportDraft
165
- ) -> str
166
- ```
167
-
168
- Proofreads and polishes a report draft.
169
-
170
- **Parameters**:
171
- - `query`: Research query string
172
- - `report_title`: Title of the report
173
- - `report_draft`: Report draft to proofread
174
-
175
- **Returns**: Polished markdown string.
176
-
177
- ## ThinkingAgent
178
-
179
- **Module**: `src.agents.thinking`
180
-
181
- **Purpose**: Generates observations from conversation history.
182
-
183
- ### Methods
184
-
185
- #### `generate_observations`
186
-
187
- ```python
188
- async def generate_observations(
189
- self,
190
- query: str,
191
- background_context: str,
192
- conversation_history: Conversation
193
- ) -> str
194
- ```
195
-
196
- Generates observations from conversation history.
197
-
198
- **Parameters**:
199
- - `query`: Research query string
200
- - `background_context`: Background context
201
- - `conversation_history`: Conversation history
202
-
203
- **Returns**: Observation string.
204
-
205
- ## InputParserAgent
206
-
207
- **Module**: `src.agents.input_parser`
208
-
209
- **Purpose**: Parses and improves user queries, detects research mode.
210
-
211
- ### Methods
212
-
213
- #### `parse_query`
214
-
215
- ```python
216
- async def parse_query(
217
- self,
218
- query: str
219
- ) -> ParsedQuery
220
- ```
221
-
222
- Parses and improves a user query.
223
-
224
- **Parameters**:
225
- - `query`: Original query string
226
-
227
- **Returns**: `ParsedQuery` with:
228
- - `original_query`: Original query string
229
- - `improved_query`: Refined query string
230
- - `research_mode`: "iterative" or "deep"
231
- - `key_entities`: List of key entities
232
- - `research_questions`: List of research questions
233
-
234
- ## Factory Functions
235
-
236
- All agents have factory functions in `src.agent_factory.agents`:
237
-
238
- ```python
239
- def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
240
- def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
241
- def create_writer_agent(model: Any | None = None) -> WriterAgent
242
- def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent
243
- def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent
244
- def create_thinking_agent(model: Any | None = None) -> ThinkingAgent
245
- def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
246
- ```
247
-
248
- **Parameters**:
249
- - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
250
-
251
- **Returns**: Agent instance.
252
-
253
- ## See Also
254
-
255
- - [Architecture - Agents](../architecture/agents.md) - Architecture overview
256
- - [Models API](models.md) - Data models used by agents
257
-
258
-
259
-
260
-
261
-
262
-
263
-
264
-
265
-
266
-
267
-
268
-
269
-
270
-
 
docs/api/models.md DELETED
@@ -1,248 +0,0 @@
1
- # Models API Reference
2
-
3
- This page documents the Pydantic models used throughout DeepCritical.
4
-
5
- ## Evidence
6
-
7
- **Module**: `src.utils.models`
8
-
9
- **Purpose**: Represents evidence from search results.
10
-
11
- ```python
12
- class Evidence(BaseModel):
13
- citation: Citation
14
- content: str
15
- relevance_score: float = Field(ge=0.0, le=1.0)
16
- metadata: dict[str, Any] = Field(default_factory=dict)
17
- ```
18
-
19
- **Fields**:
20
- - `citation`: Citation information (title, URL, date, authors)
21
- - `content`: Evidence text content
22
- - `relevance_score`: Relevance score (0.0-1.0)
23
- - `metadata`: Additional metadata dictionary
24
-
25
- ## Citation
26
-
27
- **Module**: `src.utils.models`
28
-
29
- **Purpose**: Citation information for evidence.
30
-
31
- ```python
32
- class Citation(BaseModel):
33
- title: str
34
- url: str
35
- date: str | None = None
36
- authors: list[str] = Field(default_factory=list)
37
- ```
38
-
39
- **Fields**:
40
- - `title`: Article/trial title
41
- - `url`: Source URL
42
- - `date`: Publication date (optional)
43
- - `authors`: List of authors (optional)
44
-
45
- ## KnowledgeGapOutput
46
-
47
- **Module**: `src.utils.models`
48
-
49
- **Purpose**: Output from knowledge gap evaluation.
50
-
51
- ```python
52
- class KnowledgeGapOutput(BaseModel):
53
- research_complete: bool
54
- outstanding_gaps: list[str] = Field(default_factory=list)
55
- ```
56
-
57
- **Fields**:
58
- - `research_complete`: Boolean indicating if research is complete
59
- - `outstanding_gaps`: List of remaining knowledge gaps
60
-
61
- ## AgentSelectionPlan
62
-
63
- **Module**: `src.utils.models`
64
-
65
- **Purpose**: Plan for tool/agent selection.
66
-
67
- ```python
68
- class AgentSelectionPlan(BaseModel):
69
- tasks: list[AgentTask] = Field(default_factory=list)
70
- ```
71
-
72
- **Fields**:
73
- - `tasks`: List of agent tasks to execute
74
-
75
- ## AgentTask
76
-
77
- **Module**: `src.utils.models`
78
-
79
- **Purpose**: Individual agent task.
80
-
81
- ```python
82
- class AgentTask(BaseModel):
83
- agent_name: str
84
- query: str
85
- context: dict[str, Any] = Field(default_factory=dict)
86
- ```
87
-
88
- **Fields**:
89
- - `agent_name`: Name of agent to use
90
- - `query`: Task query
91
- - `context`: Additional context dictionary
92
-
93
- ## ReportDraft
94
-
95
- **Module**: `src.utils.models`
96
-
97
- **Purpose**: Draft structure for long-form reports.
98
-
99
- ```python
100
- class ReportDraft(BaseModel):
101
- title: str
102
- sections: list[ReportSection] = Field(default_factory=list)
103
- references: list[Citation] = Field(default_factory=list)
104
- ```
105
-
106
- **Fields**:
107
- - `title`: Report title
108
- - `sections`: List of report sections
109
- - `references`: List of citations
110
-
111
- ## ReportSection
112
-
113
- **Module**: `src.utils.models`
114
-
115
- **Purpose**: Individual section in a report draft.
116
-
117
- ```python
118
- class ReportSection(BaseModel):
119
- title: str
120
- content: str
121
- order: int
122
- ```
123
-
124
- **Fields**:
125
- - `title`: Section title
126
- - `content`: Section content
127
- - `order`: Section order number
128
-
129
- ## ParsedQuery
130
-
131
- **Module**: `src.utils.models`
132
-
133
- **Purpose**: Parsed and improved query.
134
-
135
- ```python
136
- class ParsedQuery(BaseModel):
137
- original_query: str
138
- improved_query: str
139
- research_mode: Literal["iterative", "deep"]
140
- key_entities: list[str] = Field(default_factory=list)
141
- research_questions: list[str] = Field(default_factory=list)
142
- ```
143
-
144
- **Fields**:
145
- - `original_query`: Original query string
146
- - `improved_query`: Refined query string
147
- - `research_mode`: Research mode ("iterative" or "deep")
148
- - `key_entities`: List of key entities
149
- - `research_questions`: List of research questions
150
-
151
- ## Conversation
152
-
153
- **Module**: `src.utils.models`
154
-
155
- **Purpose**: Conversation history with iterations.
156
-
157
- ```python
158
- class Conversation(BaseModel):
159
- iterations: list[IterationData] = Field(default_factory=list)
160
- ```
161
-
162
- **Fields**:
163
- - `iterations`: List of iteration data
164
-
165
- ## IterationData
166
-
167
- **Module**: `src.utils.models`
168
-
169
- **Purpose**: Data for a single iteration.
170
-
171
- ```python
172
- class IterationData(BaseModel):
173
- iteration: int
174
- observations: str | None = None
175
- knowledge_gaps: list[str] = Field(default_factory=list)
176
- tool_calls: list[dict[str, Any]] = Field(default_factory=list)
177
- findings: str | None = None
178
- thoughts: str | None = None
179
- ```
180
-
181
- **Fields**:
182
- - `iteration`: Iteration number
183
- - `observations`: Generated observations
184
- - `knowledge_gaps`: Identified knowledge gaps
185
- - `tool_calls`: Tool calls made
186
- - `findings`: Findings from tools
187
- - `thoughts`: Agent thoughts
188
-
189
- ## AgentEvent
190
-
191
- **Module**: `src.utils.models`
192
-
193
- **Purpose**: Event emitted during research execution.
194
-
195
- ```python
196
- class AgentEvent(BaseModel):
197
- type: str
198
- iteration: int | None = None
199
- data: dict[str, Any] = Field(default_factory=dict)
200
- ```
201
-
202
- **Fields**:
203
- - `type`: Event type (e.g., "started", "search_complete", "complete")
204
- - `iteration`: Iteration number (optional)
205
- - `data`: Event data dictionary
206
-
207
- ## BudgetStatus
208
-
209
- **Module**: `src.utils.models`
210
-
211
- **Purpose**: Current budget status.
212
-
213
- ```python
214
- class BudgetStatus(BaseModel):
215
- tokens_used: int
216
- tokens_limit: int
217
- time_elapsed_seconds: float
218
- time_limit_seconds: float
219
- iterations: int
220
- iterations_limit: int
221
- ```
222
-
223
- **Fields**:
224
- - `tokens_used`: Tokens used so far
225
- - `tokens_limit`: Token limit
226
- - `time_elapsed_seconds`: Elapsed time in seconds
227
- - `time_limit_seconds`: Time limit in seconds
228
- - `iterations`: Current iteration count
229
- - `iterations_limit`: Iteration limit
230
-
231
- ## See Also
232
-
233
- - [Architecture - Agents](../architecture/agents.md) - How models are used
234
- - [Configuration](../configuration/index.md) - Model configuration
235
-
236
-
237
-
238
-
239
-
240
-
241
-
242
-
243
-
244
-
245
-
246
-
247
-
248
-
 
docs/api/orchestrators.md DELETED
@@ -1,195 +0,0 @@
1
- # Orchestrators API Reference
2
-
3
- This page documents the API for DeepCritical orchestrators.
4
-
5
- ## IterativeResearchFlow
6
-
7
- **Module**: `src.orchestrator.research_flow`
8
-
9
- **Purpose**: Single-loop research with search-judge-synthesize cycles.
10
-
11
- ### Methods
12
-
13
- #### `run`
14
-
15
- ```python
16
- async def run(
17
- self,
18
- query: str,
19
- background_context: str = "",
20
- max_iterations: int | None = None,
21
- max_time_minutes: float | None = None,
22
- token_budget: int | None = None
23
- ) -> AsyncGenerator[AgentEvent, None]
24
- ```
25
-
26
- Runs iterative research flow.
27
-
28
- **Parameters**:
29
- - `query`: Research query string
30
- - `background_context`: Background context (default: "")
31
- - `max_iterations`: Maximum iterations (default: from settings)
32
- - `max_time_minutes`: Maximum time in minutes (default: from settings)
33
- - `token_budget`: Token budget (default: from settings)
34
-
35
- **Yields**: `AgentEvent` objects for:
36
- - `started`: Research started
37
- - `search_complete`: Search completed
38
- - `judge_complete`: Evidence evaluation completed
39
- - `synthesizing`: Generating report
40
- - `complete`: Research completed
41
- - `error`: Error occurred
42
-
## DeepResearchFlow

**Module**: `src.orchestrator.research_flow`

**Purpose**: Multi-section parallel research with planning and synthesis.

### Methods

#### `run`

```python
async def run(
    self,
    query: str,
    background_context: str = "",
    max_iterations_per_section: int | None = None,
    max_time_minutes: float | None = None,
    token_budget: int | None = None
) -> AsyncGenerator[AgentEvent, None]
```

Runs the deep research flow.

**Parameters**:
- `query`: Research query string
- `background_context`: Background context (default: `""`)
- `max_iterations_per_section`: Maximum iterations per section (default: from settings)
- `max_time_minutes`: Maximum time in minutes (default: from settings)
- `token_budget`: Token budget (default: from settings)

**Yields**: `AgentEvent` objects for:
- `started`: Research started
- `planning`: Creating research plan
- `looping`: Running parallel research loops
- `synthesizing`: Synthesizing results
- `complete`: Research completed
- `error`: Error occurred

## GraphOrchestrator

**Module**: `src.orchestrator.graph_orchestrator`

**Purpose**: Graph-based execution using Pydantic AI agents as nodes.

### Methods

#### `run`

```python
async def run(
    self,
    query: str,
    research_mode: str = "auto",
    use_graph: bool = True
) -> AsyncGenerator[AgentEvent, None]
```

Runs graph-based research orchestration.

**Parameters**:
- `query`: Research query string
- `research_mode`: Research mode (`"iterative"`, `"deep"`, or `"auto"`)
- `use_graph`: Whether to use graph execution (default: `True`)

**Yields**: `AgentEvent` objects during graph execution.
## Orchestrator Factory

**Module**: `src.orchestrator_factory`

**Purpose**: Factory for creating orchestrators.

### Functions

#### `create_orchestrator`

```python
def create_orchestrator(
    search_handler: SearchHandlerProtocol,
    judge_handler: JudgeHandlerProtocol,
    config: dict[str, Any],
    mode: str | None = None
) -> Any
```

Creates an orchestrator instance.

**Parameters**:
- `search_handler`: Search handler protocol implementation
- `judge_handler`: Judge handler protocol implementation
- `config`: Configuration dictionary
- `mode`: Orchestrator mode (`"simple"`, `"advanced"`, `"magentic"`, or `None` for auto-detect)

**Returns**: Orchestrator instance.

**Raises**:
- `ValueError`: If requirements are not met

**Modes**:
- `"simple"`: Legacy orchestrator
- `"advanced"` or `"magentic"`: Magentic orchestrator (requires OpenAI API key)
- `None`: Auto-detect based on API key availability
## MagenticOrchestrator

**Module**: `src.orchestrator_magentic`

**Purpose**: Multi-agent coordination using Microsoft Agent Framework.

### Methods

#### `run`

```python
async def run(
    self,
    query: str,
    max_rounds: int = 15,
    max_stalls: int = 3
) -> AsyncGenerator[AgentEvent, None]
```

Runs Magentic orchestration.

**Parameters**:
- `query`: Research query string
- `max_rounds`: Maximum rounds (default: 15)
- `max_stalls`: Maximum stalls before reset (default: 3)

**Yields**: `AgentEvent` objects converted from Magentic events.

**Requirements**:
- `agent-framework-core` package
- OpenAI API key

## See Also

- [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
- [Graph Orchestration](../architecture/graph-orchestration.md) - Graph execution details
docs/api/services.md DELETED
# Services API Reference

This page documents the API for DeepCritical services.

## EmbeddingService

**Module**: `src.services.embeddings`

**Purpose**: Local sentence-transformers for semantic search and deduplication.

### Methods

#### `embed`

```python
async def embed(self, text: str) -> list[float]
```

Generates an embedding for a text string.

**Parameters**:
- `text`: Text to embed

**Returns**: Embedding vector as a list of floats.

#### `embed_batch`

```python
async def embed_batch(self, texts: list[str]) -> list[list[float]]
```

Generates embeddings for multiple texts.

**Parameters**:
- `texts`: List of texts to embed

**Returns**: List of embedding vectors.

#### `similarity`

```python
async def similarity(self, text1: str, text2: str) -> float
```

Calculates similarity between two texts.

**Parameters**:
- `text1`: First text
- `text2`: Second text

**Returns**: Similarity score (0.0-1.0).

#### `find_duplicates`

```python
async def find_duplicates(
    self,
    texts: list[str],
    threshold: float = 0.85
) -> list[tuple[int, int]]
```

Finds duplicate texts based on a similarity threshold.

**Parameters**:
- `texts`: List of texts to check
- `threshold`: Similarity threshold (default: 0.85)

**Returns**: List of `(index1, index2)` tuples for duplicate pairs.

### Factory Function

#### `get_embedding_service`

```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService
```

Returns the singleton EmbeddingService instance.
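A rough sketch of the `find_duplicates` contract, using plain cosine similarity over toy vectors. The real service embeds the texts with sentence-transformers first; this illustration skips that step and works directly on vectors:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_duplicates(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Pairwise comparison mirroring the (index1, index2) return contract."""
    pairs: list[tuple[int, int]] = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# The first two vectors point in nearly the same direction; the third does not.
vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
dupes = find_duplicates(vecs)
```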
## LlamaIndexRAGService

**Module**: `src.services.rag`

**Purpose**: Retrieval-Augmented Generation using LlamaIndex.

### Methods

#### `ingest_evidence`

```python
async def ingest_evidence(self, evidence: list[Evidence]) -> None
```

Ingests evidence into the RAG service.

**Parameters**:
- `evidence`: List of Evidence objects to ingest

**Note**: Requires an OpenAI API key for embeddings.

#### `retrieve`

```python
async def retrieve(
    self,
    query: str,
    top_k: int = 5
) -> list[Document]
```

Retrieves relevant documents for a query.

**Parameters**:
- `query`: Search query string
- `top_k`: Number of top results to return (default: 5)

**Returns**: List of Document objects with metadata.

#### `query`

```python
async def query(
    self,
    query: str,
    top_k: int = 5
) -> str
```

Queries the RAG service and returns formatted results.

**Parameters**:
- `query`: Search query string
- `top_k`: Number of top results to return (default: 5)

**Returns**: Formatted query results as a string.

### Factory Function

#### `get_rag_service`

```python
@lru_cache(maxsize=1)
def get_rag_service() -> LlamaIndexRAGService | None
```

Returns the singleton LlamaIndexRAGService instance, or `None` if no OpenAI key is available.
## StatisticalAnalyzer

**Module**: `src.services.statistical_analyzer`

**Purpose**: Secure execution of AI-generated statistical code.

### Methods

#### `analyze`

```python
async def analyze(
    self,
    hypothesis: str,
    evidence: list[Evidence],
    data_description: str | None = None
) -> AnalysisResult
```

Analyzes a hypothesis using statistical methods.

**Parameters**:
- `hypothesis`: Hypothesis to analyze
- `evidence`: List of Evidence objects
- `data_description`: Optional data description

**Returns**: `AnalysisResult` with:
- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `code`: Generated analysis code
- `output`: Execution output
- `error`: Error message if execution failed

**Note**: Requires Modal credentials for sandbox execution.

## See Also

- [Architecture - Services](../architecture/services.md) - Architecture overview
- [Configuration](../configuration/index.md) - Service configuration
docs/api/tools.md DELETED
# Tools API Reference

This page documents the API for DeepCritical search tools.

## SearchTool Protocol

All tools implement the `SearchTool` protocol:

```python
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(
        self,
        query: str,
        max_results: int = 10
    ) -> list[Evidence]: ...
```
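Because `SearchTool` is a `typing.Protocol`, any class with a matching `name` property and `search` coroutine satisfies it structurally; no inheritance is required. A toy in-memory tool (with a stand-in `Evidence` model, since this sketch does not import the real one) illustrates the shape:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Evidence:
    """Stand-in for the real Evidence model."""
    source: str
    content: str


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...


class InMemoryTool:
    """Toy tool that satisfies the protocol structurally."""

    def __init__(self, docs: list[str]) -> None:
        self._docs = docs

    @property
    def name(self) -> str:
        return "in_memory"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        hits = [d for d in self._docs if query.lower() in d.lower()]
        return [Evidence(self.name, d) for d in hits[:max_results]]


tool: SearchTool = InMemoryTool(["Aspirin reduces inflammation", "Metformin and aging"])
results = asyncio.run(tool.search("aspirin"))
```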
## PubMedTool

**Module**: `src.tools.pubmed`

**Purpose**: Search peer-reviewed biomedical literature from PubMed.

### Properties

#### `name`

```python
@property
def name(self) -> str
```

Returns tool name: `"pubmed"`

### Methods

#### `search`

```python
async def search(
    self,
    query: str,
    max_results: int = 10
) -> list[Evidence]
```

Searches PubMed for articles.

**Parameters**:
- `query`: Search query string
- `max_results`: Maximum number of results to return (default: 10)

**Returns**: List of `Evidence` objects with PubMed articles.

**Raises**:
- `SearchError`: If search fails
- `RateLimitError`: If rate limit is exceeded

## ClinicalTrialsTool

**Module**: `src.tools.clinicaltrials`

**Purpose**: Search ClinicalTrials.gov for interventional studies.

### Properties

#### `name`

```python
@property
def name(self) -> str
```

Returns tool name: `"clinicaltrials"`

### Methods

#### `search`

```python
async def search(
    self,
    query: str,
    max_results: int = 10
) -> list[Evidence]
```

Searches ClinicalTrials.gov for trials.

**Parameters**:
- `query`: Search query string
- `max_results`: Maximum number of results to return (default: 10)

**Returns**: List of `Evidence` objects with clinical trials.

**Note**: Only returns interventional studies with one of the statuses COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, ENROLLING_BY_INVITATION.

**Raises**:
- `SearchError`: If search fails

## EuropePMCTool

**Module**: `src.tools.europepmc`

**Purpose**: Search Europe PMC for preprints and peer-reviewed articles.

### Properties

#### `name`

```python
@property
def name(self) -> str
```

Returns tool name: `"europepmc"`

### Methods

#### `search`

```python
async def search(
    self,
    query: str,
    max_results: int = 10
) -> list[Evidence]
```

Searches Europe PMC for articles and preprints.

**Parameters**:
- `query`: Search query string
- `max_results`: Maximum number of results to return (default: 10)

**Returns**: List of `Evidence` objects with articles/preprints.

**Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles.

**Raises**:
- `SearchError`: If search fails

## RAGTool

**Module**: `src.tools.rag_tool`

**Purpose**: Semantic search within collected evidence.

### Properties

#### `name`

```python
@property
def name(self) -> str
```

Returns tool name: `"rag"`

### Methods

#### `search`

```python
async def search(
    self,
    query: str,
    max_results: int = 10
) -> list[Evidence]
```

Searches collected evidence using semantic similarity.

**Parameters**:
- `query`: Search query string
- `max_results`: Maximum number of results to return (default: 10)

**Returns**: List of `Evidence` objects from collected evidence.

**Note**: Requires evidence to be ingested into the RAG service first.

## SearchHandler

**Module**: `src.tools.search_handler`

**Purpose**: Orchestrates parallel searches across multiple tools.

### Methods

#### `search`

```python
async def search(
    self,
    query: str,
    tools: list[SearchTool] | None = None,
    max_results_per_tool: int = 10
) -> SearchResult
```

Searches multiple tools in parallel.

**Parameters**:
- `query`: Search query string
- `tools`: List of tools to use (default: all available tools)
- `max_results_per_tool`: Maximum results per tool (default: 10)

**Returns**: `SearchResult` with:
- `evidence`: Aggregated list of evidence
- `tool_results`: Results per tool
- `total_count`: Total number of results

**Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully.
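The gather-with-graceful-failure behavior can be sketched in a few lines. The two tools here are stand-ins, and `return_exceptions=True` is one way to get the documented "failures don't abort the search" behavior, not necessarily the handler's exact implementation:

```python
import asyncio


async def ok_tool(query: str) -> list[str]:
    return [f"{query}: hit"]


async def failing_tool(query: str) -> list[str]:
    raise RuntimeError("rate limited")


async def search_all(query: str) -> list[str]:
    """Run tools concurrently; a failing tool contributes nothing."""
    outcomes = await asyncio.gather(
        ok_tool(query), failing_tool(query), return_exceptions=True
    )
    evidence: list[str] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # the real handler would log and skip the failed tool
        evidence.extend(outcome)
    return evidence


evidence = asyncio.run(search_all("tp53"))
```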
## See Also

- [Architecture - Tools](../architecture/tools.md) - Architecture overview
- [Models API](models.md) - Data models used by tools
docs/architecture/agents.md DELETED
# Agents Architecture

DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents follow a consistent pattern and use structured output types.

## Agent Pattern

All agents use the Pydantic AI `Agent` class with the following structure:

- **System Prompt**: Module-level constant with date injection
- **Agent Class**: `__init__(model: Any | None = None)`
- **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
- **Factory Function**: `def create_agent_name(model: Any | None = None) -> AgentName`
## Model Initialization

Agents use `get_model()` from `src/agent_factory/judges.py` if no model is provided. This supports:

- OpenAI models
- Anthropic models
- HuggingFace Inference API models

The model selection is based on the configured `LLM_PROVIDER` in settings.

## Error Handling

Agents return fallback values on failure rather than raising exceptions:

- `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`
- Empty strings for text outputs
- Default structured outputs

All errors are logged with context using structlog.
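The fallback convention can be sketched as follows. The failure here is simulated; real agents catch errors raised by the underlying LLM call and log them with structlog context before returning the conservative default:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class KnowledgeGapOutput:
    """Stand-in for the real output model."""
    research_complete: bool
    outstanding_gaps: list[str] = field(default_factory=list)


async def evaluate_safely(query: str) -> KnowledgeGapOutput:
    """Return a conservative fallback instead of raising, per the convention."""
    try:
        raise TimeoutError("LLM call timed out")  # simulate an agent failure
    except Exception:
        # Real agents log the error with context here.
        return KnowledgeGapOutput(
            research_complete=False,
            outstanding_gaps=[f"Unable to evaluate gaps for: {query}"],
        )


fallback = asyncio.run(evaluate_safely("tp53 mutations"))
```

Returning `research_complete=False` keeps the research loop alive, so a transient failure degrades gracefully instead of crashing the workflow.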
## Input Validation

All agents validate inputs:

- Check that queries/inputs are not empty
- Truncate very long inputs with warnings
- Handle None values gracefully

## Output Types

Agents use structured output types from `src/utils/models.py`:

- `KnowledgeGapOutput`: Research completeness evaluation
- `AgentSelectionPlan`: Tool selection plan
- `ReportDraft`: Long-form report structure
- `ParsedQuery`: Query parsing and mode detection

For text output (writer agents), agents return `str` directly.

## Agent Types

### Knowledge Gap Agent

**File**: `src/agents/knowledge_gap.py`

**Purpose**: Evaluates research state and identifies knowledge gaps.

**Output**: `KnowledgeGapOutput` with:
- `research_complete`: Boolean indicating if research is complete
- `outstanding_gaps`: List of remaining knowledge gaps

**Methods**:
- `async def evaluate(query, background_context, conversation_history, iteration, time_elapsed_minutes, max_time_minutes) -> KnowledgeGapOutput`

### Tool Selector Agent

**File**: `src/agents/tool_selector.py`

**Purpose**: Selects appropriate tools for addressing knowledge gaps.

**Output**: `AgentSelectionPlan` with a list of `AgentTask` objects.

**Available Agents**:
- `WebSearchAgent`: General web search for fresh information
- `SiteCrawlerAgent`: Research specific entities/companies
- `RAGAgent`: Semantic search within collected evidence

### Writer Agent

**File**: `src/agents/writer.py`

**Purpose**: Generates final reports from research findings.

**Output**: Markdown string with numbered citations.

**Methods**:
- `async def write_report(query, findings, output_length, output_instructions) -> str`

**Features**:
- Validates inputs
- Truncates very long findings (max 50000 chars) with warning
- Retry logic for transient failures (3 retries)
- Citation validation before returning

### Long Writer Agent

**File**: `src/agents/long_writer.py`

**Purpose**: Long-form report generation with section-by-section writing.

**Input/Output**: Uses `ReportDraft` models.

**Methods**:
- `async def write_next_section(query, draft, section_title, section_content) -> LongWriterOutput`
- `async def write_report(query, report_title, report_draft) -> str`

**Features**:
- Writes sections iteratively
- Aggregates references across sections
- Reformats section headings and references
- Deduplicates and renumbers references

### Proofreader Agent

**File**: `src/agents/proofreader.py`

**Purpose**: Proofreads and polishes report drafts.

**Input**: `ReportDraft`
**Output**: Polished markdown string

**Methods**:
- `async def proofread(query, report_title, report_draft) -> str`

**Features**:
- Removes duplicate content across sections
- Adds an executive summary if there are multiple sections
- Preserves all references and citations
- Improves flow and readability

### Thinking Agent

**File**: `src/agents/thinking.py`

**Purpose**: Generates observations from conversation history.

**Output**: Observation string

**Methods**:
- `async def generate_observations(query, background_context, conversation_history) -> str`

### Input Parser Agent

**File**: `src/agents/input_parser.py`

**Purpose**: Parses and improves user queries, detects research mode.

**Output**: `ParsedQuery` with:
- `original_query`: Original query string
- `improved_query`: Refined query string
- `research_mode`: "iterative" or "deep"
- `key_entities`: List of key entities
- `research_questions`: List of research questions

## Factory Functions

All agents have factory functions in `src/agent_factory/agents.py`:

```python
def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
def create_writer_agent(model: Any | None = None) -> WriterAgent
# ... etc
```

Factory functions:
- Use `get_model()` if no model is provided
- Raise `ConfigurationError` if creation fails
- Log agent creation

## See Also

- [Orchestrators](orchestrators.md) - How agents are orchestrated
- [API Reference - Agents](../api/agents.md) - API documentation
- [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
docs/architecture/graph-orchestration.md DELETED
# Graph Orchestration Architecture

## Overview

Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.

## Graph Structure

### Nodes

Graph nodes represent different stages in the research workflow:

1. **Agent Nodes**: Execute Pydantic AI agents
   - Input: Prompt/query
   - Output: Structured or unstructured response
   - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`

2. **State Nodes**: Update or read workflow state
   - Input: Current state
   - Output: Updated state
   - Examples: Update evidence, update conversation history

3. **Decision Nodes**: Make routing decisions based on conditions
   - Input: Current state/results
   - Output: Next node ID
   - Examples: Continue research vs. complete research

4. **Parallel Nodes**: Execute multiple nodes concurrently
   - Input: List of node IDs
   - Output: Aggregated results
   - Examples: Parallel iterative research loops

### Edges

Edges define transitions between nodes:

1. **Sequential Edges**: Always traversed (no condition)
   - From: Source node
   - To: Target node
   - Condition: None (always True)

2. **Conditional Edges**: Traversed based on a condition
   - From: Source node
   - To: Target node
   - Condition: Callable that returns bool
   - Example: If research complete → go to writer, else → continue loop

3. **Parallel Edges**: Used for parallel execution branches
   - From: Parallel node
   - To: Multiple target nodes
   - Execution: All targets run concurrently
## Graph Patterns

### Iterative Research Graph

```
[Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
                                           ↓ No          ↓ Yes
                                     [Tool Selector]   [Writer]
                                           ↓
                                  [Execute Tools] → [Loop Back]
```

### Deep Research Graph

```
[Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
                          ↓        ↓        ↓
                       [Loop1]  [Loop2]  [Loop3]
```

## State Management

State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:

- **Evidence**: Collected evidence from searches
- **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
- **Embedding Service**: For semantic search

State transitions occur at state nodes, which update the global workflow state.

## Execution Flow

1. **Graph Construction**: Build graph from nodes and edges
2. **Graph Validation**: Ensure the graph is valid (no cycles, all nodes reachable)
3. **Graph Execution**: Traverse the graph from the entry node
4. **Node Execution**: Execute each node based on type
5. **Edge Evaluation**: Determine next node(s) based on edges
6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
7. **State Updates**: Update state at state nodes
8. **Event Streaming**: Yield events during execution for the UI

## Conditional Routing

Decision nodes evaluate conditions and return next node IDs:

- **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
- **Budget Decision**: If budget exceeded → exit, else → continue
- **Iteration Decision**: If max iterations → exit, else → continue
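The three routing rules above can be condensed into a single decision function. This is an illustrative sketch with assumed state keys, not the project's implementation; the ordering matters, since budget and iteration exhaustion take priority over the gap evaluation:

```python
def decide_next(state: dict) -> str:
    """Return the next node ID for a knowledge-gap decision node."""
    if state["tokens_used"] >= state["token_budget"]:
        return "exit"  # budget decision wins
    if state["iteration"] >= state["max_iterations"]:
        return "exit"  # iteration decision
    if state["research_complete"]:
        return "writer"  # knowledge gap decision: done
    return "tool_selector"  # gaps remain: keep researching


state = {
    "tokens_used": 1_000,
    "token_budget": 50_000,
    "iteration": 2,
    "max_iterations": 5,
    "research_complete": False,
}
nxt = decide_next(state)
```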
## Parallel Execution

Parallel nodes execute multiple nodes concurrently:

- Each parallel branch runs independently
- Results are aggregated after all branches complete
- State is synchronized after parallel execution
- Errors in one branch don't stop other branches

## Budget Enforcement

Budget constraints are enforced at decision nodes:

- **Token Budget**: Track LLM token usage
- **Time Budget**: Track elapsed time
- **Iteration Budget**: Track iteration count

If any budget is exceeded, execution routes to the exit node.

## Error Handling

Errors are handled at multiple levels:

1. **Node Level**: Catch errors in individual node execution
2. **Graph Level**: Handle errors during graph traversal
3. **State Level**: Rollback state changes on error

Errors are logged and yield error events for the UI.

## Backward Compatibility

Graph execution is optional via a feature flag:

- `USE_GRAPH_EXECUTION=true`: Use graph-based execution
- `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)

This allows gradual migration and fallback if needed.
docs/architecture/graph_orchestration.md DELETED
@@ -1,235 +0,0 @@
1
- # Graph Orchestration Architecture
2
-
3
- ## Graph Patterns
4
-
5
- ### Iterative Research Graph
6
-
7
- ```
8
- [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
9
- ↓ No ↓ Yes
10
- [Tool Selector] [Writer]
11
-
12
- [Execute Tools] → [Loop Back]
13
- ```
14
-
15
- ### Deep Research Graph
16
-
17
- ```
18
- [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
19
- ↓ ↓ ↓
20
- [Loop1] [Loop2] [Loop3]
21
- ```
22
-
23
- ### Deep Research
24
-
25
- ```mermaid
26
-
27
- sequenceDiagram
28
- actor User
29
- participant GraphOrchestrator
30
- participant InputParser
31
- participant GraphBuilder
32
- participant GraphExecutor
33
- participant Agent
34
- participant BudgetTracker
35
- participant WorkflowState
36
-
37
- User->>GraphOrchestrator: run(query)
38
- GraphOrchestrator->>InputParser: detect_research_mode(query)
39
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
40
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
41
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
42
- GraphOrchestrator->>WorkflowState: init_workflow_state()
43
- GraphOrchestrator->>BudgetTracker: create_budget()
44
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
45
-
46
- loop For each node in graph
47
- GraphExecutor->>Agent: execute_node(agent_node)
48
- Agent->>Agent: process_input
49
- Agent-->>GraphExecutor: result
50
- GraphExecutor->>WorkflowState: update_state(result)
51
- GraphExecutor->>BudgetTracker: add_tokens(used)
52
- GraphExecutor->>BudgetTracker: check_budget()
53
- alt Budget exceeded
54
- GraphExecutor->>GraphOrchestrator: emit(error_event)
55
- else Continue
56
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
57
- end
58
- end
59
-
60
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
61
-
62
- ```
63
-
64
- ### Iterative Research
65
-
66
- ```mermaid
67
- sequenceDiagram
68
- participant IterativeFlow
69
- participant ThinkingAgent
70
- participant KnowledgeGapAgent
71
- participant ToolSelector
72
- participant ToolExecutor
73
- participant JudgeHandler
74
- participant WriterAgent
75
-
76
- IterativeFlow->>IterativeFlow: run(query)
77
-
78
- loop Until complete or max_iterations
79
- IterativeFlow->>ThinkingAgent: generate_observations()
80
- ThinkingAgent-->>IterativeFlow: observations
81
-
82
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
83
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
84
-
85
- alt Research complete
86
- IterativeFlow->>WriterAgent: create_final_report()
87
- WriterAgent-->>IterativeFlow: final_report
88
- else Gaps remain
89
- IterativeFlow->>ToolSelector: select_agents(gap)
90
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
91
-
92
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
93
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
94
-
95
- IterativeFlow->>JudgeHandler: assess_evidence()
96
- JudgeHandler-->>IterativeFlow: should_continue
97
- end
98
- end
99
- ```
100
-
101
-
102
- ## Graph Structure
103
-
104
- ### Nodes
105
-
106
- Graph nodes represent different stages in the research workflow:
107
-
108
- 1. **Agent Nodes**: Execute Pydantic AI agents
109
- - Input: Prompt/query
110
- - Output: Structured or unstructured response
111
- - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
112
-
113
- 2. **State Nodes**: Update or read workflow state
114
- - Input: Current state
115
- - Output: Updated state
116
- - Examples: Update evidence, update conversation history
117
-
118
- 3. **Decision Nodes**: Make routing decisions based on conditions
119
- - Input: Current state/results
120
- - Output: Next node ID
121
- - Examples: Continue research vs. complete research
122
-
123
- 4. **Parallel Nodes**: Execute multiple nodes concurrently
124
- - Input: List of node IDs
125
- - Output: Aggregated results
126
- - Examples: Parallel iterative research loops
127
-
128
- ### Edges
129
-
130
- Edges define transitions between nodes:
131
-
132
- 1. **Sequential Edges**: Always traversed (no condition)
133
- - From: Source node
134
- - To: Target node
135
- - Condition: None (always True)
136
-
137
- 2. **Conditional Edges**: Traversed based on condition
138
- - From: Source node
139
- - To: Target node
140
- - Condition: Callable that returns bool
141
- - Example: If research complete → go to writer, else → continue loop
142
-
143
- 3. **Parallel Edges**: Used for parallel execution branches
144
- - From: Parallel node
145
- - To: Multiple target nodes
146
- - Execution: All targets run concurrently
147
-
148
-
- ## State Management
-
- State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
-
- - **Evidence**: Collected evidence from searches
- - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
- - **Embedding Service**: For semantic search
-
- State transitions occur at state nodes, which update the global workflow state.
-
- ## Execution Flow
-
- 1. **Graph Construction**: Build graph from nodes and edges
- 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
- 3. **Graph Execution**: Traverse graph from entry node
- 4. **Node Execution**: Execute each node based on type
- 5. **Edge Evaluation**: Determine next node(s) based on edges
- 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
- 7. **State Updates**: Update state at state nodes
- 8. **Event Streaming**: Yield events during execution for UI
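Steps 3–6 can be sketched as a toy traversal loop. This is illustrative only (the real `GraphOrchestrator` API differs): nodes are async callables, and a node with several matching outgoing edges fans out via `asyncio.gather()`:

```python
import asyncio

async def run_graph(nodes, edges, entry, state):
    """Walk the graph from `entry`; edges are (source, target, condition)."""
    visited = [entry]
    current = entry
    while current is not None:
        await nodes[current](state)  # node execution
        targets = [t for src, t, cond in edges
                   if src == current and (cond is None or cond(state))]
        if len(targets) > 1:  # parallel branches
            await asyncio.gather(*(nodes[t](state) for t in targets))
            visited.extend(targets)
            current = None
        else:
            current = targets[0] if targets else None
            if current:
                visited.append(current)
    return visited

def make_node(name):
    async def node(state):
        state.append(name)  # record execution order
    return node

log: list = []
nodes = {n: make_node(n) for n in ("plan", "search_a", "search_b")}
edges = [("plan", "search_a", None), ("plan", "search_b", None)]
order = asyncio.run(run_graph(nodes, edges, "plan", log))
```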
-
- ## Conditional Routing
-
- Decision nodes evaluate conditions and return next node IDs:
-
- - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
- - **Budget Decision**: If budget exceeded → exit, else → continue
- - **Iteration Decision**: If max iterations reached → exit, else → continue
-
- ## Parallel Execution
-
- Parallel nodes execute multiple nodes concurrently:
-
- - Each parallel branch runs independently
- - Results are aggregated after all branches complete
- - State is synchronized after parallel execution
- - Errors in one branch don't stop other branches
-
- ## Budget Enforcement
-
- Budget constraints are enforced at decision nodes:
-
- - **Token Budget**: Track LLM token usage
- - **Time Budget**: Track elapsed time
- - **Iteration Budget**: Track iteration count
-
- If any budget is exceeded, execution routes to the exit node.
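A hedged sketch of such a budget decision node; the real `BudgetTracker` in `src/middleware/budget_tracker.py` has a richer API (`BudgetStatus`, timers), and the class and route names here are assumptions:

```python
import time

class Budget:
    def __init__(self, token_limit: int, time_limit_s: float, iteration_limit: int):
        self.token_limit = token_limit
        self.time_limit_s = time_limit_s
        self.iteration_limit = iteration_limit
        self.tokens = 0
        self.iterations = 0
        self.started = time.monotonic()

    def exceeded(self) -> bool:
        # Any one exhausted budget is enough to stop.
        return (self.tokens >= self.token_limit
                or self.iterations >= self.iteration_limit
                or time.monotonic() - self.started >= self.time_limit_s)

def budget_decision(budget: Budget) -> str:
    # Decision node: route to the exit node once any budget is exhausted.
    return "exit" if budget.exceeded() else "continue_research"

budget = Budget(token_limit=100_000, time_limit_s=600, iteration_limit=10)
budget.iterations = 10  # simulate hitting the iteration cap
route = budget_decision(budget)
```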
-
- ## Error Handling
-
- Errors are handled at multiple levels:
-
- 1. **Node Level**: Catch errors in individual node execution
- 2. **Graph Level**: Handle errors during graph traversal
- 3. **State Level**: Roll back state changes on error
-
- Errors are logged and yield error events for the UI.
-
- ## Backward Compatibility
-
- Graph execution is optional via a feature flag:
-
- - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
- - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
-
- This allows gradual migration and fallback if needed.
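A boolean env flag like this is commonly parsed as below; the exact lookup in DeepCritical's settings may differ, and the helper name is an assumption:

```python
import os

def use_graph_execution(default: bool = False) -> bool:
    # Accept common truthy spellings; anything else falls back to False.
    raw = os.getenv("USE_GRAPH_EXECUTION", str(default))
    return raw.strip().lower() in {"1", "true", "yes", "on"}

os.environ["USE_GRAPH_EXECUTION"] = "true"
graph_enabled = use_graph_execution()
```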
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
- - [Workflows](workflows.md) - Workflow diagrams and patterns
- - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
-
docs/architecture/middleware.md DELETED
@@ -1,142 +0,0 @@
- # Middleware Architecture
-
- DeepCritical uses middleware for state management, budget tracking, and workflow coordination.
-
- ## State Management
-
- ### WorkflowState
-
- **File**: `src/middleware/state_machine.py`
-
- **Purpose**: Thread-safe state management for research workflows
-
- **Implementation**: Uses `ContextVar` for thread-safe isolation
-
- **State Components**:
- - `evidence: list[Evidence]`: Collected evidence from searches
- - `conversation: Conversation`: Iteration history (gaps, tool calls, findings, thoughts)
- - `embedding_service: Any`: Embedding service for semantic search
-
- **Methods**:
- - `add_evidence(evidence: Evidence)`: Adds evidence with URL-based deduplication
- - `async search_related(query: str, top_k: int = 5) -> list[Evidence]`: Semantic search
-
- **Initialization**:
- ```python
- from src.middleware.state_machine import init_workflow_state
-
- init_workflow_state(embedding_service)
- ```
-
- **Access**:
- ```python
- from src.middleware.state_machine import get_workflow_state
-
- state = get_workflow_state()  # Auto-initializes if missing
- ```
-
- ## Workflow Manager
-
- **File**: `src/middleware/workflow_manager.py`
-
- **Purpose**: Coordinates parallel research loops
-
- **Methods**:
- - `add_loop(loop: ResearchLoop)`: Add a research loop to manage
- - `async run_loops_parallel() -> list[ResearchLoop]`: Run all loops in parallel
- - `update_loop_status(loop_id: str, status: str)`: Update loop status
- - `sync_loop_evidence_to_state()`: Synchronize evidence from loops to global state
-
- **Features**:
- - Uses `asyncio.gather()` for parallel execution
- - Handles errors per loop (doesn't fail all if one fails)
- - Tracks loop status: `pending`, `running`, `completed`, `failed`, `cancelled`
- - Evidence deduplication across parallel loops
-
- **Usage**:
- ```python
- from src.middleware.workflow_manager import WorkflowManager
-
- manager = WorkflowManager()
- manager.add_loop(loop1)
- manager.add_loop(loop2)
- completed_loops = await manager.run_loops_parallel()
- ```
-
- ## Budget Tracker
-
- **File**: `src/middleware/budget_tracker.py`
-
- **Purpose**: Tracks and enforces resource limits
-
- **Budget Components**:
- - **Tokens**: LLM token usage
- - **Time**: Elapsed time in seconds
- - **Iterations**: Number of iterations
-
- **Methods**:
- - `create_budget(token_limit, time_limit_seconds, iterations_limit) -> BudgetStatus`
- - `add_tokens(tokens: int)`: Add token usage
- - `start_timer()`: Start time tracking
- - `update_timer()`: Update elapsed time
- - `increment_iteration()`: Increment iteration count
- - `check_budget() -> BudgetStatus`: Check current budget status
- - `can_continue() -> bool`: Check if research can continue
-
- **Token Estimation**:
- - `estimate_tokens(text: str) -> int`: ~4 chars per token
- - `estimate_llm_call_tokens(prompt: str, response: str) -> int`: Estimate LLM call tokens
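The ~4-characters-per-token heuristic, sketched; rounding and clamping in the real implementation may differ:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: one token per ~4 characters, never less than 1.
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)

tokens = estimate_tokens("x" * 400)  # 400 chars -> ~100 tokens
```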
-
- **Usage**:
- ```python
- from src.middleware.budget_tracker import BudgetTracker
-
- tracker = BudgetTracker()
- budget = tracker.create_budget(
-     token_limit=100000,
-     time_limit_seconds=600,
-     iterations_limit=10
- )
- tracker.start_timer()
- # ... research operations ...
- if not tracker.can_continue():
-     # Budget exceeded, stop research
-     pass
- ```
-
- ## Models
-
- All middleware models are defined in `src/utils/models.py`:
-
- - `IterationData`: Data for a single iteration
- - `Conversation`: Conversation history with iterations
- - `ResearchLoop`: Research loop state and configuration
- - `BudgetStatus`: Current budget status
-
- ## Thread Safety
-
- All middleware components use `ContextVar` for thread-safe isolation:
-
- - Each request/thread has its own workflow state
- - No global mutable state
- - Safe for concurrent requests
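Why `ContextVar` gives that isolation can be shown in a few lines: each thread (and each asyncio task) sees its own value. Illustrative only; this is not DeepCritical's actual state object:

```python
import threading
from contextvars import ContextVar

request_state: ContextVar[dict] = ContextVar("request_state")

results: dict[str, str] = {}

def handle_request(name: str) -> None:
    request_state.set({"owner": name})  # set in this thread's own context
    results[name] = request_state.get()["owner"]

threads = [threading.Thread(target=handle_request, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Neither thread can see or clobber the other's value, which is exactly what makes concurrent requests safe without locks around the state itself.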
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
- - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
-
docs/architecture/orchestrators.md DELETED
@@ -1,198 +0,0 @@
- # Orchestrators Architecture
-
- DeepCritical supports multiple orchestration patterns for research workflows.
-
- ## Research Flows
-
- ### IterativeResearchFlow
-
- **File**: `src/orchestrator/research_flow.py`
-
- **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
-
- **Agents Used**:
- - `KnowledgeGapAgent`: Evaluates research completeness
- - `ToolSelectorAgent`: Selects tools for addressing gaps
- - `ThinkingAgent`: Generates observations
- - `WriterAgent`: Creates final report
- - `JudgeHandler`: Assesses evidence sufficiency
-
- **Features**:
- - Tracks iterations, time, budget
- - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
- - Iterates until research complete or constraints met
-
- **Usage**:
- ```python
- from src.orchestrator.research_flow import IterativeResearchFlow
-
- flow = IterativeResearchFlow(
-     search_handler=search_handler,
-     judge_handler=judge_handler,
-     use_graph=False
- )
-
- async for event in flow.run(query):
-     # Handle events
-     pass
- ```
-
- ### DeepResearchFlow
-
- **File**: `src/orchestrator/research_flow.py`
-
- **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
-
- **Agents Used**:
- - `PlannerAgent`: Breaks query into report sections
- - `IterativeResearchFlow`: Per-section research (parallel)
- - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
-
- **Features**:
- - Uses `WorkflowManager` for parallel execution
- - Budget tracking per section and globally
- - State synchronization across parallel loops
- - Supports graph execution and agent chains
-
- **Usage**:
- ```python
- from src.orchestrator.research_flow import DeepResearchFlow
-
- flow = DeepResearchFlow(
-     search_handler=search_handler,
-     judge_handler=judge_handler,
-     use_graph=True
- )
-
- async for event in flow.run(query):
-     # Handle events
-     pass
- ```
-
- ## Graph Orchestrator
-
- **File**: `src/orchestrator/graph_orchestrator.py`
-
- **Purpose**: Graph-based execution using Pydantic AI agents as nodes
-
- **Features**:
- - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
- - Routes based on research mode (iterative/deep/auto)
- - Streams `AgentEvent` objects for UI
-
- **Node Types**:
- - **Agent Nodes**: Execute Pydantic AI agents
- - **State Nodes**: Update or read workflow state
- - **Decision Nodes**: Make routing decisions
- - **Parallel Nodes**: Execute multiple nodes concurrently
-
- **Edge Types**:
- - **Sequential Edges**: Always traversed
- - **Conditional Edges**: Traversed based on condition
- - **Parallel Edges**: Used for parallel execution branches
-
- ## Orchestrator Factory
-
- **File**: `src/orchestrator_factory.py`
-
- **Purpose**: Factory for creating orchestrators
-
- **Modes**:
- - **Simple**: Legacy orchestrator (backward compatible)
- - **Advanced**: Magentic orchestrator (requires OpenAI API key)
- - **Auto-detect**: Chooses based on API key availability
-
- **Usage**:
- ```python
- from src.orchestrator_factory import create_orchestrator
-
- orchestrator = create_orchestrator(
-     search_handler=search_handler,
-     judge_handler=judge_handler,
-     config={},
-     mode="advanced"  # or "simple" or None for auto-detect
- )
- ```
-
- ## Magentic Orchestrator
-
- **File**: `src/orchestrator_magentic.py`
-
- **Purpose**: Multi-agent coordination using Microsoft Agent Framework
-
- **Features**:
- - Uses `agent-framework-core`
- - ChatAgent pattern with internal LLMs per agent
- - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
- - Manager orchestrates agents via `OpenAIChatClient`
- - Requires OpenAI API key (function calling support)
- - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
-
- **Requirements**:
- - `agent-framework-core` package
- - OpenAI API key
-
- ## Hierarchical Orchestrator
-
- **File**: `src/orchestrator_hierarchical.py`
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams
-
- **Features**:
- - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
- - Adapts Magentic ChatAgent to `SubIterationTeam` protocol
- - Event-driven via `asyncio.Queue` for coordination
- - Supports sub-iteration patterns for complex research tasks
-
- ## Legacy Simple Mode
-
- **File**: `src/legacy_orchestrator.py`
-
- **Purpose**: Linear search-judge-synthesize loop
-
- **Features**:
- - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
- - Generator-based design yielding `AgentEvent` objects
- - Backward compatibility for simple use cases
-
- ## State Initialization
-
- All orchestrators must initialize workflow state:
-
- ```python
- from src.middleware.state_machine import init_workflow_state
- from src.services.embeddings import get_embedding_service
-
- embedding_service = get_embedding_service()
- init_workflow_state(embedding_service)
- ```
-
- ## Event Streaming
-
- All orchestrators yield `AgentEvent` objects:
-
- **Event Types**:
- - `started`: Research started
- - `search_complete`: Search completed
- - `judge_complete`: Evidence evaluation completed
- - `hypothesizing`: Generating hypotheses
- - `synthesizing`: Synthesizing results
- - `complete`: Research completed
- - `error`: Error occurred
-
- **Event Structure**:
- ```python
- class AgentEvent:
-     type: str
-     iteration: int | None
-     data: dict[str, Any]
- ```
190
-
191
- ## See Also
192
-
193
- - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
194
- - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
195
- - [Workflows](workflows.md) - Workflow diagrams and patterns
196
- - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
197
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
198
docs/architecture/services.md DELETED
@@ -1,142 +0,0 @@
- # Services Architecture
-
- DeepCritical provides several services for embeddings, RAG, and statistical analysis.
-
- ## Embedding Service
-
- **File**: `src/services/embeddings.py`
-
- **Purpose**: Local sentence-transformers for semantic search and deduplication
-
- **Features**:
- - **No API Key Required**: Uses local sentence-transformers models
- - **Async-Safe**: All operations use `run_in_executor()` to avoid blocking
- - **ChromaDB Storage**: Vector storage for embeddings
- - **Deduplication**: 0.85 similarity threshold (85% similarity = duplicate)
-
- **Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)
-
- **Methods**:
- - `async def embed(text: str) -> list[float]`: Generate embeddings
- - `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding
- - `async def similarity(text1: str, text2: str) -> float`: Calculate similarity
- - `async def find_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int]]`: Find duplicates
-
- **Usage**:
- ```python
- from src.services.embeddings import get_embedding_service
-
- service = get_embedding_service()
- embedding = await service.embed("text to embed")
- ```
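The 0.85-threshold deduplication reduces to a pairwise cosine-similarity check over embedding vectors. A minimal sketch on hand-made 2-D vectors (the real service embeds with sentence-transformers and stores vectors in ChromaDB):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_duplicates(vectors: list[list[float]],
                    threshold: float = 0.85) -> list[tuple[int, int]]:
    # Pairs (i, j) whose vectors are at least `threshold` similar.
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Vectors 0 and 1 point almost the same way; vector 2 is orthogonal.
dupes = find_duplicates([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```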
-
- ## LlamaIndex RAG Service
-
- **File**: `src/services/rag.py`
-
- **Purpose**: Retrieval-Augmented Generation using LlamaIndex
-
- **Features**:
- - **OpenAI Embeddings**: Requires `OPENAI_API_KEY`
- - **ChromaDB Storage**: Vector database for document storage
- - **Metadata Preservation**: Preserves source, title, URL, date, authors
- - **Lazy Initialization**: Graceful fallback if OpenAI key not available
-
- **Methods**:
- - `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
- - `async def retrieve(query: str, top_k: int = 5) -> list[Document]`: Retrieve relevant documents
- - `async def query(query: str, top_k: int = 5) -> str`: Query with RAG
-
- **Usage**:
- ```python
- from src.services.rag import get_rag_service
-
- service = get_rag_service()
- if service:
-     documents = await service.retrieve("query", top_k=5)
- ```
-
- ## Statistical Analyzer
-
- **File**: `src/services/statistical_analyzer.py`
-
- **Purpose**: Secure execution of AI-generated statistical code
-
- **Features**:
- - **Modal Sandbox**: Secure, isolated execution environment
- - **Code Generation**: Generates Python code via LLM
- - **Library Pinning**: Version-pinned libraries in `SANDBOX_LIBRARIES`
- - **Network Isolation**: `block_network=True` by default
-
- **Libraries Available**:
- - pandas, numpy, scipy
- - matplotlib, scikit-learn
- - statsmodels
-
- **Output**: `AnalysisResult` with:
- - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- - `code`: Generated analysis code
- - `output`: Execution output
- - `error`: Error message if execution failed
-
- **Usage**:
- ```python
- from src.services.statistical_analyzer import StatisticalAnalyzer
-
- analyzer = StatisticalAnalyzer()
- result = await analyzer.analyze(
-     hypothesis="Metformin reduces cancer risk",
-     evidence=evidence_list
- )
- ```
-
- ## Singleton Pattern
-
- All services use the singleton pattern with `@lru_cache(maxsize=1)`:
-
- ```python
- @lru_cache(maxsize=1)
- def get_embedding_service() -> EmbeddingService:
-     return EmbeddingService()
- ```
-
- This ensures:
- - Single instance per process
- - Lazy initialization
- - No dependencies required at import time
-
- ## Service Availability
-
- Services check availability before use:
-
- ```python
- from src.utils.config import settings
-
- if settings.modal_available:
-     # Use Modal sandbox
-     pass
-
- if settings.has_openai_key:
-     # Use OpenAI embeddings for RAG
-     pass
- ```
-
- ## See Also
-
- - [Tools](tools.md) - How services are used by search tools
- - [API Reference - Services](../api/services.md) - API documentation
- - [Configuration](../configuration/index.md) - Service configuration
-
docs/architecture/tools.md DELETED
@@ -1,175 +0,0 @@
- # Tools Architecture
-
- DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.
-
- ## SearchTool Protocol
-
- All tools implement the `SearchTool` protocol from `src/tools/base.py`:
-
- ```python
- class SearchTool(Protocol):
-     @property
-     def name(self) -> str: ...
-
-     async def search(
-         self,
-         query: str,
-         max_results: int = 10
-     ) -> list[Evidence]: ...
- ```
-
- ## Rate Limiting
-
- All tools use the `@retry` decorator from tenacity:
-
- ```python
- @retry(
-     stop=stop_after_attempt(3),
-     wait=wait_exponential(...)
- )
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-     # Implementation
- ```
-
- Tools with API rate limits implement a `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.
-
- ## Error Handling
-
- Tools raise custom exceptions:
-
- - `SearchError`: General search failures
- - `RateLimitError`: Rate limit exceeded
-
- Tools handle HTTP errors (429, 500, timeout) and return empty lists on non-critical errors (with warning logs).
-
- ## Query Preprocessing
-
- Tools use `preprocess_query()` from `src/tools/query_utils.py` to:
-
- - Remove noise from queries
- - Expand synonyms
- - Normalize query format
-
- ## Evidence Conversion
-
- All tools convert API responses to `Evidence` objects with:
-
- - `Citation`: Title, URL, date, authors
- - `content`: Evidence text
- - `relevance_score`: 0.0-1.0 relevance score
- - `metadata`: Additional metadata
-
- Missing fields are handled gracefully with defaults.
-
64
- ## Tool Implementations
65
-
66
- ### PubMed Tool
67
-
68
- **File**: `src/tools/pubmed.py`
69
-
70
- **API**: NCBI E-utilities (ESearch → EFetch)
71
-
72
- **Rate Limiting**:
73
- - 0.34s between requests (3 req/sec without API key)
74
- - 0.1s between requests (10 req/sec with NCBI API key)
75
-
76
- **Features**:
77
- - XML parsing with `xmltodict`
78
- - Handles single vs. multiple articles
79
- - Query preprocessing
80
- - Evidence conversion with metadata extraction
81
-
82
- ### ClinicalTrials Tool
83
-
84
- **File**: `src/tools/clinicaltrials.py`
85
-
86
- **API**: ClinicalTrials.gov API v2
87
-
88
- **Important**: Uses `requests` library (NOT httpx) because WAF blocks httpx TLS fingerprint.
89
-
90
- **Execution**: Runs in thread pool: `await asyncio.to_thread(requests.get, ...)`
91
-
92
- **Filtering**:
93
- - Only interventional studies
94
- - Status: `COMPLETED`, `ACTIVE_NOT_RECRUITING`, `RECRUITING`, `ENROLLING_BY_INVITATION`
95
-
96
- **Features**:
97
- - Parses nested JSON structure
98
- - Extracts trial metadata
99
- - Evidence conversion
100
-
101
- ### Europe PMC Tool
102
-
103
- **File**: `src/tools/europepmc.py`
104
-
105
- **API**: Europe PMC REST API
106
-
107
- **Features**:
108
- - Handles preprint markers: `[PREPRINT - Not peer-reviewed]`
109
- - Builds URLs from DOI or PMID
110
- - Checks `pubTypeList` for preprint detection
111
- - Includes both preprints and peer-reviewed articles
112
-
113
- ### RAG Tool
114
-
115
- **File**: `src/tools/rag_tool.py`
116
-
117
- **Purpose**: Semantic search within collected evidence
118
-
119
- **Implementation**: Wraps `LlamaIndexRAGService`
120
-
121
- **Features**:
122
- - Returns Evidence from RAG results
123
- - Handles evidence ingestion
124
- - Semantic similarity search
125
- - Metadata preservation
126
-
127
- ### Search Handler
128
-
129
- **File**: `src/tools/search_handler.py`
130
-
131
- **Purpose**: Orchestrates parallel searches across multiple tools
132
-
133
- **Features**:
134
- - Uses `asyncio.gather()` with `return_exceptions=True`
135
- - Aggregates results into `SearchResult`
136
- - Handles tool failures gracefully
137
- - Deduplicates results by URL
138
-
139
- ## Tool Registration
140
-
141
- Tools are registered in the search handler:
142
-
143
- ```python
144
- from src.tools.pubmed import PubMedTool
145
- from src.tools.clinicaltrials import ClinicalTrialsTool
146
- from src.tools.europepmc import EuropePMCTool
147
-
148
- search_handler = SearchHandler(
149
- tools=[
150
- PubMedTool(),
151
- ClinicalTrialsTool(),
152
- EuropePMCTool(),
153
- ]
154
- )
155
- ```
156
-
157
- ## See Also
158
-
159
- - [Services](services.md) - RAG and embedding services
160
- - [API Reference - Tools](../api/tools.md) - API documentation
161
- - [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
162
-
163
-
164
-
165
-
166
-
167
-
168
-
169
-
170
-
171
-
172
-
173
-
174
-
175
-
 
docs/architecture/workflow-diagrams.md DELETED
@@ -1,670 +0,0 @@
- # DeepCritical Workflow - Simplified Magentic Architecture
-
- > **Architecture Pattern**: Microsoft Magentic Orchestration
- > **Design Philosophy**: Simple, dynamic, manager-driven coordination
- > **Key Innovation**: Intelligent manager replaces rigid sequential phases
-
- ---
-
- ## 1. High-Level Magentic Workflow
-
- ```mermaid
- flowchart TD
-     Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
-
-     Manager -->|Plans| Task1[Task Decomposition]
-     Task1 --> Manager
-
-     Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
-     Manager -->|Selects & Executes| SearchAgent[Search Agent]
-     Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
-     Manager -->|Selects & Executes| ReportAgent[Report Agent]
-
-     HypAgent -->|Results| Manager
-     SearchAgent -->|Results| Manager
-     AnalysisAgent -->|Results| Manager
-     ReportAgent -->|Results| Manager
-
-     Manager -->|Assesses Quality| Decision{Good Enough?}
-     Decision -->|No - Refine| Manager
-     Decision -->|No - Different Agent| Manager
-     Decision -->|No - Stalled| Replan[Reset Plan]
-     Replan --> Manager
-
-     Decision -->|Yes| Synthesis[Synthesize Final Result]
-     Synthesis --> Output([Research Report])
-
-     style Start fill:#e1f5e1
-     style Manager fill:#ffe6e6
-     style HypAgent fill:#fff4e6
-     style SearchAgent fill:#fff4e6
-     style AnalysisAgent fill:#fff4e6
-     style ReportAgent fill:#fff4e6
-     style Decision fill:#ffd6d6
-     style Synthesis fill:#d4edda
-     style Output fill:#e1f5e1
- ```
-
- ## 2. Magentic Manager: The 6-Phase Cycle
-
- ```mermaid
- flowchart LR
-     P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
-     P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
-     P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
-     P4 --> Decision{Quality OK?<br/>Progress made?}
-     Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
-     Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
-     P5 --> P2
-     P6 --> Done([Complete])
-
-     style P1 fill:#fff4e6
-     style P2 fill:#ffe6e6
-     style P3 fill:#e6f3ff
-     style P4 fill:#ffd6d6
-     style P5 fill:#fff3cd
-     style P6 fill:#d4edda
-     style Done fill:#e1f5e1
- ```
-
- ## 3. Simplified Agent Architecture
-
- ```mermaid
- graph TB
-     subgraph "Orchestration Layer"
-         Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
-         SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
-         Manager <--> SharedContext
-     end
-
-     subgraph "Specialist Agents"
-         HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
-         SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
-         AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
-         ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
-     end
-
-     subgraph "MCP Tools"
-         WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
-         CodeExec[Code Execution<br/>Sandboxed Python]
-         RAG[RAG Retrieval<br/>Vector DB • Embeddings]
-         Viz[Visualization<br/>Charts • Graphs]
-     end
-
-     Manager -->|Selects & Directs| HypAgent
-     Manager -->|Selects & Directs| SearchAgent
-     Manager -->|Selects & Directs| AnalysisAgent
-     Manager -->|Selects & Directs| ReportAgent
-
-     HypAgent --> SharedContext
-     SearchAgent --> SharedContext
-     AnalysisAgent --> SharedContext
-     ReportAgent --> SharedContext
-
-     SearchAgent --> WebSearch
-     SearchAgent --> RAG
-     AnalysisAgent --> CodeExec
-     ReportAgent --> CodeExec
-     ReportAgent --> Viz
-
-     style Manager fill:#ffe6e6
-     style SharedContext fill:#ffe6f0
-     style HypAgent fill:#fff4e6
-     style SearchAgent fill:#fff4e6
-     style AnalysisAgent fill:#fff4e6
-     style ReportAgent fill:#fff4e6
-     style WebSearch fill:#e6f3ff
-     style CodeExec fill:#e6f3ff
-     style RAG fill:#e6f3ff
-     style Viz fill:#e6f3ff
- ```
-
- ## 4. Dynamic Workflow Example
-
- ```mermaid
- sequenceDiagram
-     participant User
-     participant Manager
-     participant HypAgent
-     participant SearchAgent
-     participant AnalysisAgent
-     participant ReportAgent
-
-     User->>Manager: "Research protein folding in Alzheimer's"
-
-     Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
-
-     Manager->>HypAgent: Generate 3 hypotheses
-     HypAgent-->>Manager: Returns 3 hypotheses
-     Note over Manager: ASSESS: Good quality, proceed
-
-     Manager->>SearchAgent: Search literature for hypothesis 1
-     SearchAgent-->>Manager: Returns 15 papers
-     Note over Manager: ASSESS: Good results, continue
-
-     Manager->>SearchAgent: Search for hypothesis 2
-     SearchAgent-->>Manager: Only 2 papers found
-     Note over Manager: ASSESS: Insufficient, refine search
-
-     Manager->>SearchAgent: Refined query for hypothesis 2
-     SearchAgent-->>Manager: Returns 12 papers
-     Note over Manager: ASSESS: Better, proceed
-
-     Manager->>AnalysisAgent: Analyze evidence for all hypotheses
-     AnalysisAgent-->>Manager: Returns analysis with code
-     Note over Manager: ASSESS: Complete, generate report
-
-     Manager->>ReportAgent: Create comprehensive report
-     ReportAgent-->>Manager: Returns formatted report
-     Note over Manager: SYNTHESIZE: Combine all results
-
-     Manager->>User: Final Research Report
- ```
-
- ## 5. Manager Decision Logic
-
- ```mermaid
- flowchart TD
-     Start([Manager Receives Task]) --> Plan[Create Initial Plan]
-
-     Plan --> Select[Select Agent for Next Subtask]
-     Select --> Execute[Execute Agent]
-     Execute --> Collect[Collect Results]
-
-     Collect --> Assess[Assess Quality & Progress]
-
-     Assess --> Q1{Quality Sufficient?}
-     Q1 -->|No| Q2{Same Agent Can Fix?}
-     Q2 -->|Yes| Feedback[Provide Specific Feedback]
-     Feedback --> Execute
-     Q2 -->|No| Different[Try Different Agent]
-     Different --> Select
-
-     Q1 -->|Yes| Q3{Task Complete?}
-     Q3 -->|No| Q4{Making Progress?}
-     Q4 -->|Yes| Select
-     Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
-     Replan --> Plan
-
-     Q3 -->|Yes| Synth[Synthesize Final Result]
-     Synth --> Done([Return Report])
-
-     style Start fill:#e1f5e1
-     style Plan fill:#fff4e6
-     style Select fill:#ffe6e6
-     style Execute fill:#e6f3ff
-     style Assess fill:#ffd6d6
-     style Q1 fill:#ffe6e6
-     style Q2 fill:#ffe6e6
-     style Q3 fill:#ffe6e6
-     style Q4 fill:#ffe6e6
-     style Synth fill:#d4edda
-     style Done fill:#e1f5e1
- ```
-
- ## 6. Hypothesis Agent Workflow
-
- ```mermaid
- flowchart LR
-     Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
-     Domain --> Context[Retrieve Background<br/>Knowledge]
-     Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
-     Generate --> Refine[Refine for<br/>Testability]
-     Refine --> Rank[Rank by<br/>Quality Score]
-     Rank --> Output[Return Top<br/>Hypotheses]
-
-     Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
-
-     style Input fill:#e1f5e1
-     style Output fill:#fff4e6
-     style Struct fill:#e6f3ff
- ```
-
- ## 7. Search Agent Workflow
-
- ```mermaid
- flowchart TD
-     Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
-
-     Strategy --> Multi[Multi-Source Search]
-
-     Multi --> PubMed[PubMed Search<br/>via MCP]
-     Multi --> ArXiv[arXiv Search<br/>via MCP]
-     Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
-
-     PubMed --> Aggregate[Aggregate Results]
-     ArXiv --> Aggregate
-     BioRxiv --> Aggregate
-
-     Aggregate --> Filter[Filter & Rank<br/>by Relevance]
-     Filter --> Dedup[Deduplicate<br/>Cross-Reference]
-     Dedup --> Embed[Embed Documents<br/>via MCP]
-     Embed --> Vector[(Vector DB)]
-     Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
-     RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
-
-     style Input fill:#fff4e6
-     style Multi fill:#ffe6e6
-     style Vector fill:#ffe6f0
-     style Output fill:#e6f3ff
- ```
-
- ## 8. Analysis Agent Workflow
-
- ```mermaid
- flowchart TD
-     Input1[Hypotheses] --> Extract
-     Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
-
-     Extract --> Methods[Determine Analysis<br/>Methods Needed]
-
-     Methods --> Branch{Requires<br/>Computation?}
-     Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
- Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
- Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
-
265
- GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
- Execute --> Interpret1[Interpret<br/>Results]
267
- Qual --> Interpret2[Interpret<br/>Findings]
268
-
269
- Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
- Interpret2 --> Synthesize
271
-
272
- Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
- Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
- Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
- Gaps --> Output[Return Analysis<br/>Report]
276
-
277
- style Input1 fill:#fff4e6
278
- style Input2 fill:#e6f3ff
279
- style Execute fill:#ffe6e6
280
- style Output fill:#e6ffe6
281
- ```
282
-
283
- ## 9. Report Agent Workflow
284
-
285
- ```mermaid
286
- flowchart TD
287
- Input1[Query] --> Assemble
288
- Input2[Hypotheses] --> Assemble
289
- Input3[Search Results] --> Assemble
290
- Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
-
292
- Assemble --> Exec[Executive Summary]
293
- Assemble --> Intro[Introduction]
294
- Assemble --> Methods[Methods]
295
- Assemble --> Results[Results per<br/>Hypothesis]
296
- Assemble --> Discussion[Discussion]
297
- Assemble --> Future[Future Directions]
298
- Assemble --> Refs[References]
299
-
300
- Results --> VizCheck{Needs<br/>Visualization?}
301
- VizCheck -->|Yes| GenViz[Generate Viz Code]
302
- GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
- ExecViz --> Combine
304
- VizCheck -->|No| Combine[Combine All<br/>Sections]
305
-
306
- Exec --> Combine
307
- Intro --> Combine
308
- Methods --> Combine
309
- Discussion --> Combine
310
- Future --> Combine
311
- Refs --> Combine
312
-
313
- Combine --> Format[Format Output]
314
- Format --> MD[Markdown]
315
- Format --> PDF[PDF]
316
- Format --> JSON[JSON]
317
-
318
- MD --> Output[Return Final<br/>Report]
319
- PDF --> Output
320
- JSON --> Output
321
-
322
- style Input1 fill:#e1f5e1
323
- style Input2 fill:#fff4e6
324
- style Input3 fill:#e6f3ff
325
- style Input4 fill:#e6ffe6
326
- style Output fill:#d4edda
327
- ```
328
-
329
- ## 10. Data Flow & Event Streaming
330
-
331
- ```mermaid
332
- flowchart TD
333
- User[👤 User] -->|Research Query| UI[Gradio UI]
334
- UI -->|Submit| Manager[Magentic Manager]
335
-
336
- Manager -->|Event: Planning| UI
337
- Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
- HypAgent -->|Event: Delta/Message| UI
339
- HypAgent -->|Hypotheses| Context[(Shared Context)]
340
-
341
- Context -->|Retrieved by| Manager
342
- Manager -->|Select Agent| SearchAgent[Search Agent]
343
- SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
- WebSearch -->|Results| SearchAgent
345
- SearchAgent -->|Event: Delta/Message| UI
346
- SearchAgent -->|Documents| Context
347
- SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
-
349
- Context -->|Retrieved by| Manager
350
- Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
- AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
352
- CodeExec -->|Results| AnalysisAgent
353
- AnalysisAgent -->|Event: Delta/Message| UI
354
- AnalysisAgent -->|Analysis| Context
355
-
356
- Context -->|Retrieved by| Manager
357
- Manager -->|Select Agent| ReportAgent[Report Agent]
358
- ReportAgent -->|MCP Request| CodeExec
359
- ReportAgent -->|Event: Delta/Message| UI
360
- ReportAgent -->|Report| Context
361
-
362
- Manager -->|Event: Final Result| UI
363
- UI -->|Display| User
364
-
365
- style User fill:#e1f5e1
366
- style UI fill:#e6f3ff
367
- style Manager fill:#ffe6e6
368
- style Context fill:#ffe6f0
369
- style VectorDB fill:#ffe6f0
370
- style WebSearch fill:#f0f0f0
371
- style CodeExec fill:#f0f0f0
372
- ```
373
-
374
- ## 11. MCP Tool Architecture
375
-
376
- ```mermaid
377
- graph TB
378
- subgraph "Agent Layer"
379
- Manager[Magentic Manager]
380
- HypAgent[Hypothesis Agent]
381
- SearchAgent[Search Agent]
382
- AnalysisAgent[Analysis Agent]
383
- ReportAgent[Report Agent]
384
- end
385
-
386
- subgraph "MCP Protocol Layer"
387
- Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
388
- end
389
-
390
- subgraph "MCP Servers"
391
- Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
392
- Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
393
- Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
- Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
- end
396
-
397
- subgraph "External Services"
398
- PubMed[PubMed API]
399
- ArXiv[arXiv API]
400
- BioRxiv[bioRxiv API]
401
- Modal[Modal Sandbox]
402
- ChromaDB[(ChromaDB)]
403
- end
404
-
405
- SearchAgent -->|Request| Registry
406
- AnalysisAgent -->|Request| Registry
407
- ReportAgent -->|Request| Registry
408
-
409
- Registry --> Server1
410
- Registry --> Server2
411
- Registry --> Server3
412
- Registry --> Server4
413
-
414
- Server1 --> PubMed
415
- Server1 --> ArXiv
416
- Server1 --> BioRxiv
417
- Server2 --> Modal
418
- Server3 --> ChromaDB
419
-
420
- style Manager fill:#ffe6e6
421
- style Registry fill:#fff4e6
422
- style Server1 fill:#e6f3ff
423
- style Server2 fill:#e6f3ff
424
- style Server3 fill:#e6f3ff
425
- style Server4 fill:#e6f3ff
426
- ```
427
-
428
- ## 12. Progress Tracking & Stall Detection
429
-
430
- ```mermaid
431
- stateDiagram-v2
432
- [*] --> Initialization: User Query
433
-
434
- Initialization --> Planning: Manager starts
435
-
436
- Planning --> AgentExecution: Select agent
437
-
438
- AgentExecution --> Assessment: Collect results
439
-
440
- Assessment --> QualityCheck: Evaluate output
441
-
442
- QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
- QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
- QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
- QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
-
447
- NextAgent --> AgentExecution: Select next agent
448
-
449
- state StallDetection <<choice>>
450
- Assessment --> StallDetection: Check progress
451
- StallDetection --> Planning: No progress<br/>(stall count < max)
452
- StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
-
454
- ErrorRecovery --> PartialReport: Generate partial results
455
- PartialReport --> [*]
456
-
457
- Synthesis --> FinalReport: Combine all outputs
458
- FinalReport --> [*]
459
-
460
- note right of QualityCheck
461
- Manager assesses:
462
- • Output completeness
463
- • Quality metrics
464
- • Progress made
465
- end note
466
-
467
- note right of StallDetection
468
- Stall = no new progress
469
- after agent execution
470
- Triggers plan reset
471
- end note
472
- ```
473
-
474
- ## 13. Gradio UI Integration
475
-
476
- ```mermaid
477
- graph TD
478
- App[Gradio App<br/>DeepCritical Research Agent]
479
-
480
- App --> Input[Input Section]
481
- App --> Status[Status Section]
482
- App --> Output[Output Section]
483
-
484
- Input --> Query[Research Question<br/>Text Area]
485
- Input --> Controls[Controls]
486
- Controls --> MaxHyp[Max Hypotheses: 1-10]
487
- Controls --> MaxRounds[Max Rounds: 5-20]
488
- Controls --> Submit[Start Research Button]
489
-
490
- Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
- Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
-
493
- Output --> Tabs[Tabbed Results]
494
- Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
- Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
- Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
- Tabs --> Tab4[Report Tab<br/>Final research report]
498
- Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
-
500
- Submit -.->|Triggers| Workflow[Magentic Workflow]
501
- Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
- Workflow -.->|MagenticAgentDeltaEvent| Log
503
- Workflow -.->|MagenticAgentMessageEvent| Log
504
- Workflow -.->|MagenticFinalResultEvent| Tab4
505
-
506
- style App fill:#e1f5e1
507
- style Input fill:#fff4e6
508
- style Status fill:#e6f3ff
509
- style Output fill:#e6ffe6
510
- style Workflow fill:#ffe6e6
511
- ```
512
-
513
- ## 14. Complete System Context
514
-
515
- ```mermaid
516
- graph LR
517
- User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
-
519
- DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
- DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
- DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
- DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
- DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
- DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
-
526
- DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
527
-
528
- PubMed -->|Results| DC
529
- ArXiv -->|Results| DC
530
- BioRxiv -->|Results| DC
531
- Claude -->|Responses| DC
532
- Modal -->|Output| DC
533
- Chroma -->|Context| DC
534
-
535
- DC -->|Research report| User
536
-
537
- style User fill:#e1f5e1
538
- style DC fill:#ffe6e6
539
- style PubMed fill:#e6f3ff
540
- style ArXiv fill:#e6f3ff
541
- style BioRxiv fill:#e6f3ff
542
- style Claude fill:#ffd6d6
543
- style Modal fill:#f0f0f0
544
- style Chroma fill:#ffe6f0
545
- style HF fill:#d4edda
546
- ```
547
-
548
- ## 15. Workflow Timeline (Simplified)
549
-
550
- ```mermaid
551
- gantt
552
- title DeepCritical Magentic Workflow - Typical Execution
553
- dateFormat mm:ss
554
- axisFormat %M:%S
555
-
556
- section Manager Planning
557
- Initial planning :p1, 00:00, 10s
558
-
559
- section Hypothesis Agent
560
- Generate hypotheses :h1, after p1, 30s
561
- Manager assessment :h2, after h1, 5s
562
-
563
- section Search Agent
564
- Search hypothesis 1 :s1, after h2, 20s
565
- Search hypothesis 2 :s2, after s1, 20s
566
- Search hypothesis 3 :s3, after s2, 20s
567
- RAG processing :s4, after s3, 15s
568
- Manager assessment :s5, after s4, 5s
569
-
570
- section Analysis Agent
571
- Evidence extraction :a1, after s5, 15s
572
- Code generation :a2, after a1, 20s
573
- Code execution :a3, after a2, 25s
574
- Synthesis :a4, after a3, 20s
575
- Manager assessment :a5, after a4, 5s
576
-
577
- section Report Agent
578
- Report assembly :r1, after a5, 30s
579
- Visualization :r2, after r1, 15s
580
- Formatting :r3, after r2, 10s
581
-
582
- section Manager Synthesis
583
- Final synthesis :f1, after r3, 10s
584
- ```
585
-
586
- ---
587
-
588
- ## Key Differences from Original Design
589
-
590
- | Aspect | Original (Judge-in-Loop) | New (Magentic) |
591
- |--------|-------------------------|----------------|
592
- | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
593
- | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
594
- | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
595
- | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
596
- | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
597
- | **Progress Tracking** | Manual state management | Built-in round/stall detection |
598
- | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
599
- | **Error Recovery** | Retry same phase | Try different agent or replan |
600
-
601
- ---
602
-
603
- ## Simplified Design Principles
604
-
605
- 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
606
- 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
607
- 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
608
- 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
609
- 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
610
- 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
611
- 7. **Shared Context**: Centralized state accessible to all agents
612
- 8. **Progress Awareness**: Manager tracks what's been done and what's needed
613
-
614
- ---
615
-
616
- ## Legend
617
-
618
- - 🔴 **Red/Pink**: Manager, orchestration, decision-making
619
- - 🟡 **Yellow/Orange**: Specialist agents, processing
620
- - 🔵 **Blue**: Data, tools, MCP services
621
- - 🟣 **Purple/Pink**: Storage, databases, state
622
- - 🟢 **Green**: User interactions, final outputs
623
- - ⚪ **Gray**: External services, APIs
624
-
625
- ---
626
-
627
- ## Implementation Highlights
628
-
629
- **Simple 4-Agent Setup:**
630
- ```python
631
- workflow = (
632
- MagenticBuilder()
633
- .participants(
634
- hypothesis=HypothesisAgent(tools=[background_tool]),
635
- search=SearchAgent(tools=[web_search, rag_tool]),
636
- analysis=AnalysisAgent(tools=[code_execution]),
637
- report=ReportAgent(tools=[code_execution, visualization])
638
- )
639
- .with_standard_manager(
640
- chat_client=AnthropicClient(model="claude-sonnet-4"),
641
- max_round_count=15, # Prevent infinite loops
642
- max_stall_count=3 # Detect stuck workflows
643
- )
644
- .build()
645
- )
646
- ```
647
-
648
- **Manager handles quality assessment in its instructions:**
649
- - Checks hypothesis quality (testable, novel, clear)
650
- - Validates search results (relevant, authoritative, recent)
651
- - Assesses analysis soundness (methodology, evidence, conclusions)
652
- - Ensures report completeness (all sections, proper citations)
653
-
654
- No separate Judge Agent needed - manager does it all!
655
-
656
- ---
657
-
658
- **Document Version**: 2.0 (Magentic Simplified)
659
- **Last Updated**: 2025-11-24
660
- **Architecture**: Microsoft Magentic Orchestration Pattern
661
- **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
662
- **License**: MIT
663
-
664
- ## See Also
665
-
666
- - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
667
- - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
668
- - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
669
- - [Workflows](workflows.md) - Workflow patterns summary
670
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/architecture/workflows.md DELETED
@@ -1,662 +0,0 @@
1
- # DeepCritical Workflow - Simplified Magentic Architecture
2
-
3
- > **Architecture Pattern**: Microsoft Magentic Orchestration
4
- > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
- > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
-
7
- ---
8
-
9
- ## 1. High-Level Magentic Workflow
10
-
11
- ```mermaid
12
- flowchart TD
13
- Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
-
15
- Manager -->|Plans| Task1[Task Decomposition]
16
- Task1 --> Manager
17
-
18
- Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
- Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
- Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
- Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
-
23
- HypAgent -->|Results| Manager
24
- SearchAgent -->|Results| Manager
25
- AnalysisAgent -->|Results| Manager
26
- ReportAgent -->|Results| Manager
27
-
28
- Manager -->|Assesses Quality| Decision{Good Enough?}
29
- Decision -->|No - Refine| Manager
30
- Decision -->|No - Different Agent| Manager
31
- Decision -->|No - Stalled| Replan[Reset Plan]
32
- Replan --> Manager
33
-
34
- Decision -->|Yes| Synthesis[Synthesize Final Result]
35
- Synthesis --> Output([Research Report])
36
-
37
- style Start fill:#e1f5e1
38
- style Manager fill:#ffe6e6
39
- style HypAgent fill:#fff4e6
40
- style SearchAgent fill:#fff4e6
41
- style AnalysisAgent fill:#fff4e6
42
- style ReportAgent fill:#fff4e6
43
- style Decision fill:#ffd6d6
44
- style Synthesis fill:#d4edda
45
- style Output fill:#e1f5e1
46
- ```
47
-
48
- ## 2. Magentic Manager: The 6-Phase Cycle
49
-
50
- ```mermaid
51
- flowchart LR
52
- P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
- P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
- P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
- P4 --> Decision{Quality OK?<br/>Progress made?}
56
- Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
- Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
- P5 --> P2
59
- P6 --> Done([Complete])
60
-
61
- style P1 fill:#fff4e6
62
- style P2 fill:#ffe6e6
63
- style P3 fill:#e6f3ff
64
- style P4 fill:#ffd6d6
65
- style P5 fill:#fff3cd
66
- style P6 fill:#d4edda
67
- style Done fill:#e1f5e1
68
- ```
69
-
70
- ## 3. Simplified Agent Architecture
71
-
72
- ```mermaid
73
- graph TB
74
- subgraph "Orchestration Layer"
75
- Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
- SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
- Manager <--> SharedContext
78
- end
79
-
80
- subgraph "Specialist Agents"
81
- HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
- SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
- AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
- ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
- end
86
-
87
- subgraph "MCP Tools"
88
- WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
- CodeExec[Code Execution<br/>Sandboxed Python]
90
- RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
- Viz[Visualization<br/>Charts • Graphs]
92
- end
93
-
94
- Manager -->|Selects & Directs| HypAgent
95
- Manager -->|Selects & Directs| SearchAgent
96
- Manager -->|Selects & Directs| AnalysisAgent
97
- Manager -->|Selects & Directs| ReportAgent
98
-
99
- HypAgent --> SharedContext
100
- SearchAgent --> SharedContext
101
- AnalysisAgent --> SharedContext
102
- ReportAgent --> SharedContext
103
-
104
- SearchAgent --> WebSearch
105
- SearchAgent --> RAG
106
- AnalysisAgent --> CodeExec
107
- ReportAgent --> CodeExec
108
- ReportAgent --> Viz
109
-
110
- style Manager fill:#ffe6e6
111
- style SharedContext fill:#ffe6f0
112
- style HypAgent fill:#fff4e6
113
- style SearchAgent fill:#fff4e6
114
- style AnalysisAgent fill:#fff4e6
115
- style ReportAgent fill:#fff4e6
116
- style WebSearch fill:#e6f3ff
117
- style CodeExec fill:#e6f3ff
118
- style RAG fill:#e6f3ff
119
- style Viz fill:#e6f3ff
120
- ```
121
-
122
- ## 4. Dynamic Workflow Example
123
-
124
- ```mermaid
125
- sequenceDiagram
126
- participant User
127
- participant Manager
128
- participant HypAgent
129
- participant SearchAgent
130
- participant AnalysisAgent
131
- participant ReportAgent
132
-
133
- User->>Manager: "Research protein folding in Alzheimer's"
134
-
135
- Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
-
137
- Manager->>HypAgent: Generate 3 hypotheses
138
- HypAgent-->>Manager: Returns 3 hypotheses
139
- Note over Manager: ASSESS: Good quality, proceed
140
-
141
- Manager->>SearchAgent: Search literature for hypothesis 1
142
- SearchAgent-->>Manager: Returns 15 papers
143
- Note over Manager: ASSESS: Good results, continue
144
-
145
- Manager->>SearchAgent: Search for hypothesis 2
146
- SearchAgent-->>Manager: Only 2 papers found
147
- Note over Manager: ASSESS: Insufficient, refine search
148
-
149
- Manager->>SearchAgent: Refined query for hypothesis 2
150
- SearchAgent-->>Manager: Returns 12 papers
151
- Note over Manager: ASSESS: Better, proceed
152
-
153
- Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
- AnalysisAgent-->>Manager: Returns analysis with code
155
- Note over Manager: ASSESS: Complete, generate report
156
-
157
- Manager->>ReportAgent: Create comprehensive report
158
- ReportAgent-->>Manager: Returns formatted report
159
- Note over Manager: SYNTHESIZE: Combine all results
160
-
161
- Manager->>User: Final Research Report
162
- ```
163
-
164
- ## 5. Manager Decision Logic
165
-
166
- ```mermaid
167
- flowchart TD
168
- Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
-
170
- Plan --> Select[Select Agent for Next Subtask]
171
- Select --> Execute[Execute Agent]
172
- Execute --> Collect[Collect Results]
173
-
174
- Collect --> Assess[Assess Quality & Progress]
175
-
176
- Assess --> Q1{Quality Sufficient?}
177
- Q1 -->|No| Q2{Same Agent Can Fix?}
178
- Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
- Feedback --> Execute
180
- Q2 -->|No| Different[Try Different Agent]
181
- Different --> Select
182
-
183
- Q1 -->|Yes| Q3{Task Complete?}
184
- Q3 -->|No| Q4{Making Progress?}
185
- Q4 -->|Yes| Select
186
- Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
- Replan --> Plan
188
-
189
- Q3 -->|Yes| Synth[Synthesize Final Result]
190
- Synth --> Done([Return Report])
191
-
192
- style Start fill:#e1f5e1
193
- style Plan fill:#fff4e6
194
- style Select fill:#ffe6e6
195
- style Execute fill:#e6f3ff
196
- style Assess fill:#ffd6d6
197
- style Q1 fill:#ffe6e6
198
- style Q2 fill:#ffe6e6
199
- style Q3 fill:#ffe6e6
200
- style Q4 fill:#ffe6e6
201
- style Synth fill:#d4edda
202
- style Done fill:#e1f5e1
203
- ```
204
-
205
- ## 6. Hypothesis Agent Workflow
206
-
207
- ```mermaid
208
- flowchart LR
209
- Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
210
- Domain --> Context[Retrieve Background<br/>Knowledge]
211
- Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
212
- Generate --> Refine[Refine for<br/>Testability]
213
- Refine --> Rank[Rank by<br/>Quality Score]
214
- Rank --> Output[Return Top<br/>Hypotheses]
215
-
216
- Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
217
-
218
- style Input fill:#e1f5e1
219
- style Output fill:#fff4e6
220
- style Struct fill:#e6f3ff
221
- ```
222
-
223
- ## 7. Search Agent Workflow
224
-
225
- ```mermaid
226
- flowchart TD
227
- Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
228
-
229
- Strategy --> Multi[Multi-Source Search]
230
-
231
- Multi --> PubMed[PubMed Search<br/>via MCP]
232
- Multi --> ArXiv[arXiv Search<br/>via MCP]
233
- Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
234
-
235
- PubMed --> Aggregate[Aggregate Results]
236
- ArXiv --> Aggregate
237
- BioRxiv --> Aggregate
238
-
239
- Aggregate --> Filter[Filter & Rank<br/>by Relevance]
240
- Filter --> Dedup[Deduplicate<br/>Cross-Reference]
241
- Dedup --> Embed[Embed Documents<br/>via MCP]
242
- Embed --> Vector[(Vector DB)]
243
- Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
244
- RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
245
-
246
- style Input fill:#fff4e6
247
- style Multi fill:#ffe6e6
248
- style Vector fill:#ffe6f0
249
- style Output fill:#e6f3ff
250
- ```
251
-
252
- ## 8. Analysis Agent Workflow
253
-
254
- ```mermaid
255
- flowchart TD
256
- Input1[Hypotheses] --> Extract
257
- Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
258
-
259
- Extract --> Methods[Determine Analysis<br/>Methods Needed]
260
-
261
- Methods --> Branch{Requires<br/>Computation?}
262
- Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
- Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
-
265
- GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
- Execute --> Interpret1[Interpret<br/>Results]
267
- Qual --> Interpret2[Interpret<br/>Findings]
268
-
269
- Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
- Interpret2 --> Synthesize
271
-
272
- Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
- Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
- Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
- Gaps --> Output[Return Analysis<br/>Report]
276
-
277
- style Input1 fill:#fff4e6
278
- style Input2 fill:#e6f3ff
279
- style Execute fill:#ffe6e6
280
- style Output fill:#e6ffe6
281
- ```
282
-
283
- ## 9. Report Agent Workflow
284
-
285
- ```mermaid
286
- flowchart TD
287
- Input1[Query] --> Assemble
288
- Input2[Hypotheses] --> Assemble
289
- Input3[Search Results] --> Assemble
290
- Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
-
292
- Assemble --> Exec[Executive Summary]
293
- Assemble --> Intro[Introduction]
294
- Assemble --> Methods[Methods]
295
- Assemble --> Results[Results per<br/>Hypothesis]
296
- Assemble --> Discussion[Discussion]
297
- Assemble --> Future[Future Directions]
298
- Assemble --> Refs[References]
299
-
300
- Results --> VizCheck{Needs<br/>Visualization?}
301
- VizCheck -->|Yes| GenViz[Generate Viz Code]
302
- GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
- ExecViz --> Combine
304
- VizCheck -->|No| Combine[Combine All<br/>Sections]
305
-
306
- Exec --> Combine
307
- Intro --> Combine
308
- Methods --> Combine
309
- Discussion --> Combine
310
- Future --> Combine
311
- Refs --> Combine
312
-
313
- Combine --> Format[Format Output]
314
- Format --> MD[Markdown]
315
- Format --> PDF[PDF]
316
- Format --> JSON[JSON]
317
-
318
- MD --> Output[Return Final<br/>Report]
319
- PDF --> Output
320
- JSON --> Output
321
-
322
- style Input1 fill:#e1f5e1
323
- style Input2 fill:#fff4e6
324
- style Input3 fill:#e6f3ff
325
- style Input4 fill:#e6ffe6
326
- style Output fill:#d4edda
327
- ```
328
-
329
- ## 10. Data Flow & Event Streaming
330
-
331
- ```mermaid
332
- flowchart TD
333
- User[👤 User] -->|Research Query| UI[Gradio UI]
334
- UI -->|Submit| Manager[Magentic Manager]
335
-
336
- Manager -->|Event: Planning| UI
337
- Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
- HypAgent -->|Event: Delta/Message| UI
339
- HypAgent -->|Hypotheses| Context[(Shared Context)]
340
-
341
- Context -->|Retrieved by| Manager
342
- Manager -->|Select Agent| SearchAgent[Search Agent]
343
- SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
- WebSearch -->|Results| SearchAgent
345
- SearchAgent -->|Event: Delta/Message| UI
346
- SearchAgent -->|Documents| Context
347
- SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
-
349
- Context -->|Retrieved by| Manager
350
- Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
- AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
- CodeExec -->|Results| AnalysisAgent
- AnalysisAgent -->|Event: Delta/Message| UI
- AnalysisAgent -->|Analysis| Context
-
- Context -->|Retrieved by| Manager
- Manager -->|Select Agent| ReportAgent[Report Agent]
- ReportAgent -->|MCP Request| CodeExec
- ReportAgent -->|Event: Delta/Message| UI
- ReportAgent -->|Report| Context
-
- Manager -->|Event: Final Result| UI
- UI -->|Display| User
-
- style User fill:#e1f5e1
- style UI fill:#e6f3ff
- style Manager fill:#ffe6e6
- style Context fill:#ffe6f0
- style VectorDB fill:#ffe6f0
- style WebSearch fill:#f0f0f0
- style CodeExec fill:#f0f0f0
- ```
-
- ## 11. MCP Tool Architecture
-
- ```mermaid
- graph TB
- subgraph "Agent Layer"
- Manager[Magentic Manager]
- HypAgent[Hypothesis Agent]
- SearchAgent[Search Agent]
- AnalysisAgent[Analysis Agent]
- ReportAgent[Report Agent]
- end
-
- subgraph "MCP Protocol Layer"
- Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
- end
-
- subgraph "MCP Servers"
- Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
- Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
- Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
- Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
- end
-
- subgraph "External Services"
- PubMed[PubMed API]
- ArXiv[arXiv API]
- BioRxiv[bioRxiv API]
- Modal[Modal Sandbox]
- ChromaDB[(ChromaDB)]
- end
-
- SearchAgent -->|Request| Registry
- AnalysisAgent -->|Request| Registry
- ReportAgent -->|Request| Registry
-
- Registry --> Server1
- Registry --> Server2
- Registry --> Server3
- Registry --> Server4
-
- Server1 --> PubMed
- Server1 --> ArXiv
- Server1 --> BioRxiv
- Server2 --> Modal
- Server3 --> ChromaDB
-
- style Manager fill:#ffe6e6
- style Registry fill:#fff4e6
- style Server1 fill:#e6f3ff
- style Server2 fill:#e6f3ff
- style Server3 fill:#e6f3ff
- style Server4 fill:#e6f3ff
- ```
-
- ## 12. Progress Tracking & Stall Detection
-
- ```mermaid
- stateDiagram-v2
- [*] --> Initialization: User Query
-
- Initialization --> Planning: Manager starts
-
- Planning --> AgentExecution: Select agent
-
- AgentExecution --> Assessment: Collect results
-
- Assessment --> QualityCheck: Evaluate output
-
- QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
- QualityCheck --> Planning: Poor quality<br/>(try different agent)
- QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
- QualityCheck --> Synthesis: Good quality<br/>(task complete)
-
- NextAgent --> AgentExecution: Select next agent
-
- state StallDetection <<choice>>
- Assessment --> StallDetection: Check progress
- StallDetection --> Planning: No progress<br/>(stall count < max)
- StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
-
- ErrorRecovery --> PartialReport: Generate partial results
- PartialReport --> [*]
-
- Synthesis --> FinalReport: Combine all outputs
- FinalReport --> [*]
-
- note right of QualityCheck
- Manager assesses:
- • Output completeness
- • Quality metrics
- • Progress made
- end note
-
- note right of StallDetection
- Stall = no new progress
- after agent execution
- Triggers plan reset
- end note
- ```
-
- ## 13. Gradio UI Integration
-
- ```mermaid
- graph TD
- App[Gradio App<br/>DeepCritical Research Agent]
-
- App --> Input[Input Section]
- App --> Status[Status Section]
- App --> Output[Output Section]
-
- Input --> Query[Research Question<br/>Text Area]
- Input --> Controls[Controls]
- Controls --> MaxHyp[Max Hypotheses: 1-10]
- Controls --> MaxRounds[Max Rounds: 5-20]
- Controls --> Submit[Start Research Button]
-
- Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
- Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
-
- Output --> Tabs[Tabbed Results]
- Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
- Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
- Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
- Tabs --> Tab4[Report Tab<br/>Final research report]
- Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
-
- Submit -.->|Triggers| Workflow[Magentic Workflow]
- Workflow -.->|MagenticOrchestratorMessageEvent| Log
- Workflow -.->|MagenticAgentDeltaEvent| Log
- Workflow -.->|MagenticAgentMessageEvent| Log
- Workflow -.->|MagenticFinalResultEvent| Tab4
-
- style App fill:#e1f5e1
- style Input fill:#fff4e6
- style Status fill:#e6f3ff
- style Output fill:#e6ffe6
- style Workflow fill:#ffe6e6
- ```
-
- ## 14. Complete System Context
-
- ```mermaid
- graph LR
- User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
-
- DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
- DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
- DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
- DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
- DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
- DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
-
- DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
-
- PubMed -->|Results| DC
- ArXiv -->|Results| DC
- BioRxiv -->|Results| DC
- Claude -->|Responses| DC
- Modal -->|Output| DC
- Chroma -->|Context| DC
-
- DC -->|Research report| User
-
- style User fill:#e1f5e1
- style DC fill:#ffe6e6
- style PubMed fill:#e6f3ff
- style ArXiv fill:#e6f3ff
- style BioRxiv fill:#e6f3ff
- style Claude fill:#ffd6d6
- style Modal fill:#f0f0f0
- style Chroma fill:#ffe6f0
- style HF fill:#d4edda
- ```
-
- ## 15. Workflow Timeline (Simplified)
-
- ```mermaid
- gantt
- title DeepCritical Magentic Workflow - Typical Execution
- dateFormat mm:ss
- axisFormat %M:%S
-
- section Manager Planning
- Initial planning :p1, 00:00, 10s
-
- section Hypothesis Agent
- Generate hypotheses :h1, after p1, 30s
- Manager assessment :h2, after h1, 5s
-
- section Search Agent
- Search hypothesis 1 :s1, after h2, 20s
- Search hypothesis 2 :s2, after s1, 20s
- Search hypothesis 3 :s3, after s2, 20s
- RAG processing :s4, after s3, 15s
- Manager assessment :s5, after s4, 5s
-
- section Analysis Agent
- Evidence extraction :a1, after s5, 15s
- Code generation :a2, after a1, 20s
- Code execution :a3, after a2, 25s
- Synthesis :a4, after a3, 20s
- Manager assessment :a5, after a4, 5s
-
- section Report Agent
- Report assembly :r1, after a5, 30s
- Visualization :r2, after r1, 15s
- Formatting :r3, after r2, 10s
-
- section Manager Synthesis
- Final synthesis :f1, after r3, 10s
- ```
-
- ---
-
- ## Key Differences from Original Design
-
- | Aspect | Original (Judge-in-Loop) | New (Magentic) |
- |--------|-------------------------|----------------|
- | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
- | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
- | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
- | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
- | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
- | **Progress Tracking** | Manual state management | Built-in round/stall detection |
- | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
- | **Error Recovery** | Retry same phase | Try different agent or replan |
-
- ---
-
- ## Simplified Design Principles
-
- 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
- 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
- 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
- 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
- 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
- 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
- 7. **Shared Context**: Centralized state accessible to all agents
- 8. **Progress Awareness**: Manager tracks what's been done and what's needed
-
- ---
-
- ## Legend
-
- 🔴 **Red/Pink**: Manager, orchestration, decision-making
- 🟡 **Yellow/Orange**: Specialist agents, processing
- 🔵 **Blue**: Data, tools, MCP services
- 🟣 **Purple/Pink**: Storage, databases, state
- 🟢 **Green**: User interactions, final outputs
- ⚪ **Gray**: External services, APIs
-
- ---
-
- ## Implementation Highlights
-
- **Simple 4-Agent Setup:**
- ```python
- workflow = (
-     MagenticBuilder()
-     .participants(
-         hypothesis=HypothesisAgent(tools=[background_tool]),
-         search=SearchAgent(tools=[web_search, rag_tool]),
-         analysis=AnalysisAgent(tools=[code_execution]),
-         report=ReportAgent(tools=[code_execution, visualization])
-     )
-     .with_standard_manager(
-         chat_client=AnthropicClient(model="claude-sonnet-4"),
-         max_round_count=15,  # Prevent infinite loops
-         max_stall_count=3  # Detect stuck workflows
-     )
-     .build()
- )
- ```
-
- **Manager handles quality assessment in its instructions:**
- - Checks hypothesis quality (testable, novel, clear)
- - Validates search results (relevant, authoritative, recent)
- - Assesses analysis soundness (methodology, evidence, conclusions)
- - Ensures report completeness (all sections, proper citations)
-
- No separate Judge Agent needed - manager does it all!
-
- ---
-
- **Document Version**: 2.0 (Magentic Simplified)
- **Last Updated**: 2025-11-24
- **Architecture**: Microsoft Magentic Orchestration Pattern
- **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
- **License**: MIT
docs/configuration/CONFIGURATION.md DELETED
@@ -1,743 +0,0 @@
- # Configuration Guide
-
- ## Overview
-
- DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
-
- The configuration system provides:
-
- - **Type Safety**: Strongly-typed fields with Pydantic validation
- - **Environment File Support**: Automatically loads from `.env` file (if present)
- - **Case-Insensitive**: Environment variables are case-insensitive
- - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
- - **Validation**: Automatic validation on load with helpful error messages
-
- ## Quick Start
-
- 1. Create a `.env` file in the project root
- 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
- 3. Optionally configure other services as needed
- 4. The application will automatically load and validate your configuration
-
- ## Configuration System Architecture
-
- ### Settings Class
-
- The `Settings` class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
-
- ```13:21:src/utils/config.py
- class Settings(BaseSettings):
-     """Strongly-typed application settings."""
-
-     model_config = SettingsConfigDict(
-         env_file=".env",
-         env_file_encoding="utf-8",
-         case_sensitive=False,
-         extra="ignore",
-     )
- ```
-
- ### Singleton Instance
-
- A global `settings` instance is available for import:
-
- ```234:235:src/utils/config.py
- # Singleton for easy import
- settings = get_settings()
- ```
-
- ### Usage Pattern
-
- Access configuration throughout the codebase:
-
- ```python
- from src.utils.config import settings
-
- # Check if API keys are available
- if settings.has_openai_key:
-     # Use OpenAI
-     pass
-
- # Access configuration values
- max_iterations = settings.max_iterations
- web_search_provider = settings.web_search_provider
- ```
-
- ## Required Configuration
-
- ### LLM Provider
-
- You must configure at least one LLM provider. The system supports:
-
- - **OpenAI**: Requires `OPENAI_API_KEY`
- - **Anthropic**: Requires `ANTHROPIC_API_KEY`
- - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without key for public models)
-
- #### OpenAI Configuration
-
- ```bash
- LLM_PROVIDER=openai
- OPENAI_API_KEY=your_openai_api_key_here
- OPENAI_MODEL=gpt-5.1
- ```
-
- The default model is defined in the `Settings` class:
-
- ```29:29:src/utils/config.py
- openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
- ```
-
- #### Anthropic Configuration
-
- ```bash
- LLM_PROVIDER=anthropic
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
- ```
-
- The default model is defined in the `Settings` class:
-
- ```30:32:src/utils/config.py
- anthropic_model: str = Field(
-     default="claude-sonnet-4-5-20250929", description="Anthropic model"
- )
- ```
-
- #### HuggingFace Configuration
-
- HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
-
- ```bash
- # Option 1: Using HF_TOKEN (preferred)
- HF_TOKEN=your_huggingface_token_here
-
- # Option 2: Using HUGGINGFACE_API_KEY (alternative)
- HUGGINGFACE_API_KEY=your_huggingface_api_key_here
-
- # Default model
- HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
- ```
-
- The HuggingFace token can be set via either environment variable:
-
- ```33:35:src/utils/config.py
- hf_token: str | None = Field(
-     default=None, alias="HF_TOKEN", description="HuggingFace API token"
- )
- ```
-
- ```57:59:src/utils/config.py
- huggingface_api_key: str | None = Field(
-     default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
- )
- ```
-
- ## Optional Configuration
-
- ### Embedding Configuration
-
- DeepCritical supports multiple embedding providers for semantic search and RAG:
-
- ```bash
- # Embedding Provider: "openai", "local", or "huggingface"
- EMBEDDING_PROVIDER=local
-
- # OpenAI Embedding Model (used by LlamaIndex RAG)
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
-
- # Local Embedding Model (sentence-transformers, used by EmbeddingService)
- LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
-
- # HuggingFace Embedding Model
- HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
- ```
-
- The embedding provider configuration:
-
- ```47:50:src/utils/config.py
- embedding_provider: Literal["openai", "local", "huggingface"] = Field(
-     default="local",
-     description="Embedding provider to use",
- )
- ```
-
- **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
-
- ### Web Search Configuration
-
- DeepCritical supports multiple web search providers:
-
- ```bash
- # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
- # Default: "duckduckgo" (no API key required)
- WEB_SEARCH_PROVIDER=duckduckgo
-
- # Serper API Key (for Google search via Serper)
- SERPER_API_KEY=your_serper_api_key_here
-
- # SearchXNG Host URL (for self-hosted search)
- SEARCHXNG_HOST=http://localhost:8080
-
- # Brave Search API Key
- BRAVE_API_KEY=your_brave_api_key_here
-
- # Tavily API Key
- TAVILY_API_KEY=your_tavily_api_key_here
- ```
-
- The web search provider configuration:
-
- ```71:74:src/utils/config.py
- web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
-     default="duckduckgo",
-     description="Web search provider to use",
- )
- ```
-
- **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
-
- ### PubMed Configuration
-
- PubMed search supports an optional NCBI API key for higher rate limits:
-
- ```bash
- # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
- NCBI_API_KEY=your_ncbi_api_key_here
- ```
-
- The PubMed tool uses this configuration:
-
- ```22:29:src/tools/pubmed.py
- def __init__(self, api_key: str | None = None) -> None:
-     self.api_key = api_key or settings.ncbi_api_key
-     # Ignore placeholder values from .env.example
-     if self.api_key == "your-ncbi-key-here":
-         self.api_key = None
-
-     # Use shared rate limiter
-     self._limiter = get_pubmed_limiter(self.api_key)
- ```
-
- ### Agent Configuration
-
- Control agent behavior and research loop execution:
-
- ```bash
- # Maximum iterations per research loop (1-50, default: 10)
- MAX_ITERATIONS=10
-
- # Search timeout in seconds
- SEARCH_TIMEOUT=30
-
- # Use graph-based execution for research flows
- USE_GRAPH_EXECUTION=false
- ```
-
- The agent configuration fields:
-
- ```80:85:src/utils/config.py
- # Agent Configuration
- max_iterations: int = Field(default=10, ge=1, le=50)
- search_timeout: int = Field(default=30, description="Seconds to wait for search")
- use_graph_execution: bool = Field(
-     default=False, description="Use graph-based execution for research flows"
- )
- ```
-
- ### Budget & Rate Limiting Configuration
-
- Control resource limits for research loops:
-
- ```bash
- # Default token budget per research loop (1000-1000000, default: 100000)
- DEFAULT_TOKEN_LIMIT=100000
-
- # Default time limit per research loop in minutes (1-120, default: 10)
- DEFAULT_TIME_LIMIT_MINUTES=10
-
- # Default iterations limit per research loop (1-50, default: 10)
- DEFAULT_ITERATIONS_LIMIT=10
- ```
-
- The budget configuration with validation:
-
- ```87:105:src/utils/config.py
- # Budget & Rate Limiting Configuration
- default_token_limit: int = Field(
-     default=100000,
-     ge=1000,
-     le=1000000,
-     description="Default token budget per research loop",
- )
- default_time_limit_minutes: int = Field(
-     default=10,
-     ge=1,
-     le=120,
-     description="Default time limit per research loop (minutes)",
- )
- default_iterations_limit: int = Field(
-     default=10,
-     ge=1,
-     le=50,
-     description="Default iterations limit per research loop",
- )
- ```
-
- ### RAG Service Configuration
-
- Configure the Retrieval-Augmented Generation service:
-
- ```bash
- # ChromaDB collection name for RAG
- RAG_COLLECTION_NAME=deepcritical_evidence
-
- # Number of top results to retrieve from RAG (1-50, default: 5)
- RAG_SIMILARITY_TOP_K=5
-
- # Automatically ingest evidence into RAG
- RAG_AUTO_INGEST=true
- ```
-
- The RAG configuration:
-
- ```127:141:src/utils/config.py
- # RAG Service Configuration
- rag_collection_name: str = Field(
-     default="deepcritical_evidence",
-     description="ChromaDB collection name for RAG",
- )
- rag_similarity_top_k: int = Field(
-     default=5,
-     ge=1,
-     le=50,
-     description="Number of top results to retrieve from RAG",
- )
- rag_auto_ingest: bool = Field(
-     default=True,
-     description="Automatically ingest evidence into RAG",
- )
- ```
-
- ### ChromaDB Configuration
-
- Configure the vector database for embeddings and RAG:
-
- ```bash
- # ChromaDB storage path
- CHROMA_DB_PATH=./chroma_db
-
- # Whether to persist ChromaDB to disk
- CHROMA_DB_PERSIST=true
-
- # ChromaDB server host (for remote ChromaDB, optional)
- CHROMA_DB_HOST=localhost
-
- # ChromaDB server port (for remote ChromaDB, optional)
- CHROMA_DB_PORT=8000
- ```
-
- The ChromaDB configuration:
-
- ```113:125:src/utils/config.py
- chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
- chroma_db_persist: bool = Field(
-     default=True,
-     description="Whether to persist ChromaDB to disk",
- )
- chroma_db_host: str | None = Field(
-     default=None,
-     description="ChromaDB server host (for remote ChromaDB)",
- )
- chroma_db_port: int | None = Field(
-     default=None,
-     description="ChromaDB server port (for remote ChromaDB)",
- )
- ```
-
- ### External Services
-
- #### Modal Configuration
-
- Modal is used for secure sandbox execution of statistical analysis:
-
- ```bash
- # Modal Token ID (for Modal sandbox execution)
- MODAL_TOKEN_ID=your_modal_token_id_here
-
- # Modal Token Secret
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
- ```
-
- The Modal configuration:
-
- ```110:112:src/utils/config.py
- # External Services
- modal_token_id: str | None = Field(default=None, description="Modal token ID")
- modal_token_secret: str | None = Field(default=None, description="Modal token secret")
- ```
-
- ### Logging Configuration
-
- Configure structured logging:
-
- ```bash
- # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
- LOG_LEVEL=INFO
- ```
-
- The logging configuration:
-
- ```107:108:src/utils/config.py
- # Logging
- log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
- ```
-
- Logging is configured via the `configure_logging()` function:
-
- ```212:231:src/utils/config.py
- def configure_logging(settings: Settings) -> None:
-     """Configure structured logging with the configured log level."""
-     # Set stdlib logging level from settings
-     logging.basicConfig(
-         level=getattr(logging, settings.log_level),
-         format="%(message)s",
-     )
-
-     structlog.configure(
-         processors=[
-             structlog.stdlib.filter_by_level,
-             structlog.stdlib.add_logger_name,
-             structlog.stdlib.add_log_level,
-             structlog.processors.TimeStamper(fmt="iso"),
-             structlog.processors.JSONRenderer(),
-         ],
-         wrapper_class=structlog.stdlib.BoundLogger,
-         context_class=dict,
-         logger_factory=structlog.stdlib.LoggerFactory(),
-     )
- ```
-
- ## Configuration Properties
-
- The `Settings` class provides helpful properties for checking configuration state:
-
- ### API Key Availability
-
- Check which API keys are available:
-
- ```171:189:src/utils/config.py
- @property
- def has_openai_key(self) -> bool:
-     """Check if OpenAI API key is available."""
-     return bool(self.openai_api_key)
-
- @property
- def has_anthropic_key(self) -> bool:
-     """Check if Anthropic API key is available."""
-     return bool(self.anthropic_api_key)
-
- @property
- def has_huggingface_key(self) -> bool:
-     """Check if HuggingFace API key is available."""
-     return bool(self.huggingface_api_key or self.hf_token)
-
- @property
- def has_any_llm_key(self) -> bool:
-     """Check if any LLM API key is available."""
-     return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
- ```
-
- **Usage:**
-
- ```python
- from src.utils.config import settings
-
- # Check API key availability
- if settings.has_openai_key:
-     # Use OpenAI
-     pass
-
- if settings.has_anthropic_key:
-     # Use Anthropic
-     pass
-
- if settings.has_huggingface_key:
-     # Use HuggingFace
-     pass
-
- if settings.has_any_llm_key:
-     # At least one LLM is available
-     pass
- ```
-
- ### Service Availability
-
- Check if external services are configured:
-
- ```143:146:src/utils/config.py
- @property
- def modal_available(self) -> bool:
-     """Check if Modal credentials are configured."""
-     return bool(self.modal_token_id and self.modal_token_secret)
- ```
-
- ```191:204:src/utils/config.py
- @property
- def web_search_available(self) -> bool:
-     """Check if web search is available (either no-key provider or API key present)."""
-     if self.web_search_provider == "duckduckgo":
-         return True  # No API key required
-     if self.web_search_provider == "serper":
-         return bool(self.serper_api_key)
-     if self.web_search_provider == "searchxng":
-         return bool(self.searchxng_host)
-     if self.web_search_provider == "brave":
-         return bool(self.brave_api_key)
-     if self.web_search_provider == "tavily":
-         return bool(self.tavily_api_key)
-     return False
- ```
-
- **Usage:**
-
- ```python
- from src.utils.config import settings
-
- # Check service availability
- if settings.modal_available:
-     # Use Modal sandbox
-     pass
-
- if settings.web_search_available:
-     # Web search is configured
-     pass
- ```
-
- ### API Key Retrieval
-
- Get the API key for the configured provider:
-
- ```148:160:src/utils/config.py
- def get_api_key(self) -> str:
-     """Get the API key for the configured provider."""
-     if self.llm_provider == "openai":
-         if not self.openai_api_key:
-             raise ConfigurationError("OPENAI_API_KEY not set")
-         return self.openai_api_key
-
-     if self.llm_provider == "anthropic":
-         if not self.anthropic_api_key:
-             raise ConfigurationError("ANTHROPIC_API_KEY not set")
-         return self.anthropic_api_key
-
-     raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
- ```
-
- For OpenAI-specific operations (e.g., Magentic mode):
-
- ```162:169:src/utils/config.py
- def get_openai_api_key(self) -> str:
-     """Get OpenAI API key (required for Magentic function calling)."""
-     if not self.openai_api_key:
-         raise ConfigurationError(
-             "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
-             "Use mode='simple' for other providers."
-         )
-     return self.openai_api_key
- ```
-
- ## Configuration Usage in Codebase
-
- The configuration system is used throughout the codebase:
-
- ### LLM Factory
-
- The LLM factory uses settings to create appropriate models:
-
- ```129:144:src/utils/llm_factory.py
- if settings.llm_provider == "huggingface":
-     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
-     hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
-     return HuggingFaceModel(model_name, provider=hf_provider)
-
- if settings.llm_provider == "openai":
-     if not settings.openai_api_key:
-         raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
-     provider = OpenAIProvider(api_key=settings.openai_api_key)
-     return OpenAIModel(settings.openai_model, provider=provider)
-
- if settings.llm_provider == "anthropic":
-     if not settings.anthropic_api_key:
-         raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
-     anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
-     return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
- ```
-
- ### Embedding Service
-
- The embedding service uses the local embedding model configuration:
-
- ```29:31:src/services/embeddings.py
- def __init__(self, model_name: str | None = None):
-     self._model_name = model_name or settings.local_embedding_model
-     self._model = SentenceTransformer(self._model_name)
- ```
-
- ### Orchestrator Factory
-
- The orchestrator factory uses settings to determine mode:
-
- ```69:80:src/orchestrator_factory.py
- def _determine_mode(explicit_mode: str | None) -> str:
-     """Determine which mode to use."""
-     if explicit_mode:
-         if explicit_mode in ("magentic", "advanced"):
-             return "advanced"
-         return "simple"
-
-     # Auto-detect: advanced if paid API key available
-     if settings.has_openai_key:
-         return "advanced"
-
-     return "simple"
- ```
-
- ## Environment Variables Reference
-
- ### Required (at least one LLM)
-
- - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
- - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
- - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
-
- #### LLM Configuration Variables
-
- - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"openai"`)
- - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
- - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
- - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
-
- #### Embedding Configuration Variables
-
- - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
- - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
- - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
- - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
-
- #### Web Search Configuration Variables
-
- - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
- - `SERPER_API_KEY` - Serper API key (required for Serper provider)
- - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
- - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
- - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
-
- #### PubMed Configuration Variables
-
- - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
-
- #### Agent Configuration Variables
-
- - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
- - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
- - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
-
- #### Budget Configuration Variables
-
- - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
- - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
- - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
-
- #### RAG Configuration Variables
-
- - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
- - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
- - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
-
- #### ChromaDB Configuration Variables
-
- - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
- - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
- - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
- - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
-
- #### External Services Variables
-
- - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
- - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
-
- #### Logging Configuration Variables
-
- - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
-
- ## Validation
-
- Settings are validated on load using Pydantic validation:
-
- - **Type Checking**: All fields are strongly typed
- - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
- - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
- - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
-
- ### Validation Examples
-
- The `max_iterations` field has range validation:
-
- ```81:81:src/utils/config.py
- max_iterations: int = Field(default=10, ge=1, le=50)
- ```
-
- The `llm_provider` field has literal validation:
-
- ```26:28:src/utils/config.py
- llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
-     default="openai", description="Which LLM provider to use"
- )
- ```
-
- ## Error Handling
-
- Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
-
- ```22:25:src/utils/exceptions.py
- class ConfigurationError(DeepCriticalError):
-     """Raised when configuration is invalid."""
-
-     pass
- ```
-
- ### Error Handling Example
-
- ```python
- from src.utils.config import settings
- from src.utils.exceptions import ConfigurationError
714
-
715
- try:
716
- api_key = settings.get_api_key()
717
- except ConfigurationError as e:
718
- print(f"Configuration error: {e}")
719
- ```
720
-
721
- ### Common Configuration Errors
722
-
723
- 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
724
- 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
725
- 3. **Out of Range**: When numeric values exceed their min/max constraints
726
- 4. **Invalid Literal**: When enum fields receive unsupported values
727
-
728
- ## Configuration Best Practices
729
-
730
- 1. **Use `.env` File**: Store sensitive keys in `.env` file (add to `.gitignore`)
731
- 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
732
- 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
733
- 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
734
- 5. **Use Defaults**: Leverage sensible defaults for optional configuration
735
-
736
- ## Future Enhancements
737
-
738
- The following configurations are planned for future phases:
739
-
740
- 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
741
- 2. **Model Selection**: Reasoning/main/fast model configuration
742
- 3. **Service Integration**: Additional service integrations and configurations
743
-
 
docs/configuration/index.md DELETED
@@ -1,746 +0,0 @@
1
- # Configuration Guide
2
-
3
- ## Overview
4
-
5
- DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
-
7
- The configuration system provides:
8
-
9
- - **Type Safety**: Strongly-typed fields with Pydantic validation
10
- - **Environment File Support**: Automatically loads from `.env` file (if present)
11
- - **Case-Insensitive**: Environment variables are case-insensitive
12
- - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
13
- - **Validation**: Automatic validation on load with helpful error messages
14
-
15
- ## Quick Start
16
-
17
- 1. Create a `.env` file in the project root
18
- 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
19
- 3. Optionally configure other services as needed
20
- 4. The application will automatically load and validate your configuration
21
-
22
- ## Configuration System Architecture
23
-
24
- ### Settings Class
25
-
26
- The [`Settings`][settings-class] class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
27
-
28
- ```13:21:src/utils/config.py
29
- class Settings(BaseSettings):
30
- """Strongly-typed application settings."""
31
-
32
- model_config = SettingsConfigDict(
33
- env_file=".env",
34
- env_file_encoding="utf-8",
35
- case_sensitive=False,
36
- extra="ignore",
37
- )
38
- ```
39
-
40
- [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L13-L21)
41
-
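Because `case_sensitive=False`, the same field can be set via `MAX_ITERATIONS`, `max_iterations`, or any other casing. The stand-in below (a hypothetical `DemoSettings`, not the real `Settings` class) illustrates the matching behavior that pydantic-settings provides:

```python
# Illustrative stand-in for case-insensitive environment matching;
# the real Settings class gets this from BaseSettings via
# case_sensitive=False in model_config.
import os

class DemoSettings:
    def __init__(self) -> None:
        self.max_iterations = 10  # field default
        for key, value in os.environ.items():
            if key.lower() == "max_iterations":
                self.max_iterations = int(value)

os.environ["MAX_ITERATIONS"] = "25"   # upper-case env var...
print(DemoSettings().max_iterations)  # → 25 (...still matches the field)
```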
42
- ### Singleton Instance
43
-
44
- A global `settings` instance is available for import:
45
-
46
- ```234:235:src/utils/config.py
47
- # Singleton for easy import
48
- settings = get_settings()
49
- ```
50
-
51
- [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L234-L235)
52
-
53
- ### Usage Pattern
54
-
55
- Access configuration throughout the codebase:
56
-
57
- ```python
58
- from src.utils.config import settings
59
-
60
- # Check if API keys are available
61
- if settings.has_openai_key:
62
- # Use OpenAI
63
- pass
64
-
65
- # Access configuration values
66
- max_iterations = settings.max_iterations
67
- web_search_provider = settings.web_search_provider
68
- ```
69
-
70
- ## Required Configuration
71
-
72
- ### LLM Provider
73
-
74
- You must configure at least one LLM provider. The system supports:
75
-
76
- - **OpenAI**: Requires `OPENAI_API_KEY`
77
- - **Anthropic**: Requires `ANTHROPIC_API_KEY`
78
- - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without key for public models)
79
-
80
- #### OpenAI Configuration
81
-
82
- ```bash
83
- LLM_PROVIDER=openai
84
- OPENAI_API_KEY=your_openai_api_key_here
85
- OPENAI_MODEL=gpt-5.1
86
- ```
87
-
88
- The default model is defined in the `Settings` class:
89
-
90
- ```29:29:src/utils/config.py
91
- openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
92
- ```
93
-
94
- #### Anthropic Configuration
95
-
96
- ```bash
97
- LLM_PROVIDER=anthropic
98
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
99
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
100
- ```
101
-
102
- The default model is defined in the `Settings` class:
103
-
104
- ```30:32:src/utils/config.py
105
- anthropic_model: str = Field(
106
- default="claude-sonnet-4-5-20250929", description="Anthropic model"
107
- )
108
- ```
109
-
110
- #### HuggingFace Configuration
111
-
112
- HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
113
-
114
- ```bash
115
- # Option 1: Using HF_TOKEN (preferred)
116
- HF_TOKEN=your_huggingface_token_here
117
-
118
- # Option 2: Using HUGGINGFACE_API_KEY (alternative)
119
- HUGGINGFACE_API_KEY=your_huggingface_api_key_here
120
-
121
- # Default model
122
- HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
123
- ```
124
-
125
- The HuggingFace token can be set via either environment variable:
126
-
127
- ```33:35:src/utils/config.py
128
- hf_token: str | None = Field(
129
- default=None, alias="HF_TOKEN", description="HuggingFace API token"
130
- )
131
- ```
132
-
133
- ```57:59:src/utils/config.py
134
- huggingface_api_key: str | None = Field(
135
- default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
136
- )
137
- ```
138
-
139
- ## Optional Configuration
140
-
141
- ### Embedding Configuration
142
-
143
- DeepCritical supports multiple embedding providers for semantic search and RAG:
144
-
145
- ```bash
146
- # Embedding Provider: "openai", "local", or "huggingface"
147
- EMBEDDING_PROVIDER=local
148
-
149
- # OpenAI Embedding Model (used by LlamaIndex RAG)
150
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
151
-
152
- # Local Embedding Model (sentence-transformers, used by EmbeddingService)
153
- LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
154
-
155
- # HuggingFace Embedding Model
156
- HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
157
- ```
158
-
159
- The embedding provider configuration:
160
-
161
- ```47:50:src/utils/config.py
162
- embedding_provider: Literal["openai", "local", "huggingface"] = Field(
163
- default="local",
164
- description="Embedding provider to use",
165
- )
166
- ```
167
-
168
- **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
169
-
170
- ### Web Search Configuration
171
-
172
- DeepCritical supports multiple web search providers:
173
-
174
- ```bash
175
- # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
176
- # Default: "duckduckgo" (no API key required)
177
- WEB_SEARCH_PROVIDER=duckduckgo
178
-
179
- # Serper API Key (for Google search via Serper)
180
- SERPER_API_KEY=your_serper_api_key_here
181
-
182
- # SearchXNG Host URL (for self-hosted search)
183
- SEARCHXNG_HOST=http://localhost:8080
184
-
185
- # Brave Search API Key
186
- BRAVE_API_KEY=your_brave_api_key_here
187
-
188
- # Tavily API Key
189
- TAVILY_API_KEY=your_tavily_api_key_here
190
- ```
191
-
192
- The web search provider configuration:
193
-
194
- ```71:74:src/utils/config.py
195
- web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
196
- default="duckduckgo",
197
- description="Web search provider to use",
198
- )
199
- ```
200
-
201
- **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
202
-
203
- ### PubMed Configuration
204
-
205
- PubMed search supports optional NCBI API key for higher rate limits:
206
-
207
- ```bash
208
- # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
209
- NCBI_API_KEY=your_ncbi_api_key_here
210
- ```
211
-
212
- The PubMed tool uses this configuration:
213
-
214
- ```22:29:src/tools/pubmed.py
215
- def __init__(self, api_key: str | None = None) -> None:
216
- self.api_key = api_key or settings.ncbi_api_key
217
- # Ignore placeholder values from .env.example
218
- if self.api_key == "your-ncbi-key-here":
219
- self.api_key = None
220
-
221
- # Use shared rate limiter
222
- self._limiter = get_pubmed_limiter(self.api_key)
223
- ```
224
-
225
- ### Agent Configuration
226
-
227
- Control agent behavior and research loop execution:
228
-
229
- ```bash
230
- # Maximum iterations per research loop (1-50, default: 10)
231
- MAX_ITERATIONS=10
232
-
233
- # Search timeout in seconds
234
- SEARCH_TIMEOUT=30
235
-
236
- # Use graph-based execution for research flows
237
- USE_GRAPH_EXECUTION=false
238
- ```
239
-
240
- The agent configuration fields:
241
-
242
- ```80:85:src/utils/config.py
243
- # Agent Configuration
244
- max_iterations: int = Field(default=10, ge=1, le=50)
245
- search_timeout: int = Field(default=30, description="Seconds to wait for search")
246
- use_graph_execution: bool = Field(
247
- default=False, description="Use graph-based execution for research flows"
248
- )
249
- ```
250
-
251
- ### Budget & Rate Limiting Configuration
252
-
253
- Control resource limits for research loops:
254
-
255
- ```bash
256
- # Default token budget per research loop (1000-1000000, default: 100000)
257
- DEFAULT_TOKEN_LIMIT=100000
258
-
259
- # Default time limit per research loop in minutes (1-120, default: 10)
260
- DEFAULT_TIME_LIMIT_MINUTES=10
261
-
262
- # Default iterations limit per research loop (1-50, default: 10)
263
- DEFAULT_ITERATIONS_LIMIT=10
264
- ```
265
-
266
- The budget configuration with validation:
267
-
268
- ```87:105:src/utils/config.py
269
- # Budget & Rate Limiting Configuration
270
- default_token_limit: int = Field(
271
- default=100000,
272
- ge=1000,
273
- le=1000000,
274
- description="Default token budget per research loop",
275
- )
276
- default_time_limit_minutes: int = Field(
277
- default=10,
278
- ge=1,
279
- le=120,
280
- description="Default time limit per research loop (minutes)",
281
- )
282
- default_iterations_limit: int = Field(
283
- default=10,
284
- ge=1,
285
- le=50,
286
- description="Default iterations limit per research loop",
287
- )
288
- ```
289
-
290
- ### RAG Service Configuration
291
-
292
- Configure the Retrieval-Augmented Generation service:
293
-
294
- ```bash
295
- # ChromaDB collection name for RAG
296
- RAG_COLLECTION_NAME=deepcritical_evidence
297
-
298
- # Number of top results to retrieve from RAG (1-50, default: 5)
299
- RAG_SIMILARITY_TOP_K=5
300
-
301
- # Automatically ingest evidence into RAG
302
- RAG_AUTO_INGEST=true
303
- ```
304
-
305
- The RAG configuration:
306
-
307
- ```127:141:src/utils/config.py
308
- # RAG Service Configuration
309
- rag_collection_name: str = Field(
310
- default="deepcritical_evidence",
311
- description="ChromaDB collection name for RAG",
312
- )
313
- rag_similarity_top_k: int = Field(
314
- default=5,
315
- ge=1,
316
- le=50,
317
- description="Number of top results to retrieve from RAG",
318
- )
319
- rag_auto_ingest: bool = Field(
320
- default=True,
321
- description="Automatically ingest evidence into RAG",
322
- )
323
- ```
324
-
325
- ### ChromaDB Configuration
326
-
327
- Configure the vector database for embeddings and RAG:
328
-
329
- ```bash
330
- # ChromaDB storage path
331
- CHROMA_DB_PATH=./chroma_db
332
-
333
- # Whether to persist ChromaDB to disk
334
- CHROMA_DB_PERSIST=true
335
-
336
- # ChromaDB server host (for remote ChromaDB, optional)
337
- CHROMA_DB_HOST=localhost
338
-
339
- # ChromaDB server port (for remote ChromaDB, optional)
340
- CHROMA_DB_PORT=8000
341
- ```
342
-
343
- The ChromaDB configuration:
344
-
345
- ```113:125:src/utils/config.py
346
- chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
347
- chroma_db_persist: bool = Field(
348
- default=True,
349
- description="Whether to persist ChromaDB to disk",
350
- )
351
- chroma_db_host: str | None = Field(
352
- default=None,
353
- description="ChromaDB server host (for remote ChromaDB)",
354
- )
355
- chroma_db_port: int | None = Field(
356
- default=None,
357
- description="ChromaDB server port (for remote ChromaDB)",
358
- )
359
- ```
360
-
361
- ### External Services
362
-
363
- #### Modal Configuration
364
-
365
- Modal is used for secure sandbox execution of statistical analysis:
366
-
367
- ```bash
368
- # Modal Token ID (for Modal sandbox execution)
369
- MODAL_TOKEN_ID=your_modal_token_id_here
370
-
371
- # Modal Token Secret
372
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
373
- ```
374
-
375
- The Modal configuration:
376
-
377
- ```110:112:src/utils/config.py
378
- # External Services
379
- modal_token_id: str | None = Field(default=None, description="Modal token ID")
380
- modal_token_secret: str | None = Field(default=None, description="Modal token secret")
381
- ```
382
-
383
- ### Logging Configuration
384
-
385
- Configure structured logging:
386
-
387
- ```bash
388
- # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
389
- LOG_LEVEL=INFO
390
- ```
391
-
392
- The logging configuration:
393
-
394
- ```107:108:src/utils/config.py
395
- # Logging
396
- log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
397
- ```
398
-
399
- Logging is configured via the `configure_logging()` function:
400
-
401
- ```212:231:src/utils/config.py
402
- def configure_logging(settings: Settings) -> None:
403
- """Configure structured logging with the configured log level."""
404
- # Set stdlib logging level from settings
405
- logging.basicConfig(
406
- level=getattr(logging, settings.log_level),
407
- format="%(message)s",
408
- )
409
-
410
- structlog.configure(
411
- processors=[
412
- structlog.stdlib.filter_by_level,
413
- structlog.stdlib.add_logger_name,
414
- structlog.stdlib.add_log_level,
415
- structlog.processors.TimeStamper(fmt="iso"),
416
- structlog.processors.JSONRenderer(),
417
- ],
418
- wrapper_class=structlog.stdlib.BoundLogger,
419
- context_class=dict,
420
- logger_factory=structlog.stdlib.LoggerFactory(),
421
- )
422
- ```
423
-
424
- ## Configuration Properties
425
-
426
- The `Settings` class provides helpful properties for checking configuration state:
427
-
428
- ### API Key Availability
429
-
430
- Check which API keys are available:
431
-
432
- ```171:189:src/utils/config.py
433
- @property
434
- def has_openai_key(self) -> bool:
435
- """Check if OpenAI API key is available."""
436
- return bool(self.openai_api_key)
437
-
438
- @property
439
- def has_anthropic_key(self) -> bool:
440
- """Check if Anthropic API key is available."""
441
- return bool(self.anthropic_api_key)
442
-
443
- @property
444
- def has_huggingface_key(self) -> bool:
445
- """Check if HuggingFace API key is available."""
446
- return bool(self.huggingface_api_key or self.hf_token)
447
-
448
- @property
449
- def has_any_llm_key(self) -> bool:
450
- """Check if any LLM API key is available."""
451
- return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
452
- ```
453
-
454
- **Usage:**
455
-
456
- ```python
457
- from src.utils.config import settings
458
-
459
- # Check API key availability
460
- if settings.has_openai_key:
461
- # Use OpenAI
462
- pass
463
-
464
- if settings.has_anthropic_key:
465
- # Use Anthropic
466
- pass
467
-
468
- if settings.has_huggingface_key:
469
- # Use HuggingFace
470
- pass
471
-
472
- if settings.has_any_llm_key:
473
- # At least one LLM is available
474
- pass
475
- ```
476
-
477
- ### Service Availability
478
-
479
- Check if external services are configured:
480
-
481
- ```143:146:src/utils/config.py
482
- @property
483
- def modal_available(self) -> bool:
484
- """Check if Modal credentials are configured."""
485
- return bool(self.modal_token_id and self.modal_token_secret)
486
- ```
487
-
488
- ```191:204:src/utils/config.py
489
- @property
490
- def web_search_available(self) -> bool:
491
- """Check if web search is available (either no-key provider or API key present)."""
492
- if self.web_search_provider == "duckduckgo":
493
- return True # No API key required
494
- if self.web_search_provider == "serper":
495
- return bool(self.serper_api_key)
496
- if self.web_search_provider == "searchxng":
497
- return bool(self.searchxng_host)
498
- if self.web_search_provider == "brave":
499
- return bool(self.brave_api_key)
500
- if self.web_search_provider == "tavily":
501
- return bool(self.tavily_api_key)
502
- return False
503
- ```
504
-
505
- **Usage:**
506
-
507
- ```python
508
- from src.utils.config import settings
509
-
510
- # Check service availability
511
- if settings.modal_available:
512
- # Use Modal sandbox
513
- pass
514
-
515
- if settings.web_search_available:
516
- # Web search is configured
517
- pass
518
- ```
519
-
520
- ### API Key Retrieval
521
-
522
- Get the API key for the configured provider:
523
-
524
- ```148:160:src/utils/config.py
525
- def get_api_key(self) -> str:
526
- """Get the API key for the configured provider."""
527
- if self.llm_provider == "openai":
528
- if not self.openai_api_key:
529
- raise ConfigurationError("OPENAI_API_KEY not set")
530
- return self.openai_api_key
531
-
532
- if self.llm_provider == "anthropic":
533
- if not self.anthropic_api_key:
534
- raise ConfigurationError("ANTHROPIC_API_KEY not set")
535
- return self.anthropic_api_key
536
-
537
- raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
538
- ```
539
-
540
- For OpenAI-specific operations (e.g., Magentic mode):
541
-
542
- ```162:169:src/utils/config.py
543
- def get_openai_api_key(self) -> str:
544
- """Get OpenAI API key (required for Magentic function calling)."""
545
- if not self.openai_api_key:
546
- raise ConfigurationError(
547
- "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
548
- "Use mode='simple' for other providers."
549
- )
550
- return self.openai_api_key
551
- ```
552
-
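A related pattern is selecting a usable provider when the configured one has no credentials. The helper below is hypothetical (the real `get_api_key()` raises `ConfigurationError` rather than falling back), but it shows how the same key checks compose:

```python
# Hypothetical fallback helper, illustrative only: prefer LLM_PROVIDER
# if its key is set, else any provider with a key, else HuggingFace,
# which can run without a key for public models.
import os

def pick_provider() -> str:
    keys = {
        "openai": os.getenv("OPENAI_API_KEY"),
        "anthropic": os.getenv("ANTHROPIC_API_KEY"),
        "huggingface": os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY"),
    }
    configured = os.getenv("LLM_PROVIDER", "huggingface")
    if keys.get(configured):
        return configured
    for name, key in keys.items():
        if key:
            return name
    return "huggingface"
```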
553
- ## Configuration Usage in Codebase
554
-
555
- The configuration system is used throughout the codebase:
556
-
557
- ### LLM Factory
558
-
559
- The LLM factory uses settings to create appropriate models:
560
-
561
- ```129:144:src/utils/llm_factory.py
562
- if settings.llm_provider == "huggingface":
563
- model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
564
- hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
565
- return HuggingFaceModel(model_name, provider=hf_provider)
566
-
567
- if settings.llm_provider == "openai":
568
- if not settings.openai_api_key:
569
- raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
570
- provider = OpenAIProvider(api_key=settings.openai_api_key)
571
- return OpenAIModel(settings.openai_model, provider=provider)
572
-
573
- if settings.llm_provider == "anthropic":
574
- if not settings.anthropic_api_key:
575
- raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
576
- anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
577
- return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
578
- ```
579
-
580
- ### Embedding Service
581
-
582
- The embedding service uses local embedding model configuration:
583
-
584
- ```29:31:src/services/embeddings.py
585
- def __init__(self, model_name: str | None = None):
586
- self._model_name = model_name or settings.local_embedding_model
587
- self._model = SentenceTransformer(self._model_name)
588
- ```
589
-
590
- ### Orchestrator Factory
591
-
592
- The orchestrator factory uses settings to determine mode:
593
-
594
- ```69:80:src/orchestrator_factory.py
595
- def _determine_mode(explicit_mode: str | None) -> str:
596
- """Determine which mode to use."""
597
- if explicit_mode:
598
- if explicit_mode in ("magentic", "advanced"):
599
- return "advanced"
600
- return "simple"
601
-
602
- # Auto-detect: advanced if paid API key available
603
- if settings.has_openai_key:
604
- return "advanced"
605
-
606
- return "simple"
607
- ```
608
-
609
- ## Environment Variables Reference
610
-
611
- ### Required (at least one LLM)
612
-
613
- - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
614
- - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
615
- - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
616
-
617
- #### LLM Configuration Variables
618
-
619
- - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"huggingface"`)
620
- - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
621
- - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
622
- - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
623
-
624
- #### Embedding Configuration Variables
625
-
626
- - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
627
- - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
628
- - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
629
- - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
630
-
631
- #### Web Search Configuration Variables
632
-
633
- - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
634
- - `SERPER_API_KEY` - Serper API key (required for Serper provider)
635
- - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
636
- - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
637
- - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
638
-
639
- #### PubMed Configuration Variables
640
-
641
- - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
642
-
643
- #### Agent Configuration Variables
644
-
645
- - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
646
- - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
647
- - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
648
-
649
- #### Budget Configuration Variables
650
-
651
- - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
652
- - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
653
- - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
654
-
655
- #### RAG Configuration Variables
656
-
657
- - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
658
- - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
659
- - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
660
-
661
- #### ChromaDB Configuration Variables
662
-
663
- - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
664
- - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
665
- - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
666
- - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
667
-
668
- #### External Services Variables
669
-
670
- - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
671
- - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
672
-
673
- #### Logging Configuration Variables
674
-
675
- - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
676
-
677
- ## Validation
678
-
679
- Settings are validated on load using Pydantic validation:
680
-
681
- - **Type Checking**: All fields are strongly typed
682
- - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
683
- - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
684
- - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
685
-
686
- ### Validation Examples
687
-
688
- The `max_iterations` field has range validation:
689
-
690
- ```81:81:src/utils/config.py
691
- max_iterations: int = Field(default=10, ge=1, le=50)
692
- ```
693
-
694
- The `llm_provider` field has literal validation:
695
-
696
- ```26:28:src/utils/config.py
697
- llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
698
- default="openai", description="Which LLM provider to use"
699
- )
700
- ```
701
-
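For a dependency-free picture of what a `ge`/`le` constraint enforces, the sketch below mimics the range check (pydantic raises `ValidationError`; a plain `ValueError` stands in here):

```python
# Stand-in for pydantic's ge=1, le=50 constraint on max_iterations;
# ValueError substitutes for pydantic's ValidationError.
def validate_max_iterations(value: int) -> int:
    if not (1 <= value <= 50):
        raise ValueError(f"max_iterations must be in [1, 50], got {value}")
    return value

print(validate_max_iterations(10))  # → 10
```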
702
- ## Error Handling
703
-
704
- Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
705
-
706
- ```22:25:src/utils/exceptions.py
707
- class ConfigurationError(DeepCriticalError):
708
- """Raised when configuration is invalid."""
709
-
710
- pass
711
- ```
712
-
713
- ### Error Handling Example
714
-
715
- ```python
716
- from src.utils.config import settings
717
- from src.utils.exceptions import ConfigurationError
718
-
719
- try:
720
- api_key = settings.get_api_key()
721
- except ConfigurationError as e:
722
- print(f"Configuration error: {e}")
723
- ```
724
-
725
- ### Common Configuration Errors
726
-
727
- 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
728
- 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
729
- 3. **Out of Range**: When numeric values exceed their min/max constraints
730
- 4. **Invalid Literal**: When enum fields receive unsupported values
731
-
732
- ## Configuration Best Practices
733
-
734
- 1. **Use `.env` File**: Store sensitive keys in `.env` file (add to `.gitignore`)
735
- 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
736
- 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
737
- 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
738
- 5. **Use Defaults**: Leverage sensible defaults for optional configuration
739
-
740
- ## Future Enhancements
741
-
742
- The following configurations are planned for future phases:
743
-
744
- 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
745
- 2. **Model Selection**: Reasoning/main/fast model configuration
746
- 3. **Service Integration**: Additional service integrations and configurations
 
docs/contributing.md DELETED
@@ -1,428 +0,0 @@
- # Contributing to DeepCritical
-
- Thank you for your interest in contributing to DeepCritical! This guide will help you get started.
-
- ## Table of Contents
-
- - [Git Workflow](#git-workflow)
- - [Getting Started](#getting-started)
- - [Development Commands](#development-commands)
- - [Code Style & Conventions](#code-style--conventions)
- - [Type Safety](#type-safety)
- - [Error Handling & Logging](#error-handling--logging)
- - [Testing Requirements](#testing-requirements)
- - [Implementation Patterns](#implementation-patterns)
- - [Code Quality & Documentation](#code-quality--documentation)
- - [Prompt Engineering & Citation Validation](#prompt-engineering--citation-validation)
- - [MCP Integration](#mcp-integration)
- - [Common Pitfalls](#common-pitfalls)
- - [Key Principles](#key-principles)
- - [Pull Request Process](#pull-request-process)
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Use feature branches: `yourname-dev`
- - **NEVER** push directly to `main` or `dev` on HuggingFace
- - GitHub is source of truth; HuggingFace is for deployment
-
- ## Getting Started
-
- 1. **Fork the repository** on GitHub
- 2. **Clone your fork**:
-
-    ```bash
-    git clone https://github.com/yourusername/GradioDemo.git
-    cd GradioDemo
-    ```
-
- 3. **Install dependencies**:
-
-    ```bash
-    make install
-    ```
-
- 4. **Create a feature branch**:
-
-    ```bash
-    git checkout -b yourname-feature-name
-    ```
-
- 5. **Make your changes** following the guidelines below
- 6. **Run checks**:
-
-    ```bash
-    make check
-    ```
-
- 7. **Commit and push**:
-
-    ```bash
-    git commit -m "Description of changes"
-    git push origin yourname-feature-name
-    ```
-
- 8. **Create a pull request** on GitHub
-
- ## Development Commands
-
- ```bash
- make install     # Install dependencies + pre-commit
- make check       # Lint + typecheck + test (MUST PASS)
- make test        # Run unit tests
- make lint        # Run ruff
- make format      # Format with ruff
- make typecheck   # Run mypy
- make test-cov    # Test with coverage
- make docs-build  # Build documentation
- make docs-serve  # Serve documentation locally
- ```
-
- ## Code Style & Conventions
-
- ### Type Safety
-
- - **ALWAYS** use type hints for all function parameters and return types
- - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- - Use `TYPE_CHECKING` imports for circular dependencies:
-
- ```python
- from typing import TYPE_CHECKING
-
- if TYPE_CHECKING:
-     from src.services.embeddings import EmbeddingService
- ```
-
- ### Pydantic Models
-
- - All data exchange uses Pydantic models (`src/utils/models.py`)
- - Models are frozen (`model_config = {"frozen": True}`) for immutability
- - Use `Field()` with descriptions for all model fields
- - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
-
- ### Async Patterns
-
- - **ALL** I/O operations must be async (`async def`, `await`)
- - Use `asyncio.gather()` for parallel operations
- - CPU-bound work (embeddings, parsing) must use `run_in_executor()`:
-
- ```python
- loop = asyncio.get_running_loop()
- result = await loop.run_in_executor(None, cpu_bound_function, args)
- ```
-
- - Never block the event loop with synchronous I/O
-
- ### Linting
-
- - Ruff with 100-char line length
- - Ignore rules documented in `pyproject.toml`:
-   - `PLR0913`: Too many arguments (agents need many params)
-   - `PLR0912`: Too many branches (complex orchestrator logic)
-   - `PLR0911`: Too many return statements (complex agent logic)
-   - `PLR2004`: Magic values (statistical constants)
-   - `PLW0603`: Global statement (singleton pattern)
-   - `PLC0415`: Lazy imports for optional dependencies
-
- ### Pre-commit
-
- - Run `make check` before committing
- - Must pass: lint + typecheck + test-cov
- - Pre-commit hooks installed via `make install`
- - **CRITICAL**: Make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind
-
- ## Error Handling & Logging
-
- ### Exception Hierarchy
-
- Use custom exception hierarchy (`src/utils/exceptions.py`):
-
- - `DeepCriticalError` (base)
-   - `SearchError` → `RateLimitError`
-   - `JudgeError`
-   - `ConfigurationError`
-
- ### Error Handling Rules
-
- - Always chain exceptions: `raise SearchError(...) from e`
- - Log errors with context using `structlog`:
-
- ```python
- logger.error("Operation failed", error=str(e), context=value)
- ```
-
- - Never silently swallow exceptions
- - Provide actionable error messages
-
- ### Logging
-
- - Use `structlog` for all logging (NOT `print` or `logging`)
- - Import: `import structlog; logger = structlog.get_logger()`
- - Log with structured data: `logger.info("event", key=value)`
- - Use appropriate levels: DEBUG, INFO, WARNING, ERROR
-
- ### Logging Examples
-
- ```python
- logger.info("Starting search", query=query, tools=[t.name for t in tools])
- logger.warning("Search tool failed", tool=tool.name, error=str(result))
- logger.error("Assessment failed", error=str(e))
- ```
-
- ### Error Chaining
-
- Always preserve exception context:
-
- ```python
- try:
-     result = await api_call()
- except httpx.HTTPError as e:
-     raise SearchError(f"API call failed: {e}") from e
- ```
-
- ## Testing Requirements
-
- ### Test Structure
-
- - Unit tests in `tests/unit/` (mocked, fast)
- - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- - Use markers: `unit`, `integration`, `slow`
-
- ### Mocking
-
- - Use `respx` for httpx mocking
- - Use `pytest-mock` for general mocking
- - Mock LLM calls in unit tests (use `MockJudgeHandler`)
- - Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
-
- ### TDD Workflow
-
- 1. Write failing test in `tests/unit/`
- 2. Implement in `src/`
- 3. Ensure test passes
- 4. Run `make check` (lint + typecheck + test)
-
- ### Test Examples
-
- ```python
- @pytest.mark.unit
- async def test_pubmed_search(mock_httpx_client):
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=5)
-     assert len(results) > 0
-     assert all(isinstance(r, Evidence) for r in results)
-
- @pytest.mark.integration
- async def test_real_pubmed_search():
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=3)
-     assert len(results) <= 3
- ```
-
- ### Test Coverage
-
- - Run `make test-cov` for coverage report
- - Aim for >80% coverage on critical paths
- - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
-
- ## Implementation Patterns
-
- ### Search Tools
-
- All tools implement `SearchTool` protocol (`src/tools/base.py`):
-
- - Must have `name` property
- - Must implement `async def search(query, max_results) -> list[Evidence]`
- - Use `@retry` decorator from tenacity for resilience
- - Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- - Error handling: Raise `SearchError` or `RateLimitError` on failures
-
- Example pattern:
-
- ```python
- class MySearchTool:
-     @property
-     def name(self) -> str:
-         return "mytool"
-
-     @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         # Implementation
-         return evidence_list
- ```
-
- ### Judge Handlers
-
- - Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- - Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- - System prompts in `src/prompts/judge.py`
- - Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- - Always return valid `JudgeAssessment` (never raise exceptions)
-
- ### Agent Factory Pattern
-
- - Use factory functions for creating agents (`src/agent_factory/`)
- - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- - Check requirements before initialization:
-
- ```python
- def check_magentic_requirements() -> None:
-     if not settings.has_openai_key:
-         raise ConfigurationError("Magentic requires OpenAI")
- ```
-
- ### State Management
-
- - **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- - **Simple Mode**: Pass state via function parameters
- - Never use global mutable state (except singletons via `@lru_cache`)
-
- ### Singleton Pattern
-
- Use `@lru_cache(maxsize=1)` for singletons:
-
- ```python
- @lru_cache(maxsize=1)
- def get_embedding_service() -> EmbeddingService:
-     return EmbeddingService()
- ```
-
- - Lazy initialization to avoid requiring dependencies at import time
-
- ## Code Quality & Documentation
-
- ### Docstrings
-
- - Google-style docstrings for all public functions
- - Include Args, Returns, Raises sections
- - Use type hints in docstrings only if needed for clarity
-
- Example:
-
- ```python
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-     """Search PubMed and return evidence.
-
-     Args:
-         query: The search query string
-         max_results: Maximum number of results to return
-
-     Returns:
-         List of Evidence objects
-
-     Raises:
-         SearchError: If the search fails
-         RateLimitError: If we hit rate limits
-     """
- ```
-
- ### Code Comments
-
- - Explain WHY, not WHAT
- - Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- - Mark critical sections: `# CRITICAL: ...`
- - Document rate limiting rationale
- - Explain async patterns when non-obvious
-
- ## Prompt Engineering & Citation Validation
-
- ### Judge Prompts
-
- - System prompt in `src/prompts/judge.py`
- - Format evidence with truncation (1500 chars per item)
- - Handle empty evidence case separately
- - Always request structured JSON output
- - Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
-
- ### Hypothesis Prompts
-
- - Use diverse evidence selection (MMR algorithm)
- - Sentence-aware truncation (`truncate_at_sentence()`)
- - Format: Drug → Target → Pathway → Effect
- - System prompt emphasizes mechanistic reasoning
- - Use `format_hypothesis_prompt()` with embeddings for diversity
-
- ### Report Prompts
-
- - Include full citation details for validation
- - Use diverse evidence selection (n=20)
- - **CRITICAL**: Emphasize citation validation rules
- - Format hypotheses with support/contradiction counts
- - System prompt includes explicit JSON structure requirements
-
- ### Citation Validation
-
- - **ALWAYS** validate references before returning reports
- - Use `validate_references()` from `src/utils/citation_validator.py`
- - Remove hallucinated citations (URLs not in evidence)
- - Log warnings for removed citations
- - Never trust LLM-generated citations without validation
-
- ### Citation Validation Rules
-
- 1. Every reference URL must EXACTLY match a provided evidence URL
- 2. Do NOT invent, fabricate, or hallucinate any references
- 3. Do NOT modify paper titles, authors, dates, or URLs
- 4. If unsure about a citation, OMIT it rather than guess
- 5. Copy URLs exactly as provided - do not create similar-looking URLs
-
- ### Evidence Selection
-
- - Use `select_diverse_evidence()` for MMR-based selection
- - Balance relevance vs diversity (lambda=0.7 default)
- - Sentence-aware truncation preserves meaning
- - Limit evidence per prompt to avoid context overflow
-
- ## MCP Integration
-
- ### MCP Tools
-
- - Functions in `src/mcp_tools.py` for Claude Desktop
- - Full type hints required
- - Google-style docstrings with Args/Returns sections
- - Formatted string returns (markdown)
-
- ### Gradio MCP Server
-
- - Enable with `mcp_server=True` in `demo.launch()`
- - Endpoint: `/gradio_api/mcp/`
- - Use `ssr_mode=False` to fix hydration issues in HF Spaces
-
- ## Common Pitfalls
-
- 1. **Blocking the event loop**: Never use sync I/O in async functions
- 2. **Missing type hints**: All functions must have complete type annotations
- 3. **Hallucinated citations**: Always validate references
- 4. **Global mutable state**: Use ContextVar or pass via parameters
- 5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
- 6. **Rate limiting**: Always implement for external APIs
- 7. **Error chaining**: Always use `from e` when raising exceptions
-
- ## Key Principles
-
- 1. **Type Safety First**: All code must pass `mypy --strict`
- 2. **Async Everything**: All I/O must be async
- 3. **Test-Driven**: Write tests before implementation
- 4. **No Hallucinations**: Validate all citations
- 5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
- 6. **Lazy Loading**: Don't require optional dependencies at import time
- 7. **Structured Logging**: Use structlog, never print()
- 8. **Error Chaining**: Always preserve exception context
-
- ## Pull Request Process
-
- 1. Ensure all checks pass: `make check`
- 2. Update documentation if needed
- 3. Add tests for new features
- 4. Update CHANGELOG if applicable
- 5. Request review from maintainers
- 6. Address review feedback
- 7. Wait for approval before merging
-
- ## Questions?
-
- - Open an issue on GitHub
- - Check existing documentation
- - Review code examples in the codebase
-
- Thank you for contributing to DeepCritical!
-
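The "remove hallucinated citations" rule from the deleted guide above can be sketched in a few lines. This is a hypothetical illustration, not the real `src/utils/citation_validator.py`: the dict-based reference shape and the function body here are assumptions; the real code works on Pydantic models and logs removed citations via structlog.

```python
# Hypothetical sketch of citation validation: keep only references whose URL
# exactly matches a URL from the collected evidence (rule 1 above).
def validate_references(
    references: list[dict[str, str]], evidence_urls: set[str]
) -> list[dict[str, str]]:
    """Drop any reference whose URL is not an exact evidence-URL match."""
    kept = [ref for ref in references if ref["url"] in evidence_urls]
    # The real implementation logs a warning for each dropped citation.
    return kept

references = [
    {"title": "Metformin and AMPK", "url": "https://pubmed.ncbi.nlm.nih.gov/111/"},
    {"title": "Fabricated paper", "url": "https://example.com/not-real"},
]
valid = validate_references(references, {"https://pubmed.ncbi.nlm.nih.gov/111/"})
```

Exact string matching is deliberate: a "similar-looking" URL is exactly what an LLM hallucination produces, so fuzzy matching would defeat the check.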
docs/contributing/code-quality.md DELETED
@@ -1,81 +0,0 @@
- # Code Quality & Documentation
-
- This document outlines code quality standards and documentation requirements.
-
- ## Linting
-
- - Ruff with 100-char line length
- - Ignore rules documented in `pyproject.toml`:
-   - `PLR0913`: Too many arguments (agents need many params)
-   - `PLR0912`: Too many branches (complex orchestrator logic)
-   - `PLR0911`: Too many return statements (complex agent logic)
-   - `PLR2004`: Magic values (statistical constants)
-   - `PLW0603`: Global statement (singleton pattern)
-   - `PLC0415`: Lazy imports for optional dependencies
-
- ## Type Checking
-
- - `mypy --strict` compliance
- - `ignore_missing_imports = true` (for optional dependencies)
- - Exclude: `reference_repos/`, `examples/`
- - All functions must have complete type annotations
-
- ## Pre-commit
-
- - Run `make check` before committing
- - Must pass: lint + typecheck + test-cov
- - Pre-commit hooks installed via `make install`
-
- ## Documentation
-
- ### Docstrings
-
- - Google-style docstrings for all public functions
- - Include Args, Returns, Raises sections
- - Use type hints in docstrings only if needed for clarity
-
- Example:
-
- ```python
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-     """Search PubMed and return evidence.
-
-     Args:
-         query: The search query string
-         max_results: Maximum number of results to return
-
-     Returns:
-         List of Evidence objects
-
-     Raises:
-         SearchError: If the search fails
-         RateLimitError: If we hit rate limits
-     """
- ```
-
- ### Code Comments
-
- - Explain WHY, not WHAT
- - Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- - Mark critical sections: `# CRITICAL: ...`
- - Document rate limiting rationale
- - Explain async patterns when non-obvious
-
- ## See Also
-
- - [Code Style](code-style.md) - Code style guidelines
- - [Testing](testing.md) - Testing guidelines
-
docs/contributing/code-style.md DELETED
@@ -1,61 +0,0 @@
- # Code Style & Conventions
-
- This document outlines the code style and conventions for DeepCritical.
-
- ## Type Safety
-
- - **ALWAYS** use type hints for all function parameters and return types
- - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- - Use `TYPE_CHECKING` imports for circular dependencies:
-
- ```python
- from typing import TYPE_CHECKING
-
- if TYPE_CHECKING:
-     from src.services.embeddings import EmbeddingService
- ```
-
- ## Pydantic Models
-
- - All data exchange uses Pydantic models (`src/utils/models.py`)
- - Models are frozen (`model_config = {"frozen": True}`) for immutability
- - Use `Field()` with descriptions for all model fields
- - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
-
- ## Async Patterns
-
- - **ALL** I/O operations must be async (`async def`, `await`)
- - Use `asyncio.gather()` for parallel operations
- - CPU-bound work (embeddings, parsing) must use `run_in_executor()`:
-
- ```python
- loop = asyncio.get_running_loop()
- result = await loop.run_in_executor(None, cpu_bound_function, args)
- ```
-
- - Never block the event loop with synchronous I/O
-
- ## Common Pitfalls
-
- 1. **Blocking the event loop**: Never use sync I/O in async functions
- 2. **Missing type hints**: All functions must have complete type annotations
- 3. **Global mutable state**: Use ContextVar or pass via parameters
- 4. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
-
- ## See Also
-
- - [Error Handling](error-handling.md) - Error handling guidelines
- - [Implementation Patterns](implementation-patterns.md) - Common patterns
-
docs/contributing/error-handling.md DELETED
@@ -1,69 +0,0 @@
- # Error Handling & Logging
-
- This document outlines error handling and logging conventions for DeepCritical.
-
- ## Exception Hierarchy
-
- Use custom exception hierarchy (`src/utils/exceptions.py`):
-
- - `DeepCriticalError` (base)
-   - `SearchError` → `RateLimitError`
-   - `JudgeError`
-   - `ConfigurationError`
-
- ## Error Handling Rules
-
- - Always chain exceptions: `raise SearchError(...) from e`
- - Log errors with context using `structlog`:
-
- ```python
- logger.error("Operation failed", error=str(e), context=value)
- ```
-
- - Never silently swallow exceptions
- - Provide actionable error messages
-
- ## Logging
-
- - Use `structlog` for all logging (NOT `print` or `logging`)
- - Import: `import structlog; logger = structlog.get_logger()`
- - Log with structured data: `logger.info("event", key=value)`
- - Use appropriate levels: DEBUG, INFO, WARNING, ERROR
-
- ## Logging Examples
-
- ```python
- logger.info("Starting search", query=query, tools=[t.name for t in tools])
- logger.warning("Search tool failed", tool=tool.name, error=str(result))
- logger.error("Assessment failed", error=str(e))
- ```
-
- ## Error Chaining
-
- Always preserve exception context:
-
- ```python
- try:
-     result = await api_call()
- except httpx.HTTPError as e:
-     raise SearchError(f"API call failed: {e}") from e
- ```
-
- ## See Also
-
- - [Code Style](code-style.md) - Code style guidelines
- - [Testing](testing.md) - Testing guidelines
-
docs/contributing/implementation-patterns.md DELETED
@@ -1,84 +0,0 @@
- # Implementation Patterns
-
- This document outlines common implementation patterns used in DeepCritical.
-
- ## Search Tools
-
- All tools implement `SearchTool` protocol (`src/tools/base.py`):
-
- - Must have `name` property
- - Must implement `async def search(query, max_results) -> list[Evidence]`
- - Use `@retry` decorator from tenacity for resilience
- - Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- - Error handling: Raise `SearchError` or `RateLimitError` on failures
-
- Example pattern:
-
- ```python
- class MySearchTool:
-     @property
-     def name(self) -> str:
-         return "mytool"
-
-     @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         # Implementation
-         return evidence_list
- ```
-
- ## Judge Handlers
-
- - Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- - Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- - System prompts in `src/prompts/judge.py`
- - Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- - Always return valid `JudgeAssessment` (never raise exceptions)
-
- ## Agent Factory Pattern
-
- - Use factory functions for creating agents (`src/agent_factory/`)
- - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- - Check requirements before initialization:
-
- ```python
- def check_magentic_requirements() -> None:
-     if not settings.has_openai_key:
-         raise ConfigurationError("Magentic requires OpenAI")
- ```
-
- ## State Management
-
- - **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- - **Simple Mode**: Pass state via function parameters
- - Never use global mutable state (except singletons via `@lru_cache`)
-
- ## Singleton Pattern
-
- Use `@lru_cache(maxsize=1)` for singletons:
-
- ```python
- @lru_cache(maxsize=1)
- def get_embedding_service() -> EmbeddingService:
-     return EmbeddingService()
- ```
-
- - Lazy initialization to avoid requiring dependencies at import time
-
- ## See Also
-
- - [Code Style](code-style.md) - Code style guidelines
- - [Error Handling](error-handling.md) - Error handling guidelines
-
docs/contributing/index.md DELETED
@@ -1,163 +0,0 @@
- # Contributing to DeepCritical
-
- Thank you for your interest in contributing to DeepCritical! This guide will help you get started.
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Use feature branches: `yourname-dev`
- - **NEVER** push directly to `main` or `dev` on HuggingFace
- - GitHub is source of truth; HuggingFace is for deployment
-
- ## Development Commands
-
- ```bash
- make install    # Install dependencies + pre-commit
- make check      # Lint + typecheck + test (MUST PASS)
- make test       # Run unit tests
- make lint       # Run ruff
- make format     # Format with ruff
- make typecheck  # Run mypy
- make test-cov   # Test with coverage
- ```
-
- ## Getting Started
-
- 1. **Fork the repository** on GitHub
- 2. **Clone your fork**:
-    ```bash
-    git clone https://github.com/yourusername/GradioDemo.git
-    cd GradioDemo
-    ```
- 3. **Install dependencies**:
-    ```bash
-    make install
-    ```
- 4. **Create a feature branch**:
-    ```bash
-    git checkout -b yourname-feature-name
-    ```
- 5. **Make your changes** following the guidelines below
- 6. **Run checks**:
-    ```bash
-    make check
-    ```
- 7. **Commit and push**:
-    ```bash
-    git commit -m "Description of changes"
-    git push origin yourname-feature-name
-    ```
- 8. **Create a pull request** on GitHub
-
- ## Development Guidelines
-
- ### Code Style
-
- - Follow [Code Style Guidelines](code-style.md)
- - All code must pass `mypy --strict`
- - Use `ruff` for linting and formatting
- - Line length: 100 characters
-
- ### Error Handling
-
- - Follow [Error Handling Guidelines](error-handling.md)
- - Always chain exceptions: `raise SearchError(...) from e`
- - Use structured logging with `structlog`
- - Never silently swallow exceptions
-
- ### Testing
-
- - Follow [Testing Guidelines](testing.md)
- - Write tests before implementation (TDD)
- - Aim for >80% coverage on critical paths
- - Use markers: `unit`, `integration`, `slow`
-
- ### Implementation Patterns
-
- - Follow [Implementation Patterns](implementation-patterns.md)
- - Use factory functions for agent/tool creation
- - Implement protocols for extensibility
- - Use singleton pattern with `@lru_cache(maxsize=1)`
-
- ### Prompt Engineering
-
- - Follow [Prompt Engineering Guidelines](prompt-engineering.md)
- - Always validate citations
- - Use diverse evidence selection
- - Never trust LLM-generated citations without validation
-
- ### Code Quality
-
- - Follow [Code Quality Guidelines](code-quality.md)
- - Google-style docstrings for all public functions
- - Explain WHY, not WHAT in comments
- - Mark critical sections: `# CRITICAL: ...`
-
- ## MCP Integration
-
- ### MCP Tools
-
- - Functions in `src/mcp_tools.py` for Claude Desktop
- - Full type hints required
- - Google-style docstrings with Args/Returns sections
- - Formatted string returns (markdown)
-
- ### Gradio MCP Server
-
- - Enable with `mcp_server=True` in `demo.launch()`
- - Endpoint: `/gradio_api/mcp/`
- - Use `ssr_mode=False` to fix hydration issues in HF Spaces
-
- ## Common Pitfalls
-
- 1. **Blocking the event loop**: Never use sync I/O in async functions
- 2. **Missing type hints**: All functions must have complete type annotations
- 3. **Hallucinated citations**: Always validate references
- 4. **Global mutable state**: Use ContextVar or pass via parameters
- 5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
- 6. **Rate limiting**: Always implement for external APIs
- 7. **Error chaining**: Always use `from e` when raising exceptions
-
- ## Key Principles
-
- 1. **Type Safety First**: All code must pass `mypy --strict`
- 2. **Async Everything**: All I/O must be async
- 3. **Test-Driven**: Write tests before implementation
- 4. **No Hallucinations**: Validate all citations
- 5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
- 6. **Lazy Loading**: Don't require optional dependencies at import time
- 7. **Structured Logging**: Use structlog, never print()
- 8. **Error Chaining**: Always preserve exception context
-
- ## Pull Request Process
-
- 1. Ensure all checks pass: `make check`
- 2. Update documentation if needed
- 3. Add tests for new features
- 4. Update CHANGELOG if applicable
- 5. Request review from maintainers
- 6. Address review feedback
- 7. Wait for approval before merging
-
- ## Questions?
-
- - Open an issue on GitHub
- - Check existing documentation
- - Review code examples in the codebase
-
- Thank you for contributing to DeepCritical!
-
docs/contributing/prompt-engineering.md DELETED
@@ -1,69 +0,0 @@
- # Prompt Engineering & Citation Validation
-
- This document outlines prompt engineering guidelines and citation validation rules.
-
- ## Judge Prompts
-
- - System prompt in `src/prompts/judge.py`
- - Format evidence with truncation (1500 chars per item)
- - Handle empty evidence case separately
- - Always request structured JSON output
- - Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
-
- ## Hypothesis Prompts
-
- - Use diverse evidence selection (MMR algorithm)
- - Sentence-aware truncation (`truncate_at_sentence()`)
- - Format: Drug → Target → Pathway → Effect
- - System prompt emphasizes mechanistic reasoning
- - Use `format_hypothesis_prompt()` with embeddings for diversity
-
- ## Report Prompts
-
- - Include full citation details for validation
- - Use diverse evidence selection (n=20)
- - **CRITICAL**: Emphasize citation validation rules
- - Format hypotheses with support/contradiction counts
- - System prompt includes explicit JSON structure requirements
-
- ## Citation Validation
-
- - **ALWAYS** validate references before returning reports
- - Use `validate_references()` from `src/utils/citation_validator.py`
- - Remove hallucinated citations (URLs not in evidence)
- - Log warnings for removed citations
- - Never trust LLM-generated citations without validation
-
- ## Citation Validation Rules
-
- 1. Every reference URL must EXACTLY match a provided evidence URL
- 2. Do NOT invent, fabricate, or hallucinate any references
- 3. Do NOT modify paper titles, authors, dates, or URLs
- 4. If unsure about a citation, OMIT it rather than guess
- 5. Copy URLs exactly as provided - do not create similar-looking URLs
-
- ## Evidence Selection
-
- - Use `select_diverse_evidence()` for MMR-based selection
- - Balance relevance vs diversity (lambda=0.7 default)
- - Sentence-aware truncation preserves meaning
- - Limit evidence per prompt to avoid context overflow
-
- ## See Also
-
- - [Code Quality](code-quality.md) - Code quality guidelines
- - [Error Handling](error-handling.md) - Error handling guidelines
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
docs/contributing/testing.md DELETED
@@ -1,65 +0,0 @@
# Testing Requirements

This document outlines testing requirements and guidelines for DeepCritical.

## Test Structure

- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`

## Mocking

- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`

## TDD Workflow

1. Write a failing test in `tests/unit/`
2. Implement in `src/`
3. Ensure the test passes
4. Run `make check` (lint + typecheck + test)

## Test Examples

```python
import pytest

# PubMedTool and Evidence are imported from the project's tool and model modules.

@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)

@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3
```

## Test Coverage

- Run `make test-cov` for a coverage report
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks

## See Also

- [Code Style](code-style.md) - Code style guidelines
- [Implementation Patterns](implementation-patterns.md) - Common patterns

docs/getting-started/examples.md DELETED
@@ -1,209 +0,0 @@
# Examples

This page provides examples of using DeepCritical for various research tasks.

## Basic Research Queries

### Example 1: Drug Information

**Query**:
```
What are the latest treatments for Alzheimer's disease?
```

**What DeepCritical Does**:
1. Searches PubMed for recent papers
2. Searches ClinicalTrials.gov for active trials
3. Evaluates evidence quality
4. Synthesizes findings into a comprehensive report

### Example 2: Clinical Trial Search

**Query**:
```
What clinical trials are investigating metformin for cancer prevention?
```

**What DeepCritical Does**:
1. Searches ClinicalTrials.gov for relevant trials
2. Searches PubMed for supporting literature
3. Provides trial details and status
4. Summarizes findings

## Advanced Research Queries

### Example 3: Comprehensive Review

**Query**:
```
Review the evidence for using metformin as an anti-aging intervention,
including clinical trials, mechanisms of action, and safety profile.
```

**What DeepCritical Does**:
1. Uses deep research mode (multi-section)
2. Searches multiple sources in parallel
3. Generates sections on:
    - Clinical trials
    - Mechanisms of action
    - Safety profile
4. Synthesizes a comprehensive report

### Example 4: Hypothesis Testing

**Query**:
```
Test the hypothesis that regular exercise reduces Alzheimer's disease risk.
```

**What DeepCritical Does**:
1. Generates testable hypotheses
2. Searches for supporting/contradicting evidence
3. Performs statistical analysis (if Modal is configured)
4. Provides a verdict: SUPPORTED, REFUTED, or INCONCLUSIVE

## MCP Tool Examples

### Using search_pubmed

```
Search PubMed for "CRISPR gene editing cancer therapy"
```

### Using search_clinical_trials

```
Find active clinical trials for "diabetes type 2 treatment"
```

### Using search_all

```
Search all sources for "COVID-19 vaccine side effects"
```

### Using analyze_hypothesis

```
Analyze whether vitamin D supplementation reduces COVID-19 severity
```

## Code Examples

### Python API Usage

```python
from src.orchestrator_factory import create_orchestrator
from src.tools.search_handler import SearchHandler
from src.agent_factory.judges import create_judge_handler

# Create orchestrator
search_handler = SearchHandler()
judge_handler = create_judge_handler()
orchestrator = create_orchestrator(
    search_handler=search_handler,
    judge_handler=judge_handler,
    config={},
    mode="advanced",
)

# Run a research query (inside an async function or event loop)
query = "What are the latest treatments for Alzheimer's disease?"
async for event in orchestrator.run(query):
    print(f"Event: {event.type} - {event.data}")
```

### Gradio UI Integration

```python
import gradio as gr
from src.app import create_research_interface

# Create interface
interface = create_research_interface()

# Launch
interface.launch(server_name="0.0.0.0", server_port=7860)
```

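The `async for` loop above consumes streamed events with `type` and `data` attributes. One hedged way to render them for a console or UI; the `Event` dataclass and the `"report"` type name here are illustrative assumptions, not the project's actual classes:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Event:
    """Illustrative stand-in for the orchestrator's streamed event objects."""
    type: str
    data: Any

def render_event(event: Event) -> str:
    """Format a streamed event for display; the 'report' type is an assumption."""
    if event.type == "report":
        return f"FINAL REPORT:\n{event.data}"
    return f"[{event.type}] {event.data}"
```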
## Research Patterns

### Iterative Research

Single-loop research with search-judge-synthesize cycles:

```python
from src.orchestrator.research_flow import IterativeResearchFlow

flow = IterativeResearchFlow(
    search_handler=search_handler,
    judge_handler=judge_handler,
    use_graph=False,
)

async for event in flow.run(query):
    # Handle events
    pass
```

### Deep Research

Multi-section parallel research:

```python
from src.orchestrator.research_flow import DeepResearchFlow

flow = DeepResearchFlow(
    search_handler=search_handler,
    judge_handler=judge_handler,
    use_graph=True,
)

async for event in flow.run(query):
    # Handle events
    pass
```

## Configuration Examples

### Basic Configuration

```bash
# .env file
LLM_PROVIDER=openai
OPENAI_API_KEY=your_key_here
MAX_ITERATIONS=10
```

### Advanced Configuration

```bash
# .env file
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_key_here
EMBEDDING_PROVIDER=local
WEB_SEARCH_PROVIDER=duckduckgo
MAX_ITERATIONS=20
DEFAULT_TOKEN_LIMIT=200000
USE_GRAPH_EXECUTION=true
```

## Next Steps

- Read the [Configuration Guide](../configuration/index.md) for all options
- Explore the [Architecture Documentation](../architecture/graph-orchestration.md)
- Check out the [API Reference](../api/agents.md) for programmatic usage

docs/getting-started/installation.md DELETED
@@ -1,148 +0,0 @@
# Installation

This guide will help you install and set up DeepCritical on your system.

## Prerequisites

- Python 3.11 or higher
- `uv` package manager (recommended) or `pip`
- At least one LLM API key (OpenAI, Anthropic, or HuggingFace)

## Installation Steps

### 1. Install uv (Recommended)

`uv` is a fast Python package installer and resolver. Install it with:

```bash
pip install uv
```

### 2. Clone the Repository

```bash
git clone https://github.com/DeepCritical/GradioDemo.git
cd GradioDemo
```

### 3. Install Dependencies

Using `uv` (recommended):

```bash
uv sync
```

Using `pip`:

```bash
pip install -e .
```

### 4. Install Optional Dependencies

For embeddings support (local sentence-transformers):

```bash
uv sync --extra embeddings
```

For Modal sandbox execution:

```bash
uv sync --extra modal
```

For Magentic orchestration:

```bash
uv sync --extra magentic
```

Install all extras:

```bash
uv sync --all-extras
```

### 5. Configure Environment Variables

Create a `.env` file in the project root:

```bash
# Required: At least one LLM provider
LLM_PROVIDER=openai  # or "anthropic" or "huggingface"
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Other services
NCBI_API_KEY=your_ncbi_api_key_here  # For higher PubMed rate limits
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```

See the [Configuration Guide](../configuration/index.md) for all available options.

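Before launching, you can sanity-check that the required variables are actually visible to Python. A minimal sketch; the variable names follow the `.env` example above, and which ones are truly required depends on your chosen provider:

```python
import os

# Adjust for your provider: e.g. ANTHROPIC_API_KEY when LLM_PROVIDER=anthropic.
REQUIRED = ["LLM_PROVIDER", "OPENAI_API_KEY"]

def missing_env(required: list[str] = REQUIRED) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```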
### 6. Verify Installation

Run the application:

```bash
uv run gradio src/app.py
```

Open your browser to `http://localhost:7860` to verify the installation.

## Development Setup

For development, install dev dependencies:

```bash
uv sync --all-extras --dev
```

Install pre-commit hooks:

```bash
uv run pre-commit install
```

## Troubleshooting

### Common Issues

**Import Errors**:
- Ensure you've installed all required dependencies
- Check that Python 3.11+ is being used

**API Key Errors**:
- Verify your `.env` file is in the project root
- Check that API keys are correctly formatted
- Ensure at least one LLM provider is configured

**Module Not Found**:
- Run `uv sync` or `pip install -e .` again
- Check that you're in the correct virtual environment

**Port Already in Use**:
- Change the port in `src/app.py` or set it via an environment variable
- Kill the process using port 7860

## Next Steps

- Read the [Quick Start Guide](quick-start.md)
- Learn about [MCP Integration](mcp-integration.md)
- Explore [Examples](examples.md)