Commit 9c4a163 · dung-vpt-uney committed · Parent: c3e1463

Deploy latest CoRGI Gradio demo

Files changed:
- PROGRESS_LOG.md +1 -1
- README.md +1 -0
- corgi/__pycache__/gradio_app.cpython-313.pyc +0 -0
- corgi/__pycache__/pipeline.cpython-313.pyc +0 -0
- corgi/__pycache__/types.cpython-313.pyc +0 -0
- corgi/gradio_app.py +20 -0
- corgi/pipeline.py +26 -0
- corgi/types.py +22 -0
PROGRESS_LOG.md CHANGED

```diff
@@ -14,7 +14,7 @@
 - Introduced structured logging for the app (`app.py`) and pipeline execution to trace model loads, cache hits, and Gradio lifecycle events on Spaces.
 - Reworked the Gradio UI to show per-step panels with annotated evidence galleries, giving each CoRGI reasoning step its own window alongside the final synthesized answer.
 - Preloaded the default Qwen3-VL model/tokenizer at import so Spaces load the GPU weights before serving requests.
-- Switched inference to bfloat16, tightened defaults (max steps/regions = 3), and moved the @spaces.GPU decorator down to the raw `_chat` call so each generation stays within the 120 s ZeroGPU budget.
+- Switched inference to bfloat16, tightened defaults (max steps/regions = 3), added per-stage timers, and moved the @spaces.GPU decorator down to the raw `_chat` call so each generation stays within the 120 s ZeroGPU budget.
 
 ## 2024-10-21
 - Updated default checkpoints to `Qwen/Qwen3-VL-8B-Thinking` and verified CLI/Gradio/test coverage.
```
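The log entry above describes scoping the ZeroGPU allocation to the lowest-level generation call. A minimal sketch of that pattern, assuming a helper named `_chat` with prebuilt model inputs (the argument shapes are assumptions, not the repo's actual signature):

```python
# Sketch only: keep @spaces.GPU on the raw generation call so prompt building
# and output parsing run outside the 120 s ZeroGPU window.
import spaces
import torch


@spaces.GPU(duration=120)  # ZeroGPU attaches a GPU only while this call runs
def _chat(model, inputs, max_new_tokens: int = 512):
    with torch.inference_mode():
        return model.generate(**inputs, max_new_tokens=max_new_tokens)
```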
README.md CHANGED

```diff
@@ -47,3 +47,4 @@ python app.py
 - **ROI Extraction**: Shows the source image with every grounded bounding box plus per-evidence crops, and lists the prompts used for each verification step.
 - **Evidence Descriptions**: Summarises each grounded region (bbox, description, confidence) with the associated ROI prompts.
 - **Answer Synthesis**: Highlights the final answer, supporting context, and the synthesis prompt/response pair.
+- **Performance**: Reports per-stage timings (reasoning, ROI extraction, synthesis) plus overall latency so you can monitor ZeroGPU runtime limits.
```
corgi/__pycache__/gradio_app.cpython-313.pyc CHANGED
Binary files a/corgi/__pycache__/gradio_app.cpython-313.pyc and b/corgi/__pycache__/gradio_app.cpython-313.pyc differ

corgi/__pycache__/pipeline.cpython-313.pyc CHANGED
Binary files a/corgi/__pycache__/pipeline.cpython-313.pyc and b/corgi/__pycache__/pipeline.cpython-313.pyc differ

corgi/__pycache__/types.cpython-313.pyc CHANGED
Binary files a/corgi/__pycache__/types.cpython-313.pyc and b/corgi/__pycache__/types.cpython-313.pyc differ
corgi/gradio_app.py CHANGED

```diff
@@ -158,6 +158,7 @@ def _empty_ui_payload(message: str) -> Dict[str, object]:
         "evidence_prompt": placeholder_prompt,
         "answer_process_markdown": message,
         "answer_prompt": placeholder_prompt,
+        "timing_markdown": message,
     }
 
 
@@ -270,6 +271,20 @@ def _prepare_ui_payload(
     ]
     answer_process_markdown = "\n".join(answer_process_lines)
 
+    timing_lines: List[str] = []
+    if result.timings:
+        total_entry = next((t for t in result.timings if t.name == "total_pipeline"), None)
+        if total_entry:
+            timing_lines.append(f"**Total pipeline:** {total_entry.duration_ms/1000:.2f} s")
+        for timing in result.timings:
+            if timing.name == "total_pipeline":
+                continue
+            label = timing.name.replace("_", " ")
+            if timing.step_index is not None:
+                label += f" (step {timing.step_index})"
+            timing_lines.append(f"- {label}: {timing.duration_ms/1000:.2f} s")
+    timing_markdown = "\n".join(timing_lines) if timing_lines else "_No timing data available._"
+
     return {
         "answer_markdown": answer_text,
         "chain_markdown": chain_markdown,
@@ -281,6 +296,7 @@
         "evidence_prompt": evidence_prompt_md,
         "answer_process_markdown": answer_process_markdown,
         "answer_prompt": answer_prompt_md,
+        "timing_markdown": timing_markdown,
     }
 
 
@@ -456,6 +472,8 @@ def build_demo(
            with gr.Tab("Answer Synthesis"):
                answer_process_markdown = gr.Markdown("_No answer generated yet._")
                answer_prompt_markdown = gr.Markdown("```text\nAwaiting answer prompt...\n```")
+           with gr.Tab("Performance"):
+               timing_markdown = gr.Markdown("_No timing data available._")
 
        def _on_submit(state_data, image, question, model_id, max_steps, max_regions):
            pipeline_state = state_data if isinstance(state_data, PipelineState) else None
@@ -479,6 +497,7 @@
                payload["evidence_prompt"],
                payload["answer_process_markdown"],
                payload["answer_prompt"],
+               payload["timing_markdown"],
            ]
 
        output_components = [
@@ -493,6 +512,7 @@
            evidence_prompt_markdown,
            answer_process_markdown,
            answer_prompt_markdown,
+           timing_markdown,
        ]
 
        run_button.click(
```
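For reference, the markdown the new timing block produces for the Performance tab would look roughly like the output below. This standalone snippet reuses the same formatting rules as the diff above; the `StageTiming` stand-in mirrors the dataclass added in corgi/types.py, and the durations are made up, not from a real run.

```python
# Illustration of the Performance tab markdown, using the diff's formatting rules
# on fabricated durations.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StageTiming:  # mirrors the dataclass added in corgi/types.py
    name: str
    duration_ms: float
    step_index: Optional[int] = None


timings = [
    StageTiming("structured_reasoning", 12400.0),
    StageTiming("roi_step_1", 9800.0, step_index=1),
    StageTiming("answer_synthesis", 11300.0),
    StageTiming("total_pipeline", 33500.0),
]

lines: List[str] = []
total = next((t for t in timings if t.name == "total_pipeline"), None)
if total:
    lines.append(f"**Total pipeline:** {total.duration_ms/1000:.2f} s")
for t in timings:
    if t.name == "total_pipeline":
        continue
    label = t.name.replace("_", " ")
    if t.step_index is not None:
        label += f" (step {t.step_index})"
    lines.append(f"- {label}: {t.duration_ms/1000:.2f} s")

print("\n".join(lines))
# **Total pipeline:** 33.50 s
# - structured reasoning: 12.40 s
# - roi step 1 (step 1): 9.80 s
# - answer synthesis: 11.30 s
```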
corgi/pipeline.py CHANGED

```diff
@@ -3,14 +3,18 @@ from __future__ import annotations
 from dataclasses import dataclass, field
 from typing import List, Optional, Protocol
 
+import time
+
 from PIL import Image
 
 from .types import (
     GroundedEvidence,
     PromptLog,
     ReasoningStep,
+    StageTiming,
     evidences_to_serializable,
     prompt_logs_to_serializable,
+    stage_timings_to_serializable,
     steps_to_serializable,
 )
 
@@ -58,6 +62,8 @@
     reasoning_log: Optional[PromptLog] = None
     grounding_logs: List[PromptLog] = field(default_factory=list)
     answer_log: Optional[PromptLog] = None
+    timings: List[StageTiming] = field(default_factory=list)
+    total_duration_ms: float = 0.0
 
     def to_json(self) -> dict:
         payload = {
@@ -65,6 +71,7 @@
             "steps": steps_to_serializable(self.steps),
             "evidence": evidences_to_serializable(self.evidence),
             "answer": self.answer,
+            "total_duration_ms": self.total_duration_ms,
         }
         reasoning_entries = (
             prompt_logs_to_serializable([self.reasoning_log]) if self.reasoning_log else []
@@ -73,6 +80,7 @@
             payload["reasoning_log"] = reasoning_entries[0]
 
         payload["grounding_logs"] = prompt_logs_to_serializable(self.grounding_logs)
+        payload["timings"] = stage_timings_to_serializable(self.timings)
 
         answer_entries = prompt_logs_to_serializable([self.answer_log]) if self.answer_log else []
         if answer_entries:
@@ -97,21 +105,37 @@ class CoRGIPipeline:
         max_regions: int = 3,
     ) -> PipelineResult:
         self._vlm.reset_logs()
+        timings: List[StageTiming] = []
+        total_start = time.monotonic()
+
+        reasoning_start = time.monotonic()
         steps = self._vlm.structured_reasoning(image=image, question=question, max_steps=max_steps)
+        reasoning_duration = (time.monotonic() - reasoning_start) * 1000.0
+        timings.append(StageTiming(name="structured_reasoning", duration_ms=reasoning_duration))
+
         evidences: List[GroundedEvidence] = []
         for step in steps:
             if not step.needs_vision:
                 continue
+            stage_name = f"roi_step_{step.index}"
+            grounding_start = time.monotonic()
             step_evs = self._vlm.extract_step_evidence(
                 image=image,
                 question=question,
                 step=step,
                 max_regions=max_regions,
             )
+            grounding_duration = (time.monotonic() - grounding_start) * 1000.0
+            timings.append(StageTiming(name=stage_name, duration_ms=grounding_duration, step_index=step.index))
             if not step_evs:
                 continue
             evidences.extend(step_evs[:max_regions])
+        answer_start = time.monotonic()
         answer = self._vlm.synthesize_answer(image=image, question=question, steps=steps, evidences=evidences)
+        answer_duration = (time.monotonic() - answer_start) * 1000.0
+        timings.append(StageTiming(name="answer_synthesis", duration_ms=answer_duration))
+        total_duration = (time.monotonic() - total_start) * 1000.0
+        timings.append(StageTiming(name="total_pipeline", duration_ms=total_duration))
         return PipelineResult(
             question=question,
             steps=steps,
@@ -120,6 +144,8 @@
             reasoning_log=self._vlm.reasoning_log,
             grounding_logs=list(self._vlm.grounding_logs),
             answer_log=self._vlm.answer_log,
+            timings=timings,
+            total_duration_ms=total_duration,
         )
 
 
```
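With these changes, downstream code can read per-stage timings straight off the result. A hedged sketch of what that looks like; the `PipelineResult` fields mirror the diff, but the field defaults and the example values are assumptions, not real measurements:

```python
# Sketch: what callers can read from the new timing fields on PipelineResult.
from corgi.pipeline import PipelineResult
from corgi.types import StageTiming

result = PipelineResult(
    question="What is on the table?",
    steps=[],
    evidence=[],
    answer="A coffee mug.",
    timings=[
        StageTiming(name="structured_reasoning", duration_ms=12400.0),
        StageTiming(name="total_pipeline", duration_ms=33500.0),
    ],
    total_duration_ms=33500.0,
)

payload = result.to_json()
print(payload["total_duration_ms"])  # 33500.0
print(payload["timings"][0])         # {'name': 'structured_reasoning', 'duration_ms': 12400.0}
```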
corgi/types.py CHANGED

```diff
@@ -38,6 +38,15 @@ class PromptLog:
     stage: Optional[str] = None
 
 
+@dataclass(frozen=True)
+class StageTiming:
+    """Timing metadata for a pipeline stage or sub-step."""
+
+    name: str
+    duration_ms: float
+    step_index: Optional[int] = None
+
+
 def steps_to_serializable(steps: List[ReasoningStep]) -> List[Dict[str, object]]:
     """Helper to convert steps into JSON-friendly dictionaries."""
 
@@ -85,3 +94,16 @@ def prompt_logs_to_serializable(logs: List[PromptLog]) -> List[Dict[str, object]
             item["stage"] = log.stage
         serializable.append(item)
     return serializable
+
+
+def stage_timings_to_serializable(timings: List[StageTiming]) -> List[Dict[str, object]]:
+    serializable: List[Dict[str, object]] = []
+    for timing in timings:
+        item: Dict[str, object] = {
+            "name": timing.name,
+            "duration_ms": timing.duration_ms,
+        }
+        if timing.step_index is not None:
+            item["step_index"] = timing.step_index
+        serializable.append(item)
+    return serializable
```
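Note that the serializer above only emits `step_index` when it is set, which keeps stage-level and per-step entries distinguishable in the JSON payload. A quick illustration with made-up durations:

```python
# Illustration of stage_timings_to_serializable as defined in the diff:
# step_index appears only for per-step ROI timings.
from corgi.types import StageTiming, stage_timings_to_serializable

rows = stage_timings_to_serializable([
    StageTiming(name="structured_reasoning", duration_ms=12400.0),
    StageTiming(name="roi_step_1", duration_ms=9800.0, step_index=1),
])
print(rows)
# [{'name': 'structured_reasoning', 'duration_ms': 12400.0},
#  {'name': 'roi_step_1', 'duration_ms': 9800.0, 'step_index': 1}]
```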