dfdfdsfgs committed on
Commit d9486d1 · 1 Parent(s): 929083d

Upload project files

This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. .env.template +33 -0
  2. .github/ISSUE_TEMPLATE/bug_report.md +32 -0
  3. .github/ISSUE_TEMPLATE/feature_request.md +20 -0
  4. .gitignore +3 -0
  5. .specstory/history/.what-is-this.md +65 -0
  6. Dockerfile +39 -0
  7. LICENSE +21 -0
  8. README.md +359 -14
  9. app.py +164 -4
  10. data/thb_easy/chemistry.json +142 -0
  11. data/thb_easy/comp_sci.json +142 -0
  12. data/thb_easy/math.json +142 -0
  13. data/thb_easy/physics.json +142 -0
  14. data/thb_hard/chemistry.json +142 -0
  15. data/thb_hard/comp_sci.json +142 -0
  16. data/thb_hard/math.json +142 -0
  17. data/thb_hard/physics.json +142 -0
  18. data/thb_medium/chemistry.json +142 -0
  19. data/thb_medium/comp_sci.json +142 -0
  20. data/thb_medium/math.json +142 -0
  21. data/thb_medium/physics.json +142 -0
  22. eval_suite/__init__.py +0 -0
  23. eval_suite/image_utils.py +104 -0
  24. eval_suite/parse_prompt.py +54 -0
  25. eval_suite/prompts_raw/__init__.py +145 -0
  26. eval_suite/prompts_raw/fix_transcript.txt +8 -0
  27. eval_suite/prompts_raw/image_eval.txt +45 -0
  28. eval_suite/prompts_raw/text_eval_new.txt +47 -0
  29. eval_suite/prompts_raw/video_eval_new.txt +37 -0
  30. eval_suite/text_utils.py +80 -0
  31. eval_suite/utils.py +81 -0
  32. eval_suite/video_utils.py +167 -0
  33. evaluate.py +474 -0
  34. generate_video.py +954 -0
  35. mllm_tools/__init__.py +1 -0
  36. mllm_tools/gemini.py +176 -0
  37. mllm_tools/litellm.py +193 -0
  38. mllm_tools/utils.py +174 -0
  39. mllm_tools/vertex_ai.py +86 -0
  40. requirements.txt +101 -0
  41. src/__init__.py +1 -0
  42. src/config/__init__.py +0 -0
  43. src/config/config.py +20 -0
  44. src/core/__init__.py +0 -0
  45. src/core/code_generator.py +454 -0
  46. src/core/parse_video.py +227 -0
  47. src/core/video_planner.py +417 -0
  48. src/core/video_renderer.py +448 -0
  49. src/rag/__init__.py +0 -0
  50. src/rag/rag_integration.py +390 -0
.env.template ADDED
@@ -0,0 +1,33 @@
+ # OpenAI
+ OPENAI_API_KEY=""
+
+ # Azure OpenAI
+ AZURE_API_KEY=""
+ AZURE_API_BASE=""
+ AZURE_API_VERSION=""
+
+ # Google Vertex AI
+ VERTEXAI_PROJECT=""
+ VERTEXAI_LOCATION=""
+ GOOGLE_APPLICATION_CREDENTIALS=""
+
+ # Google Gemini
+ GEMINI_API_KEY=""
+
+ # AWS Bedrock / S3
+ AWS_ACCESS_KEY_ID=""
+ AWS_SECRET_ACCESS_KEY=""
+ AWS_REGION_NAME=""
+ AWS_S3_BUCKET=""
+
+ # Langfuse
+ LANGFUSE_PUBLIC_KEY=""
+ LANGFUSE_SECRET_KEY=""
+ LANGFUSE_HOST=""
+
+ # Kokoro TTS Settings
+ KOKORO_MODEL_PATH="models/kokoro-v0_19.onnx"
+ KOKORO_VOICES_PATH="models/voices.bin"
+ KOKORO_DEFAULT_VOICE="af"
+ KOKORO_DEFAULT_SPEED="1.0"
+ KOKORO_DEFAULT_LANG="en-us"
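These variables are read from the process environment at runtime. As a minimal sketch of how they can be loaded for local experiments (assuming `python-dotenv` is available; the project's actual loading mechanism may differ):

```python
# Illustrative only: load the .env file into the process environment.
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current working directory
print(os.getenv("KOKORO_MODEL_PATH"))  # e.g. "models/kokoro-v0_19.onnx"
```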
.github/ISSUE_TEMPLATE/bug_report.md ADDED
@@ -0,0 +1,32 @@
+ ---
+ name: Bug report
+ about: Create a report to help us improve
+ title: ''
+ labels: ''
+ assignees: ''
+
+ ---
+
+ **Describe the bug**
+ A clear and concise description of what the bug is.
+
+ **To Reproduce**
+ Steps to reproduce the behavior:
+ 1. Go to '...'
+ 2. Click on '....'
+ 3. Scroll down to '....'
+ 4. See error
+
+ **Expected behavior**
+ A clear and concise description of what you expected to happen.
+
+ **Screenshots**
+ If applicable, add screenshots to help explain your problem.
+
+ **Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser [e.g. chrome, safari]
+ - Version [e.g. 22]
+
+ **Additional context**
+ Add any other context about the problem here.
.github/ISSUE_TEMPLATE/feature_request.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ name: Feature request
+ about: Suggest an idea for this project
+ title: ''
+ labels: ''
+ assignees: ''
+
+ ---
+
+ **Is your feature request related to a problem? Please describe.**
+ A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+ **Describe the solution you'd like**
+ A clear and concise description of what you want to happen.
+
+ **Describe alternatives you've considered**
+ A clear and concise description of any alternative solutions or features you've considered.
+
+ **Additional context**
+ Add any other context or screenshots about the feature request here.
.gitignore ADDED
@@ -0,0 +1,3 @@
+ **/__pycache__/
+
+ .env
.specstory/history/.what-is-this.md ADDED
@@ -0,0 +1,65 @@
+
+ # SpecStory Artifacts Directory
+
+ This directory is automatically created and maintained by the SpecStory extension to preserve your Cursor composer and chat history.
+
+ ## What's Here?
+
+ - `.specstory/history`: Contains markdown files of your AI coding sessions
+     - Each file represents a separate chat or composer session
+     - Files are automatically updated as you work
+ - `.specstory/cursor_rules_backups`: Contains backups of the `.cursor/rules/derived-cursor-rules.mdc` file
+     - Backups are automatically created each time the `.cursor/rules/derived-cursor-rules.mdc` file is updated
+     - You can enable/disable the Cursor Rules feature in the SpecStory settings
+
+ ## Valuable Uses
+
+ - Capture: Keep your context window up-to-date when starting new Chat/Composer sessions via @ references
+ - Search: For previous prompts and code snippets
+ - Learn: Meta-analyze your patterns and learn from your past experiences
+ - Derive: Keep Cursor on course with your past decisions by automatically deriving Cursor rules from your AI interactions
+
+ ## Version Control
+
+ We recommend keeping this directory under version control to maintain a history of your AI interactions. However, if you prefer not to version these files, you can exclude them by adding this to your `.gitignore`:
+
+ ```
+ .specstory
+ ```
+
+ We recommend not keeping the `.specstory/cursor_rules_backups` directory under version control if you are already using git to version the `.cursor/rules` directory, and committing regularly. You can exclude it by adding this to your `.gitignore`:
+
+ ```
+ .specstory/cursor_rules_backups
+ ```
+
+ ## Searching Your Codebase
+
+ When searching your codebase in Cursor, search results may include your previous AI coding interactions. To focus solely on your actual code files, you can exclude the AI interaction history from search results.
+
+ To exclude AI interaction history:
+
+ 1. Open the "Find in Files" search in Cursor (Cmd/Ctrl + Shift + F)
+ 2. Navigate to the "files to exclude" section
+ 3. Add the following pattern:
+
+ ```
+ .specstory/*
+ ```
+
+ This will ensure your searches only return results from your working codebase files.
+
+ ## Notes
+
+ - Auto-save only works when Cursor/sqlite flushes data to disk. This results in a small delay after the AI response is complete before SpecStory can save the history.
+ - Auto-save does not yet work on remote WSL workspaces.
+
+ ## Settings
+
+ You can control auto-saving behavior in Cursor:
+
+ 1. Open Cursor → Settings → VS Code Settings (Cmd/Ctrl + ,)
+ 2. Search for "SpecStory"
+ 3. Find "Auto Save" setting to enable/disable
+
+ Auto-save occurs when changes are detected in Cursor's sqlite database, or every 2 minutes as a safety net.
Dockerfile ADDED
@@ -0,0 +1,39 @@
+ # Start with a Python base image
+ FROM python:3.11-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies for Manim
+ # This is a large installation and will take time
+ # wget is required for the model download step below (not present in the slim base image)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     texlive-full \
+     pango1.0-tools \
+     libcairo2-dev \
+     libjpeg-dev \
+     libgif-dev \
+     libpango1.0-dev \
+     libsdl-pango-dev \
+     portaudio19-dev \
+     wget \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy the entire project into the container
+ COPY . .
+
+ # Install Python requirements
+ # Manim is included in requirements.txt
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Download Kokoro TTS models during the build process
+ RUN mkdir -p models && \
+     wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx && \
+     wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
+
+ # Expose the port the API will run on (e.g., 7860 for Gradio/FastAPI)
+ EXPOSE 7860
+
+ # Command to run the application
+ # We will use Gradio to create the UI endpoint
+ CMD ["python", "app.py"]
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 TIGER Lab
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,14 +1,359 @@
- ---
- title: TheoremExplainAgent
- emoji: 😻
- colorFrom: pink
- colorTo: red
- sdk: gradio
- sdk_version: 5.33.2
- app_file: app.py
- pinned: false
- license: mit
- short_description: TheoremExplainAgent
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # TheoremExplainAgent (TEA) 🍵
+ [![arXiv](https://img.shields.io/badge/arXiv-2502.19400-b31b1b.svg)](https://arxiv.org/abs/2502.19400)
+ <a href='https://huggingface.co/papers/2502.19400'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a>
+
+ [**🌐 Homepage**](https://tiger-ai-lab.github.io/TheoremExplainAgent/) | [**📖 arXiv**](https://arxiv.org/abs/2502.19400) | [**🤗 HuggingFace Dataset**](https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench) | [🎥 Video Data](https://drive.google.com/file/d/18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ/view)
+
+ [![contributors](https://img.shields.io/github/contributors/TIGER-AI-Lab/TheoremExplainAgent)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/graphs/contributors)
+ [![license](https://img.shields.io/github/license/TIGER-AI-Lab/TheoremExplainAgent.svg)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/blob/main/LICENSE)
+ [![GitHub](https://img.shields.io/github/stars/TIGER-AI-Lab/TheoremExplainAgent?style=social)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent)
+ [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FTIGER-AI-Lab%2FTheoremExplainAgent&count_bg=%23C83DB9&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false)](https://hits.seeyoufarm.com)
+
+ This repo contains the codebase for our paper [TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding](https://arxiv.org/abs/2502.19400).
+
+ **ACL 2025 main**
+
+ ## Introduction
+ TheoremExplainAgent is an AI system that generates long-form Manim videos to visually explain theorems, proving its deep understanding while uncovering reasoning flaws that text alone often hides.
+
+
+
+ https://github.com/user-attachments/assets/17f2f4f2-8f2c-4abc-b377-ac92ebda69f3
+
+
+ ## 📰 News
+ * 2025 Jun 8: We released our generated video data for researchers to use as baselines.
+ * 2025 May 15: Paper accepted to the ACL 2025 main conference.
+ * 2025 Mar 3: Generation code and evaluation code released. Thanks for the wait!
+ <!--* 2025 Mar 3: Reach 404 stars without code.-->
+ * 2025 Feb 27: Paper available on [arXiv](https://arxiv.org/abs/2502.19400). Thanks AK for putting our paper on [HF Daily](https://huggingface.co/papers/2502.19400).
+
+ ## Downloading Generated Video Data
+ Skip this section if you just want to try out the code.
+ If you are a researcher who just needs the generated videos as a baseline for comparison, download them here:
+ ```shell
+ wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ' -O /tmp/gdrive.html && wget --load-cookies /tmp/cookies.txt -O baseline_videos.zip "https://drive.usercontent.google.com/download?id=18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ&export=download&confirm=$(sed -rn 's/.*name="confirm" value="([^"]+)".*/\\1/p' /tmp/gdrive.html)&uuid=$(sed -rn 's/.*name="uuid" value="([^"]+)".*/\\1/p' /tmp/gdrive.html)" && rm /tmp/gdrive.html /tmp/cookies.txt
+ ```
+
+ ## Installation
+
+ > **Look at the [FAQ section in this README](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that doesn't help, create an issue.**<br>
+
+ 1. Set up the conda environment:
+ ```shell
+ conda create --name tea python=3.12.8
+ conda activate tea
+ pip install -r requirements.txt
+ ```
+
+ 2. You may also need to install LaTeX and other dependencies for Manim Community. See the [Manim Installation Docs](https://docs.manim.community/en/stable/installation.html) for more details.
+ ```shell
+ # You might need these dependencies if you are using Linux Ubuntu:
+ sudo apt-get install portaudio19-dev
+ sudo apt-get install libsdl-pango-dev
+ ```
+
+ 3. Download the Kokoro model and voices with the following commands to enable the TTS service.
+
+ ```shell
+ mkdir -p models && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
+ ```
+
+ 4. Create `.env` based on `.env.template`, filling in the environment variables according to the models you choose to use.
+ See [LiteLLM](https://docs.litellm.ai/docs/providers) for reference.
+
+ ```shell
+ touch .env
+ ```
+ Then open the `.env` file and edit it with whatever text editor you like.
+
+ Your `.env` file should look like the following:
+ ```shell
+ # OpenAI
+ OPENAI_API_KEY=""
+
+ # Azure OpenAI
+ AZURE_API_KEY=""
+ AZURE_API_BASE=""
+ AZURE_API_VERSION=""
+
+ # Google Vertex AI
+ VERTEXAI_PROJECT=""
+ VERTEXAI_LOCATION=""
+ GOOGLE_APPLICATION_CREDENTIALS=""
+
+ # Google Gemini
+ GEMINI_API_KEY=""
+
+ ...
+
+ # Kokoro TTS Settings
+ KOKORO_MODEL_PATH="models/kokoro-v0_19.onnx"
+ KOKORO_VOICES_PATH="models/voices.bin"
+ KOKORO_DEFAULT_VOICE="af"
+ KOKORO_DEFAULT_SPEED="1.0"
+ KOKORO_DEFAULT_LANG="en-us"
+ ```
+ Fill in the API keys according to the models you want to use.
+
+ 5. Configure the Python path. This step is required; otherwise you may run into import errors (e.g., `src` cannot be imported). A quick way to verify it is shown after the command below.
+ ```shell
+ export PYTHONPATH=$(pwd):$PYTHONPATH
+ ```
+
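To confirm the path is set correctly, a quick check (run from the repository root, where the `src` package lives) is:

```shell
python -c "import src; print('PYTHONPATH is configured correctly')"
```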
+ 6. (Optional) To set up RAG, see [https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#generation-with-rag](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#generation-with-rag).
+
+ > **Look at the [FAQ section in this README](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that doesn't help, create an issue.**<br>
+
+ ## Generation
+
+ ### Supported Models
+ <!--You can customize the allowed models by editing the `src/utils/allowed_models.json` file. This file specifies which `model` and `helper_model` the system is permitted to use.-->
+ The model naming follows the LiteLLM convention. For details on how models should be named, please refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
+
+ ### Generation (Single topic)
+ ```shell
+ python generate_video.py \
+     --model "openai/o3-mini" \
+     --helper_model "openai/o3-mini" \
+     --output_dir "output/your_exp_name" \
+     --topic "your_topic" \
+     --context "description of your topic, e.g. 'This is a topic about the properties of a triangle'" \
+ ```
+
+ Example:
+ ```shell
+ python generate_video.py \
+     --model "openai/o3-mini" \
+     --helper_model "openai/o3-mini" \
+     --output_dir "output/my_exp_name" \
+     --topic "Big O notation" \
+     --context "most common type of asymptotic notation in computer science used to measure worst case complexity" \
+ ```
+
+ ### Generation (in batch)
+ ```shell
+ python generate_video.py \
+     --model "openai/o3-mini" \
+     --helper_model "openai/o3-mini" \
+     --output_dir "output/my_exp_name" \
+     --theorems_path data/thb_easy/math.json \
+     --max_scene_concurrency 7 \
+     --max_topic_concurrency 20 \
+ ```
+
+ ### Generation with RAG
+ Before using RAG, download the RAG documentation from this [Google Drive link](https://drive.google.com/file/d/1Tn6J_JKVefFZRgZbjns93KLBtI9ullRv/view?usp=sharing). After downloading, unzip the file. For example, if you unzip it to `data/rag/manim_docs`, then you should set `--manim_docs_path` to `data/rag/manim_docs`. The vector database will be created the first time you run with RAG.
+
+ ```shell
+ python generate_video.py \
+     --model "openai/o3-mini" \
+     --helper_model "openai/o3-mini" \
+     --output_dir "output/with_rag/o3-mini/vtutorbench_easy/math" \
+     --topic "Big O notation" \
+     --context "most common type of asymptotic notation in computer science used to measure worst case complexity" \
+     --use_rag \
+     --chroma_db_path "data/rag/chroma_db" \
+     --manim_docs_path "data/rag/manim_docs" \
+     --embedding_model "vertex_ai/text-embedding-005"
+ ```
+
+ We support more options for generation; see below for details:
+ ```shell
+ usage: generate_video.py [-h]
+                          [--model]
+                          [--topic TOPIC] [--context CONTEXT]
+                          [--helper_model]
+                          [--only_gen_vid] [--only_combine] [--peek_existing_videos] [--output_dir OUTPUT_DIR] [--theorems_path THEOREMS_PATH]
+                          [--sample_size SAMPLE_SIZE] [--verbose] [--max_retries MAX_RETRIES] [--use_rag] [--use_visual_fix_code]
+                          [--chroma_db_path CHROMA_DB_PATH] [--manim_docs_path MANIM_DOCS_PATH]
+                          [--embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}] [--use_context_learning]
+                          [--context_learning_path CONTEXT_LEARNING_PATH] [--use_langfuse] [--max_scene_concurrency MAX_SCENE_CONCURRENCY]
+                          [--max_topic_concurrency MAX_TOPIC_CONCURRENCY] [--debug_combine_topic DEBUG_COMBINE_TOPIC] [--only_plan] [--check_status]
+                          [--only_render] [--scenes SCENES [SCENES ...]]
+
+ Generate Manim videos using AI
+
+ options:
+   -h, --help            show this help message and exit
+   --model               Select the AI model to use
+   --topic TOPIC         Topic to generate videos for
+   --context CONTEXT     Context of the topic
+   --helper_model        Select the helper model to use
+   --only_gen_vid        Only generate videos to existing plans
+   --only_combine        Only combine videos
+   --peek_existing_videos, --peek
+                         Peek at existing videos
+   --output_dir OUTPUT_DIR
+                         Output directory
+   --theorems_path THEOREMS_PATH
+                         Path to theorems json file
+   --sample_size SAMPLE_SIZE, --sample SAMPLE_SIZE
+                         Number of theorems to sample
+   --verbose             Print verbose output
+   --max_retries MAX_RETRIES
+                         Maximum number of retries for code generation
+   --use_rag, --rag      Use Retrieval Augmented Generation
+   --use_visual_fix_code, --visual_fix_code
+                         Use VLM to fix code with rendered visuals
+   --chroma_db_path CHROMA_DB_PATH
+                         Path to Chroma DB
+   --manim_docs_path MANIM_DOCS_PATH
+                         Path to manim docs
+   --embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}
+                         Select the embedding model to use
+   --use_context_learning
+                         Use context learning with example Manim code
+   --context_learning_path CONTEXT_LEARNING_PATH
+                         Path to context learning examples
+   --use_langfuse        Enable Langfuse logging
+   --max_scene_concurrency MAX_SCENE_CONCURRENCY
+                         Maximum number of scenes to process concurrently
+   --max_topic_concurrency MAX_TOPIC_CONCURRENCY
+                         Maximum number of topics to process concurrently
+   --debug_combine_topic DEBUG_COMBINE_TOPIC
+                         Debug combine videos
+   --only_plan           Only generate scene outline and implementation plans
+   --check_status        Check planning and code status for all theorems
+   --only_render         Only render scenes without combining videos
+   --scenes SCENES [SCENES ...]
+                         Specific scenes to process (if theorems_path is provided)
+ ```
+
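For batch runs it can be useful to split planning and rendering into separate passes. An illustrative two-step workflow assembled from the flags above (paths and model are examples, not prescribed settings):

```shell
# 1. Generate scene outlines and implementation plans only
python generate_video.py \
    --model "openai/o3-mini" \
    --helper_model "openai/o3-mini" \
    --output_dir "output/my_exp_name" \
    --theorems_path data/thb_easy/math.json \
    --only_plan

# 2. Check planning and code status for all theorems before rendering
python generate_video.py \
    --output_dir "output/my_exp_name" \
    --theorems_path data/thb_easy/math.json \
    --check_status
```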
+ ## Evaluation
+ Note that Gemini and GPT-4o are required for evaluation.
+
+ Currently, evaluation requires a video file and a subtitle file (SRT format).
+
+ Video evaluation:
+ ```shell
+ usage: evaluate.py [-h]
+                    [--model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
+                    [--model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}]
+                    [--model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
+                    [--eval_type {text,video,image,all}] --file_path FILE_PATH --output_folder OUTPUT_FOLDER [--retry_limit RETRY_LIMIT] [--combine] [--bulk_evaluate] [--target_fps TARGET_FPS]
+                    [--use_parent_folder_as_topic] [--max_workers MAX_WORKERS]
+
+ Automatic evaluation of theorem explanation videos with LLMs
+
+ options:
+   -h, --help            show this help message and exit
+   --model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
+                         Select the AI model to use for text evaluation
+   --model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}
+                         Select the AI model to use for video evaluation
+   --model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
+                         Select the AI model to use for image evaluation
+   --eval_type {text,video,image,all}
+                         Type of evaluation to perform
+   --file_path FILE_PATH
+                         Path to a file or a theorem folder
+   --output_folder OUTPUT_FOLDER
+                         Directory to store the evaluation files
+   --retry_limit RETRY_LIMIT
+                         Number of retry attempts for each inference
+   --combine             Combine all results into a single JSON file
+   --bulk_evaluate       Evaluate a folder of theorems together
+   --target_fps TARGET_FPS
+                         Target FPS for video processing. If not set, original video FPS will be used
+   --use_parent_folder_as_topic
+                         Use parent folder name as topic name for single file evaluation
+   --max_workers MAX_WORKERS
+                         Maximum number of concurrent workers for parallel processing
+ ```
+ * For `file_path`, it is recommended to pass a folder containing both an MP4 file and an SRT file.
+
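As an illustration, evaluating a single theorem folder that contains the rendered MP4 and its SRT might look like this (the models and paths are examples assembled from the options above; adjust them to your setup):

```shell
python evaluate.py \
    --model_text "gemini/gemini-2.0-flash-001" \
    --model_video "gemini/gemini-2.0-flash-exp" \
    --model_image "gemini/gemini-2.0-flash-001" \
    --eval_type all \
    --file_path "output/my_exp_name/big_o_notation" \
    --output_folder "eval_results" \
    --use_parent_folder_as_topic
```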
+ ## Misc: Modify the system prompt in TheoremExplainAgent
+
+ If you want to modify the system prompt, you need to:
+
+ 1. Modify the files in the `task_generator/prompts_raw` folder.
+ 2. Run `task_generator/parse_prompt.py` to rebuild the `__init__.py` file.
+
+ ```shell
+ cd task_generator
+ python parse_prompt.py
+ cd ..
+ ```
+
+ ## TheoremExplainBench (TEB)
+
+ TheoremExplainBench can be found at https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench.
+
+ How to use:
+ ```python
+ import datasets
+ dataset = datasets.load_dataset("TIGER-Lab/TheoremExplainBench")
+ ```
+
+ Dataset info:
+ ```shell
+ DatasetDict({
+     train: Dataset({
+         features: ['uid', 'subject', 'difficulty', 'theorem', 'description', 'subfield'],
+         num_rows: 240
+     })
+ })
+ ```
+
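For example, to look at a single entry or filter by difficulty, a small sketch using the fields listed above:

```python
import datasets

dataset = datasets.load_dataset("TIGER-Lab/TheoremExplainBench")

# Inspect one row; each row has: uid, subject, difficulty, theorem, description, subfield
example = dataset["train"][0]
print(example["theorem"], "-", example["subfield"])

# Keep only the easy items
easy = dataset["train"].filter(lambda row: row["difficulty"] == "Easy")
print(len(easy))
```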
+ ## ❓ FAQ
+
+ The FAQ covers the most common errors you could encounter. If you see something new, report it in the issues.
+
+ Q: Error `src.utils.kokoro_voiceover import KokoroService # You MUST import like this as this is our custom voiceover service. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ModuleNotFoundError: No module named 'src'`. <br>
+ A: Please run `export PYTHONPATH=$(pwd):$PYTHONPATH` when you start a new terminal. <br>
+
+ Q: Error `Files not found` <br>
+ A: Check your Manim installation. <br>
+
+ Q: Error `latex ...` <br>
+ A: Check your LaTeX installation. <br>
+
+ Q: The output log is not showing a response? <br>
+ A: It could be an API-related issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to track down the issue. <br>
+
+ Q: Plans / Scenes are missing? <br>
+ A: It could be an API-related issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to track down the issue. <br>
+
+
+ ## 🖊️ Citation
+
+ Please kindly cite our paper if you use our code, data, models or results:
+ ```bibtex
+ @misc{ku2025theoremexplainagentmultimodalexplanationsllm,
+       title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
+       author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
+       year={2025},
+       eprint={2502.19400},
+       archivePrefix={arXiv},
+       primaryClass={cs.AI},
+       url={https://arxiv.org/abs/2502.19400},
+ }
+ ```
+
+ ## 🎫 License
+
+ This project is released under the [MIT License](LICENSE).
+
+ ## ⭐ Star History
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=TIGER-AI-Lab/TheoremExplainAgent&type=Date)](https://star-history.com/#TIGER-AI-Lab/TheoremExplainAgent&Date)
+
+ ## 💞 Acknowledgements
+
+ We want to thank [Votee AI](https://votee.ai/) for sponsoring API keys to access the closed-source models.
+
+ The code is built upon the repositories below; we thank all the contributors for open-sourcing their work.
+ * [Manim Community](https://www.manim.community/)
+ * [kokoro-manim-voiceover](https://github.com/xposed73/kokoro-manim-voiceover)
+ * [manim-physics](https://github.com/Matheart/manim-physics)
+ * [manim-Chemistry](https://github.com/UnMolDeQuimica/manim-Chemistry)
+ * [ManimML](https://github.com/helblazer811/ManimML)
+ * [manim-dsa](https://github.com/F4bbi/manim-dsa)
+ * [manim-circuit](https://github.com/Mr-FuzzyPenguin/manim-circuit)
+
+ ## 🚨 Disclaimer
+
+ **This work is intended for research purposes only. The authors do not encourage or endorse the use of this codebase for commercial applications. The code is provided "as is" without any warranties, and users assume all responsibility for its use.**
+
+ Tested environments: macOS, Linux
app.py CHANGED
@@ -1,7 +1,167 @@
  import gradio as gr
-
- def greet(name):
-     return "Hello " + name + "!!"
-
- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
+ import uuid
+ import subprocess
+ import threading
+ import os
+ import time
+ from fastapi import FastAPI
+ from fastapi.responses import FileResponse
+ import asyncio
+
+
+ # A simple in-memory dictionary to track task status.
+ # For a production system, you'd use a database or Redis.
+ tasks = {}
+
+ def run_video_generation(task_id: str, topic: str, context: str):
+     """
+     This function runs the main generation script in a separate process.
+     """
+     tasks[task_id]['status'] = 'running'
+
+     # Sanitize topic to create a valid directory name
+     file_prefix = "".join(c if c.isalnum() else "_" for c in topic.lower())
+     output_dir = os.path.join("output", file_prefix)
+
+     command = [
+         "python", "generate_video.py",
+         "--model", "openai/o3-mini",  # Or get from request
+         "--topic", topic,
+         "--context", context,
+         "--output_dir", "output",
+         "--use_langfuse"  # Assuming you have secrets set
+     ]
+
+     try:
+         # Using subprocess to run the existing script
+         process = subprocess.run(command, check=True, capture_output=True, text=True)
+
+         # Assume the final video is named based on the topic
+         # Note: The actual video path might differ. This is an assumption.
+         # You may need to parse the stdout from generate_video.py to get the exact path.
+         video_path = None
+         for file in os.listdir(output_dir):
+             if file.endswith("_combined.mp4"):
+                 video_path = os.path.join(output_dir, file)
+                 break
+
+         if video_path and os.path.exists(video_path):
+             tasks[task_id]['status'] = 'completed'
+             tasks[task_id]['video_path'] = video_path
+         else:
+             tasks[task_id]['status'] = 'failed'
+             tasks[task_id]['error'] = "Video file not found after generation."
+             tasks[task_id]['stdout'] = process.stdout
+             tasks[task_id]['stderr'] = process.stderr
+
+     except subprocess.CalledProcessError as e:
+         tasks[task_id]['status'] = 'failed'
+         tasks[task_id]['error'] = str(e)
+         tasks[task_id]['stdout'] = e.stdout
+         tasks[task_id]['stderr'] = e.stderr
+     except Exception as e:
+         tasks[task_id]['status'] = 'failed'
+         tasks[task_id]['error'] = str(e)
+
+ def start_generation_thread(topic: str, context: str):
+     if not topic or not context:
+         return "Topic and Context cannot be empty.", "", None
+
+     task_id = str(uuid.uuid4())
+     tasks[task_id] = {'status': 'queued'}
+
+     # Use a background thread to run the time-consuming task
+     thread = threading.Thread(
+         target=run_video_generation,
+         args=(task_id, topic, context)
+     )
+     thread.start()
+
+     return f"Task started. Your Task ID is: {task_id}", task_id, None
+
+
+ def check_status(task_id: str):
+     if not task_id:
+         return "Please provide a Task ID.", None
+
+     task = tasks.get(task_id)
+     if not task:
+         return "Task not found.", None
+
+     status = task.get('status')
+     if status == 'completed':
+         video_path = task.get('video_path')
+         return f"Status: {status}", video_path
+     elif status == 'failed':
+         error = task.get('error', 'Unknown error')
+         stdout = task.get('stdout', '')
+         stderr = task.get('stderr', '')
+         return f"Status: {status}\nError: {error}\nOutput: {stdout}\nStderr: {stderr}", None
+
+     return f"Status: {status}", None
+
+ # We need a lightweight FastAPI app in the background to serve the video files.
+ # Gradio can't serve files directly from arbitrary paths in a secure way.
+ fastapi_app = FastAPI()
+
+ @fastapi_app.get("/videos/{task_id}")
+ def get_video(task_id: str):
+     """
+     Serves the final generated video file.
+     """
+     task = tasks.get(task_id)
+     if not task or task.get('status') != 'completed':
+         return {"error": "Task not completed or not found"}
+
+     video_path = task.get('video_path')
+     if not os.path.exists(video_path):
+         return {"error": "Video file not found."}
+
+     return FileResponse(video_path, media_type="video/mp4", filename=os.path.basename(video_path))
+
+
+ # Gradio Interface
+ with gr.Blocks() as demo:
+     gr.Markdown("# Theorem-Explain-Agent Video Generation")
+     gr.Markdown("Start a video generation task and check its status.")
+
+     with gr.Tab("Start Generation"):
+         topic_input = gr.Textbox(label="Topic", placeholder="e.g., The Pythagorean Theorem")
+         context_input = gr.Textbox(label="Context", placeholder="A short explanation of the theorem.")
+         start_button = gr.Button("Generate Video")
+
+         with gr.Column():
+             task_id_output = gr.Textbox(label="Task ID", interactive=False)
+             status_output_start = gr.Textbox(label="Status", interactive=False)
+
+     with gr.Tab("Check Status"):
+         task_id_input = gr.Textbox(label="Task ID", placeholder="Enter the Task ID you received.")
+         check_button = gr.Button("Check Status")
+
+         with gr.Column():
+             status_output_check = gr.Textbox(label="Status", interactive=False)
+             video_output = gr.Video(label="Generated Video")
+
+     # Actions
+     start_button.click(
+         fn=start_generation_thread,
+         inputs=[topic_input, context_input],
+         outputs=[status_output_start, task_id_output, video_output]  # Clear video on new task
+     )
+
+     check_button.click(
+         fn=check_status,
+         inputs=[task_id_input],
+         outputs=[status_output_check, video_output]
+     )
+
+     gr.Markdown("### How to Use")
+     gr.Markdown(
+         "1. Enter a `Topic` and `Context` in the 'Start Generation' tab and click 'Generate Video'.\n"
+         "2. Copy the `Task ID` that appears.\n"
+         "3. Go to the 'Check Status' tab, paste the `Task ID`, and click 'Check Status' periodically.\n"
+         "4. When the generation is complete, the video will appear."
+     )
+
+ # To run both Gradio and FastAPI together, we mount the Gradio Blocks app onto the FastAPI app.
+ app = gr.mount_gradio_app(fastapi_app, demo, path="/")
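Since the Dockerfile starts the Space with `python app.py`, the mounted app still needs an ASGI server to actually serve requests on port 7860. A minimal sketch of how it could be launched (assuming `uvicorn`, which ships alongside FastAPI installations, is available):

```python
# Hypothetical launch block: serve the combined FastAPI + Gradio app with uvicorn
# so that `python app.py` (as used in the Dockerfile CMD) binds port 7860.
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=7860)
```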
data/thb_easy/chemistry.json ADDED
@@ -0,0 +1,142 @@
+ [
+   {
+     "theorem": "The Aufbau Principle",
+     "description": "Electrons fill atomic orbitals in order of increasing energy levels. This means the lowest energy orbitals are filled first, followed by higher energy orbitals. This helps in predicting electronic configuration and understanding the properties of elements.",
+     "difficulty": "Easy",
+     "remark": "Fundamental principle for building the electron configurations of atoms and understanding the periodic table.",
+     "subfield": "Atomic Structure"
+   },
+   {
+     "theorem": "The Law of Conservation of Mass",
+     "description": "In a closed system, the total mass of the reactants is equal to the total mass of the products. This implies that matter is neither created nor destroyed during a chemical reaction, only transformed. This principle is fundamental for understanding stoichiometry.",
+     "difficulty": "Easy",
+     "remark": "A cornerstone of chemistry, this principle allows us to balance chemical equations and make quantitative predictions.",
+     "subfield": "Chemical Reactions and Stoichiometry"
+   },
+   {
+     "theorem": "The Octet Rule",
+     "description": "Atoms tend to gain, lose, or share electrons in order to achieve a full outer shell of eight electrons (or two in the case of hydrogen and some other exceptions). This explains the bonding behaviour of most main group elements, guiding the formations of compounds.",
+     "difficulty": "Easy",
+     "remark": "Simple and powerful rule to understand the formations of chemical bonds and predict molecules' structures.",
+     "subfield": "Chemical Bonding"
+   },
+   {
+     "theorem": "Alkali metals",
+     "description": "The alkali metals consist of the chemical elements lithium (Li), sodium (Na), potassium (K), rubidium (Rb), caesium (Cs), and francium (Fr).",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Periodic Table and Elements"
+   },
+   {
+     "theorem": "Distillation",
+     "description": "In chemistry, Distillation is among the most useful methods available to chemists for separating the parts of a liquid. A process that relies on a cycle of heating, vaporization, condensing and cooling. A liquid of a lower boiling point will vaporize before a liquid of higher boiling point.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Separation Techniques"
+   },
+   {
+     "theorem": "Crystallization",
+     "description": "In chemistry, Crystallization, or crystallisation, is the process of atoms or molecules arranging into a well-defined, rigid crystal lattice in order to minimize their energetic state. The smallest entity of a crystal lattice is called a unit cell, which can accept atoms or molecules to grow a macroscopic crystal.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Solid State Chemistry"
+   },
+   {
+     "theorem": "Titration",
+     "description": "Titration is a common laboratory method of quantitative chemical analysis to determine the concentration of an identified analyte. A reagent, termed the titrant or titrator, is prepared as a standard solution of known concentration and volume.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Analytical Chemistry"
+   },
+   {
+     "theorem": "Ionic Compound",
+     "description": "An ionic compound is a chemical compound composed of ions. Ionic compounds are formed by the electrostatic attraction between positively charged cations and negatively charged anions.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Chemical Bonding"
+   },
+   {
+     "theorem": "Noble gas",
+     "description": "The noble gases are so named because they rarely react with other elements. Helium, neon, argon, krypton, xenon and radon atoms all have a full outer valence shell of electrons, which makes them quite unreactive.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Periodic Table and Elements"
+   },
+   {
+     "theorem": "Transition Metal",
+     "description": "Transition metal, any of various chemical elements that have valence electrons—i.e., electrons that can participate in the formation of chemical bonds—in two shells instead of only one.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Periodic Table and Elements"
+   },
+   {
+     "theorem": "Balance Chemical Equation",
+     "description": "A balanced equation is an equation for a chemical reaction in which the number of atoms for each element in the reaction and the total charge are the same for both the reactants and the products.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Chemical Reactions and Stoichiometry"
+   },
+   {
+     "theorem": "Combustion analysis",
+     "description": "Combustion analysis is a method used in both organic chemistry and analytical chemistry to determine the elemental composition (more precisely empirical formula) of a pure organic compound by combusting the sample under conditions where the resulting combustion products can be quantitatively analyzed.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Analytical Chemistry"
+   },
+   {
+     "theorem": "Oxidation",
+     "description": "In chemistry, the oxidation state, or oxidation number, is the hypothetical charge of an atom if all of its bonds to other atoms were fully ionic. It describes the degree of oxidation of an atom in a chemical compound. Conceptually, the oxidation state may be positive, negative or zero.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Redox Chemistry"
+   },
+   {
+     "theorem": "First law of thermodynamics",
+     "description": "The first law of thermodynamics is a formulation of the law of conservation of energy in the context of thermodynamic processes. The law distinguishes two principal forms of energy transfer, heat and thermodynamic work, that modify a thermodynamic system containing a constant amount of matter.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Thermodynamics"
+   },
+   {
+     "theorem": "Hess's Law",
+     "description": "The enthalpy change of a reaction is independent of the path taken from reactants to products. This allows the calculation of enthalpy changes for reactions that cannot be easily measured directly by using a series of reactions with known enthalpy changes. The overall enthalpy change is the sum of enthalpy changes of individual steps.",
+     "difficulty": "Easy",
+     "remark": "Useful for calculating enthalpy changes of complex reactions. It's based on the state function of enthalpy.",
+     "subfield": "Thermodynamics"
+   },
+   {
+     "theorem": "The Ideal Gas Law",
+     "description": "The product of the pressure and volume of an ideal gas is proportional to the product of the amount of gas and its absolute temperature: PV = nRT. This law describes the behavior of ideal gases and helps predict their volume, pressure, temperature, or amount under given conditions.",
+     "difficulty": "Easy",
+     "remark": "Ideal for understanding the behaviour of gases, often used in stoichiometry related to gases. Assumes no intermolecular forces or particle volume.",
+     "subfield": "Gas Laws"
+   },
+   {
+     "theorem": "Charles's Law",
+     "description": "Charles's law (also known as the law of volumes) is an experimental gas law that describes how gases tend to expand when heated.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Gas Laws"
+   },
+   {
+     "theorem": "Gay-Lussac's Law",
+     "description": "Gay-Lussac's law usually refers to Joseph-Louis Gay-Lussac's law of combining volumes of gases, discovered in 1808 and published in 1809.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Gas Laws"
+   },
+   {
+     "theorem": "pH Scale Definition",
+     "description": "pH is a measure of the hydrogen ion concentration in a solution.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Acid-Base Chemistry"
+   },
+   {
+     "theorem": "Van't Hoff Equation",
+     "description": "The Van 't Hoff equation has been widely utilized to explore the changes in state functions in a thermodynamic system.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Chemical Kinetics"
+   }
+ ]
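Each entry in these benchmark files shares the same schema (`theorem`, `description`, `difficulty`, `remark`, `subfield`), which is what `--theorems_path` consumes in batch mode. A quick way to inspect a file (a small sketch; run from the repository root):

```python
import json

# Load one of the benchmark files committed above
with open("data/thb_easy/chemistry.json") as f:
    theorems = json.load(f)

print(len(theorems))               # number of theorem entries in this file
print(theorems[0]["theorem"])      # "The Aufbau Principle"
print(theorems[0]["subfield"])     # "Atomic Structure"
```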
data/thb_easy/comp_sci.json ADDED
@@ -0,0 +1,142 @@
+ [
+   {
+     "theorem": "The Pigeonhole Principle",
+     "description": "If you have more pigeons than pigeonholes, then at least one pigeonhole must contain more than one pigeon. More formally, if *n* items are put into *m* containers, with *n > m*, then at least one container must contain more than one item.",
+     "difficulty": "Easy",
+     "remark": "A fundamental principle in combinatorics with surprising applications in various areas of computer science, like proving existence in hashing or data compression. Simple to understand, powerful in use.",
+     "subfield": "Discrete Mathematics"
+   },
+   {
+     "theorem": "De Morgan's Laws",
+     "description": "De Morgan's Laws provide a way to simplify or transform logical statements involving AND, OR, and NOT. Specifically: 1) NOT (A AND B) is equivalent to (NOT A) OR (NOT B). 2) NOT (A OR B) is equivalent to (NOT A) AND (NOT B).",
+     "difficulty": "Easy",
+     "remark": "Crucial for boolean algebra and digital logic design. Helps with simplifying complex logic expressions and is widely used in programming.",
+     "subfield": "Boolean Algebra"
+   },
+   {
+     "theorem": "The Time Complexity of Linear Search",
+     "description": "In the worst-case scenario, searching for an element in an unsorted array using linear search requires O(n) time, where 'n' is the number of elements in the array. This is because the algorithm may need to examine every element in the array to find or conclude the non-existence of the target.",
+     "difficulty": "Easy",
+     "remark": "A foundational concept in algorithm analysis. Illustrates how the running time of an algorithm scales with the input size.",
+     "subfield": "Algorithm Analysis"
+   },
+   {
+     "theorem": "The Properties of a Binary Tree",
+     "description": "For a complete or full binary tree: 1) The maximum number of nodes at level *l* is 2^l (where the root is at level 0). 2) The total number of nodes in a complete binary tree of *h* depth is 2^(h+1) - 1.",
+     "difficulty": "Easy",
+     "remark": "Fundamental for understanding and analyzing tree data structures. Used in many algorithmic designs.",
+     "subfield": "Data Structures"
+   },
+   {
+     "theorem": "The Triangle Inequality Theorem",
+     "description": "The triangle inequality states that for any three points A, B, and C in a metric space (e.g., the Euclidean plane), the sum of the lengths of any two sides of a triangle must be greater than or equal to the length of the third side. |AB| + |BC| >= |AC|",
+     "difficulty": "Easy",
+     "remark": "Often used in graph algorithms (e.g. proving properties of shortest path) . The principle is used as basis of many distance metrics.",
+     "subfield": "Computational Geometry"
+   },
+   {
+     "theorem": "Hamming distance",
+     "description": "In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Information Theory"
+   },
+   {
+     "theorem": "Big O notation",
+     "description": "most common type of asymptotic notation in computer science used to measure worst case complexity",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algorithm Analysis"
+   },
+   {
+     "theorem": "Deadlock",
+     "description": "A deadlock is a situation where two or more processes are blocked waiting for each other to release resources, resulting in a circular wait condition.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Operating Systems"
+   },
+   {
+     "theorem": "Bubble Sort",
+     "description": "Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algorithms"
+   },
+   {
+     "theorem": "Karnaugh Map",
+     "description": "A Karnaugh map (K-map) is a graphical method for simplifying Boolean algebra expressions.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Digital Logic Design"
+   },
+   {
+     "theorem": "Hash table",
+     "description": "A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Data Structures"
+   },
+   {
+     "theorem": "Linked list",
+     "description": "data structure that does not necessarily store elements next to each other and instead works by maintaining, for each element, a link to the next element in the list",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Data Structures"
+   },
+   {
+     "theorem": "Chain Code",
+     "description": "A chain code is a lossless compression based image segmentation method for binary images based upon tracing image contours. The basic principle of chain coding, like other contour codings, is to separately encode each connected component, or blob in the image.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Image Processing"
+   },
+   {
+     "theorem": "Signal-to-noise ratio",
+     "description": "The signal-to-noise ratio (SNR) is a measure of the ratio between the power of a signal and the power of background noise.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Signal Processing"
+   },
+   {
+     "theorem": "Run-length encoding",
+     "description": "Run-length encoding (RLE) is a form of data compression that encodes consecutive data elements by a single data value and count, rather than by the original data values.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Data Compression"
+   },
+   {
+     "theorem": "Elbow method",
+     "description": "The elbow method is a graphical method for finding the optimal K value in a k-means clustering algorithm.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Machine Learning"
+   },
+   {
+     "theorem": "Huffman coding",
+     "description": "In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Data Compression"
+   },
+   {
+     "theorem": "Paging",
+     "description": "Paging is a memory management technique used in operating systems to manage virtual memory. It involves dividing the virtual address space into fixed-size blocks called pages, and storing these pages in a secondary storage device called a paging file.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Operating Systems"
+   },
+   {
+     "theorem": "OSI model",
+     "description": "The Open Systems Interconnection (OSI) model is a conceptual framework that describes how data is sent over a network.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Computer Networks"
+   },
+   {
+     "theorem": "IEEE Convertion",
+     "description": "The IEEE-754 standard describes floating-point formats, a way to represent real numbers in hardware.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Computer Architecture"
+   }
+ ]
data/thb_easy/math.json ADDED
@@ -0,0 +1,142 @@
+ [
+   {
+     "theorem": "The Pythagorean Theorem",
+     "description": "In a right-angled triangle, the square of the length of the hypotenuse (the side opposite the right angle) is equal to the sum of the squares of the lengths of the other two sides. If a and b are the lengths of the legs and c is the length of the hypotenuse, then a\u00b2 + b\u00b2 = c\u00b2.",
+     "difficulty": "Easy",
+     "remark": "Fundamental theorem in geometry; widely used in various fields.",
+     "subfield": "Geometry"
+   },
+   {
+     "theorem": "Properties of Kites",
+     "description": "A kite is a quadrilateral with two pairs of adjacent, congruent sides. In geometry, kites have several unique properties that distinguish them from other quadrilaterals. Here are some of the key properties of kites:\n\n1. Two pairs of adjacent sides are congruent: In a kite, there are two distinct pairs of adjacent sides that have equal length. This means that if one pair of sides has a length of 'a', the other pair will also have a length of 'a', and if the other pair has a length of 'b', the first pair will also have a length of 'b'.\n\n2. Diagonals are perpendicular: The diagonals of a kite intersect at a 90-degree angle, meaning they are perpendicular to each other.\n\n3. One diagonal is bisected: In a kite, one of the diagonals is bisected by the other diagonal, meaning it is divided into two equal parts. This property is true for the diagonal connecting the vertices between the congruent sides.\n\n4. One pair of opposite angles is congruent: In a kite, the angles between the congruent sides (the angles formed by the two pairs of equal sides) are congruent, meaning they have the same degree measure.\n\n5. Area: The area of a kite can be calculated using the lengths of its diagonals. If 'd1' and 'd2' are the lengths of the diagonals, the area of the kite is given by the formula: Area = (1/2) * d1 * d2.\n\n6. Circumscribed circle: A kite can have a circumscribed circle only if it is a rhombus (all sides are congruent) or a square (all sides and angles are congruent).\n\n7. Inscribed circle: A kite can have an inscribed circle only if it is a square (all sides and angles are congruent).\n\nThese properties make kites an interesting and unique type of quadrilateral in geometry.",
+     "difficulty": "Easy",
+     "remark": "Properties of kites are useful for solving geometry problems involving kites.",
+     "subfield": "Geometry"
+   },
+   {
+     "theorem": "Euler's formula",
+     "description": "Euler's formula is a fundamental equation in complex analysis that establishes a deep connection between trigonometry and complex exponentials. It is named after the Swiss mathematician Leonhard Euler. The formula is given by:\n\ne^(ix) = cos(x) + i*sin(x)\n\nwhere e is the base of the natural logarithm (approximately 2.71828), i is the imaginary unit (i^2 = -1), x is a real number, and cos(x) and sin(x) are the trigonometric functions cosine and sine, respectively.\n\nEuler's formula demonstrates that complex exponentials can be expressed in terms of trigonometric functions, and vice versa. This relationship is particularly useful in various fields of mathematics, physics, and engineering, as it simplifies calculations involving complex numbers and trigonometric functions.\n\nOne of the most famous consequences of Euler's formula is Euler's identity, which is obtained by setting x = \u03c0 in the formula:\n\ne^(i\u03c0) + 1 = 0\n\nEuler's identity is considered one of the most beautiful equations in mathematics, as it combines five fundamental constants (e, i, \u03c0, 1, and 0) in a simple and elegant relationship.",
+     "difficulty": "Easy",
+     "remark": "Euler's formula is widely used in various fields, including engineering, physics, and computer science.",
+     "subfield": "Complex Analysis"
+   },
+   {
+     "theorem": "Laws of Exponents",
+     "description": "The laws of exponents simplify the multiplication and division operations.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algebra"
+   },
+   {
+     "theorem": "One-to-one function",
+     "description": "a function for which each value of the output is associated with a unique input value",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Functions"
+   },
+   {
+     "theorem": "Inverse function",
+     "description": "For any one-to-one function f(x), the inverse is a function f^(-1)(x) such that f^(-1)(f(x))=x for all x in the domain of f; this also implies that f(f^(-1)(x))=x for all x in the domain of f^(-1)",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Functions"
+   },
+   {
+     "theorem": "Remainder theorem",
+     "description": "The remainder theorem states that when a polynomial p(x) is divided by a linear polynomial (x - a), then the remainder is equal to p(a).",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algebra"
+   },
+   {
+     "theorem": "Rational Zero Theorem",
+     "description": "The rational root theorem is also known as the rational zero theorem (or) the rational zero test (or) rational test theorem and is used to determine the rational roots of a polynomial function.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algebra"
+   },
+   {
+     "theorem": "Product-to-sum formula",
+     "description": "The product-to-sum formulas are a set of formulas from trigonometric formulas.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Geometry"
+   },
+   {
+     "theorem": "Heron's formula",
+     "description": "Heron's formula is a formula that is used to find the area of a triangle when the lengths of all three sides are known.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Geometry"
+   },
+   {
+     "theorem": "De Moivre's Theorem",
+     "description": "Formula used to find the nth power or nth roots of a complex number; states that, for a positive integer n, z^n is found by raising the modulus to the nth power and multiplying the angles by n",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Complex Analysis"
+   },
+   {
+     "theorem": "Cramer's Rule",
+     "description": "a method for solving systems of equations that have the same number of equations as variables using determinants",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Algebra"
+   },
+   {
+     "theorem": "Angle of rotation",
+     "description": "An angle of rotation is the measure of the amount that a figure is rotated about a fixed point called a point of rotation.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Geometry"
+   },
+   {
+     "theorem": "Similar Triangles Theorem",
+     "description": "Two triangles are similar if their corresponding angles are equal and their corresponding sides are proportional.",
+     "difficulty": "Easy",
+     "remark": "",
+     "subfield": "Geometry"
+   },
+   {
101
+ "theorem": "Congruent Triangles Theorem",
102
+ "description": "Two triangles are congruent if they satisfy any of these criteria: SSS (Side-Side-Side), SAS (Side-Angle-Side), ASA (Angle-Side-Angle), AAS (Angle-Angle-Side), or HL (Hypotenuse-Leg) for right triangles.",
103
+ "difficulty": "Easy",
104
+ "remark": "",
105
+ "subfield": "Geometry"
106
+ },
107
+ {
108
+ "theorem": "Geometric Sequence",
109
+ "description": "For a geometric sequence with the first term a, common ratio r, and n terms, the sum is: S_n = a * (1 - r^n) / (1 - r) for r != 1",
110
+ "difficulty": "Easy",
111
+ "remark": "",
112
+ "subfield": "Sequences and Series"
113
+ },
114
+ {
115
+ "theorem": "Arithmetic Sequence",
116
+ "description": "For an arithmetic sequence with the first term a, common difference d, and n terms, the sum is: S_n = (n/2) * (2a + (n-1)d)",
117
+ "difficulty": "Easy",
118
+ "remark": "",
119
+ "subfield": "Sequences and Series"
120
+ },
121
+ {
122
+ "theorem": "Permutation",
123
+ "description": "The term permutation refers to a mathematical calculation of the number of ways a particular set can be arranged.",
124
+ "difficulty": "Easy",
125
+ "remark": "",
126
+ "subfield": "Combinatorics"
127
+ },
128
+ {
129
+ "theorem": "Directrix",
130
+ "description": "a line perpendicular to the axis of symmetry of a parabola; a line such that the ratio of the distance between the points on the conic and the focus to the distance to the directrix is constant.",
131
+ "difficulty": "Easy",
132
+ "remark": "",
133
+ "subfield": "Conic Sections"
134
+ },
135
+ {
136
+ "theorem": "Eccentricity",
137
+ "description": "the eccentricity of a conic section is a non-negative real number that uniquely characterizes its shape.",
138
+ "difficulty": "Easy",
139
+ "remark": "",
140
+ "subfield": "Conic Sections"
141
+ }
142
+ ]
data/thb_easy/physics.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "Ohm's Law",
4
+ "description": "The voltage (V) across a conductor is directly proportional to the current (I) flowing through it, given the resistance (R) remains constant. The formula is V = IR. This law holds for many materials, particularly metals, and components like resistors.",
5
+ "difficulty": "Easy",
6
+ "remark": "A cornerstone of circuit analysis. While it is an approximation, it's incredibly useful in solving basic circuit problems. The 'resistance' is a macroscopic property representing the ease of electron movement.",
7
+ "subfield": "Electricity and Circuits"
8
+ },
9
+ {
10
+ "theorem": "Newton's First Law of Motion",
11
+ "description": "a body at rest remains at rest, or, if in motion, remains in motion at a constant velocity unless acted on by a net external force; also known as the law of inertia",
12
+ "difficulty": "Easy",
13
+ "remark": "This law is fundamental to understanding the relationship between force and motion. It establishes that forces cause acceleration which changes velocity. Applicable for solving motion problems where force and mass are known.",
14
+ "subfield": "Classical Mechanics"
15
+ },
16
+ {
17
+ "theorem": "Newton's Second Law of Motion",
18
+ "description": "The net force (F_net) acting on an object is equal to the mass (m) of the object multiplied by its acceleration (a). F_net = ma. This law is fundamental to understanding the relationship between force and motion.",
19
+ "difficulty": "Easy",
20
+ "remark": "This is one of the most important laws in classical mechanics. It establishes that forces cause acceleration which changes velocity. Applicable for solving motion problems where force and mass are known.",
21
+ "subfield": "Classical Mechanics"
22
+ },
23
+ {
24
+ "theorem": "Hooke's law",
25
+ "description": "In physics, Hooke's law is an empirical law which states that the force needed to extend or compress a spring by some distance scales linearly with respect to that distance.",
26
+ "difficulty": "Easy",
27
+ "remark": "This law is fundamental to understanding the relationship between force and motion. It establishes that forces cause acceleration which changes velocity. Applicable for solving motion problems where force and mass are known.",
28
+ "subfield": "Classical Mechanics"
29
+ },
30
+ {
31
+ "theorem": "Gravitational Force",
32
+ "description": "In physics, gravity is a fundamental interaction primarily observed as mutual attraction between all things that have mass.",
33
+ "difficulty": "Easy",
34
+ "remark": "",
35
+ "subfield": "Classical Mechanics"
36
+ },
37
+ {
38
+ "theorem": "Centrifugal force",
39
+ "description": "Centrifugal force is a fictitious force in Newtonian mechanics that appears to act on all objects when viewed in a rotating frame of reference. It appears to be directed radially away from the axis of rotation of the frame.",
40
+ "difficulty": "Easy",
41
+ "remark": "",
42
+ "subfield": "Classical Mechanics"
43
+ },
44
+ {
45
+ "theorem": "Kinetic energy",
46
+ "description": "In physics, the kinetic energy of an object is the form of energy that it possesses due to its motion. In classical mechanics, the kinetic energy of a non-rotating object of mass m traveling at a speed v is.",
47
+ "difficulty": "Easy",
48
+ "remark": "",
49
+ "subfield": "Classical Mechanics"
50
+ },
51
+ {
52
+ "theorem": "Torque",
53
+ "description": "Torque is a measure of the force that can cause an object to rotate about an axis. Just as force is what causes an object to accelerate in linear kinematics, torque is what causes an object to acquire angular acceleration. Torque is a vector quantity.",
54
+ "difficulty": "Easy",
55
+ "remark": "",
56
+ "subfield": "Classical Mechanics"
57
+ },
58
+ {
59
+ "theorem": "Right-hand rule",
60
+ "description": "The right hand rule is a hand mnemonic used in physics to identify the direction of axes or parameters that point in three dimensions.",
61
+ "difficulty": "Easy",
62
+ "remark": "",
63
+ "subfield": "Electromagnetism"
64
+ },
65
+ {
66
+ "theorem": "Snell's Law",
67
+ "description": "Relates the angles of incidence and refraction of light when passing between two different media. It states that n₁sin(θ₁) = n₂sin(θ₂), where n₁ and n₂ are the refractive indices of the two media, and θ₁ and θ₂ are the angles of incidence and refraction, respectively.",
68
+ "difficulty": "Easy",
69
+ "remark": "This theorem is fundamental to understanding how light bends when it travels through different materials, essential for studying optics (lenses, prisms). Its application involves using trigonometry.",
70
+ "subfield": "Optics"
71
+ },
72
+ {
73
+ "theorem": "The Ideal Gas Law",
74
+ "description": "Relates the pressure (P), volume (V), temperature (T), and the number of moles (n) of an ideal gas: PV = nRT, where R is the ideal gas constant. It serves as a good approximation for the behavior of real gases under certain conditions.",
75
+ "difficulty": "Easy",
76
+ "remark": "Connects macroscopic gas properties and allows calculations involving gas behavior under varied conditions. Applicable for thermodynamics problems and understanding gas pressure, volume and temperature relationship.",
77
+ "subfield": "Thermodynamics"
78
+ },
79
+ {
80
+ "theorem": "Pascal's Principle",
81
+ "description": "Pascal's law is a principle in fluid mechanics given by Blaise Pascal that states that a pressure change at any point in a confined incompressible fluid is transmitted throughout the fluid such that the same change occurs everywhere.",
82
+ "difficulty": "Easy",
83
+ "remark": "",
84
+ "subfield": "Fluid Mechanics"
85
+ },
86
+ {
87
+ "theorem": "Avogadro's number",
88
+ "description": "The concept of the mole can be used to convert between mass and number of particles.",
89
+ "difficulty": "Easy",
90
+ "remark": "",
91
+ "subfield": "Thermodynamics"
92
+ },
93
+ {
94
+ "theorem": "Dalton's law of partial pressures",
95
+ "description": "Dalton's law of partial pressures states that the total pressure of a mixture of gases is the sum of the partial pressures of its components.",
96
+ "difficulty": "Easy",
97
+ "remark": "",
98
+ "subfield": "Thermodynamics"
99
+ },
100
+ {
101
+ "theorem": "PV diagram",
102
+ "description": "a graph of pressure vs. volume",
103
+ "difficulty": "Easy",
104
+ "remark": "",
105
+ "subfield": "Thermodynamics"
106
+ },
107
+ {
108
+ "theorem": "Color wavelengths",
109
+ "description": "The wavelength of a color is the range of nanometers (nm) at which it appears in the visible light spectrum.",
110
+ "difficulty": "Easy",
111
+ "remark": "",
112
+ "subfield": "Optics"
113
+ },
114
+ {
115
+ "theorem": "Ultrasound",
116
+ "description": "Ultrasound refers to sound waves with frequencies higher than the audible range for humans.",
117
+ "difficulty": "Easy",
118
+ "remark": "",
119
+ "subfield": "Waves and Sound"
120
+ },
121
+ {
122
+ "theorem": "Coulomb's law",
123
+ "description": "Coulomb's inverse-square law, or simply Coulomb's law, is an experimental law of physics that calculates the amount of force between two electrically charged particles at rest. This electric force is conventionally called the electrostatic force or Coulomb force.",
124
+ "difficulty": "Easy",
125
+ "remark": "",
126
+ "subfield": "Electromagnetism"
127
+ },
128
+ {
129
+ "theorem": "Kirchhoff's voltage law",
130
+ "description": "The sum of all the voltages around a loop is equal to zero.",
131
+ "difficulty": "Easy",
132
+ "remark": "",
133
+ "subfield": "Electricity and Circuits"
134
+ },
135
+ {
136
+ "theorem": "Thévenin's theorem",
137
+ "description": "Thévenin's theorem states that any linear circuit containing several voltage sources and resistors can be simplified to a Thévenin-equivalent circuit with a single voltage source and resistance connected in series with a load.",
138
+ "difficulty": "Easy",
139
+ "remark": "",
140
+ "subfield": "Electricity and Circuits"
141
+ }
142
+ ]
data/thb_hard/chemistry.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "The Henderson-Hasselbalch Equation",
4
+ "description": "The pH of a buffer solution is equal to the pKa of the weak acid plus the logarithm of the ratio of the concentration of the conjugate base to the concentration of the weak acid: pH = pKa + log([A-]/[HA]). It allows for the calculation of buffer solutions pH and predicting how pH would change with addition of acid or base",
5
+ "difficulty": "Hard",
6
+ "remark": "Crucial in understanding buffer solutions and titrations. Used in biochemistry extensively.",
7
+ "subfield": "Acid-Base Chemistry"
8
+ },
9
+ {
10
+ "theorem": "Bragg's law",
11
+ "description": "Bragg's law in chemistry describes how X-rays reflect off of a crystal surface.",
12
+ "difficulty": "Hard",
13
+ "remark": "",
14
+ "subfield": "Crystallography"
15
+ },
16
+ {
17
+ "theorem": "Debye-Scherrer Equation",
18
+ "description": "The Debye-Scherrer equation is used in chemistry to calculate the size of crystalline nanoparticles. It is based on X-ray diffraction (XRD) measurements.",
19
+ "difficulty": "Hard",
20
+ "remark": "",
21
+ "subfield": "Crystallography"
22
+ },
23
+ {
24
+ "theorem": "Hückel's Rule",
25
+ "description": "In organic chemistry, Hückel's rule predicts that a planar ring molecule will have aromatic properties if it has 4n + 2 π-electrons, where n is a non-negative integer.",
26
+ "difficulty": "Hard",
27
+ "remark": "",
28
+ "subfield": "Organic Chemistry"
29
+ },
30
+ {
31
+ "theorem": "Hard Acid Soft Base Theory",
32
+ "description": "Hard Acid Soft Base Theory (HSAB): This theory works on the principle that soft acid reacts with the soft base while hard acid reacts with the hard base",
33
+ "difficulty": "Hard",
34
+ "remark": "",
35
+ "subfield": "Acid-Base Chemistry"
36
+ },
37
+ {
38
+ "theorem": "Pauli Exclusion Principle",
39
+ "description": "Pauli's Exclusion Principle states that no two electrons in the same atom can have identical values for all four of their quantum numbers.",
40
+ "difficulty": "Hard",
41
+ "remark": "",
42
+ "subfield": "Quantum Chemistry"
43
+ },
44
+ {
45
+ "theorem": "Crystal Field Theory",
46
+ "description": "Crystal field theory (CFT) describes the breaking of orbital degeneracy in transition metal complexes due to the presence of ligands.",
47
+ "difficulty": "Hard",
48
+ "remark": "",
49
+ "subfield": "Inorganic Chemistry"
50
+ },
51
+ {
52
+ "theorem": "Hohenberg-Kohn theorem",
53
+ "description": "The first Hohenberg–Kohn theorem states that 'the ground state of any interacting many particle system with a given fixed inter-particle interaction is a unique functional of the electron density n(r).",
54
+ "difficulty": "Hard",
55
+ "remark": "",
56
+ "subfield": "Quantum Chemistry"
57
+ },
58
+ {
59
+ "theorem": "Frost–Ebsworth diagram",
60
+ "description": "A Frost diagram or Frost–Ebsworth diagram is a type of graph used by inorganic chemists in electrochemistry to illustrate the relative stability of a number of different oxidation states of a particular substance. The graph illustrates the free energy vs oxidation state of a chemical species.",
61
+ "difficulty": "Hard",
62
+ "remark": "",
63
+ "subfield": "Electrochemistry"
64
+ },
65
+ {
66
+ "theorem": "Coulson-Fischer Theorem",
67
+ "description": "In theoretical chemistry and molecular physics, Coulson–Fischer theory provides a quantum mechanical description of the electronic structure of molecules.",
68
+ "difficulty": "Hard",
69
+ "remark": "",
70
+ "subfield": "Quantum Chemistry"
71
+ },
72
+ {
73
+ "theorem": "Frank-Condon Principle",
74
+ "description": "The Franck-Condon Principle describes the intensities of vibronic transitions, or the absorption or emission of a photon.",
75
+ "difficulty": "Hard",
76
+ "remark": "",
77
+ "subfield": "Spectroscopy"
78
+ },
79
+ {
80
+ "theorem": "Nernst Equation",
81
+ "description": "The Nernst Equation enables the determination of cell potential under non-standard conditions.",
82
+ "difficulty": "Hard",
83
+ "remark": "",
84
+ "subfield": "Electrochemistry"
85
+ },
86
+ {
87
+ "theorem": "Slater's Rules",
88
+ "description": "The general principle behind Slater's Rule is that the actual charge felt by an electron is equal to what you'd expect the charge to be from a certain number of protons, but minus a certain amount of charge from other electrons.",
89
+ "difficulty": "Hard",
90
+ "remark": "",
91
+ "subfield": "Quantum Chemistry"
92
+ },
93
+ {
94
+ "theorem": "Langmuir Adsorption Isotherm",
95
+ "description": "A continuous monolayer of adsorbate molecules surrounding a homogeneous solid surface is the conceptual basis for this adsorption model.",
96
+ "difficulty": "Hard",
97
+ "remark": "",
98
+ "subfield": "Physical Chemistry"
99
+ },
100
+ {
101
+ "theorem": "Marcus Theory",
102
+ "description": "Marcus theory is a theory originally developed by Rudolph A. Marcus, starting in 1956, to explain the rates of electron transfer reactions.",
103
+ "difficulty": "Hard",
104
+ "remark": "",
105
+ "subfield": "Physical Chemistry"
106
+ },
107
+ {
108
+ "theorem": "Eyring Equation",
109
+ "description": "The Eyring equation is an equation used in chemical kinetics to describe changes in the rate of a chemical reaction against temperature.",
110
+ "difficulty": "Hard",
111
+ "remark": "",
112
+ "subfield": "Chemical Kinetics"
113
+ },
114
+ {
115
+ "theorem": "Woodward-Hoffmann Rules",
116
+ "description": "Robert Burns Woodward and Roald Hoffmann devised these set of rules to explain the stereochemistry of pericyclic reactions based on the orbital symmetry.",
117
+ "difficulty": "Hard",
118
+ "remark": "",
119
+ "subfield": "Organic Chemistry"
120
+ },
121
+ {
122
+ "theorem": "Born-Haber Cycle",
123
+ "description": "A Born–Haber cycle applies Hess's law to calculate the lattice enthalpy by comparing the standard enthalpy change of formation of the ionic compound (from the elements) to the enthalpy required to make gaseous ions from the elements. This lattice calculation is complex.",
124
+ "difficulty": "Hard",
125
+ "remark": "",
126
+ "subfield": "Thermodynamics"
127
+ },
128
+ {
129
+ "theorem": "Molecular Orbital Theory",
130
+ "description": "In chemistry, molecular orbital theory is a method for describing the electronic structure of molecules using quantum mechanics.",
131
+ "difficulty": "Hard",
132
+ "remark": "",
133
+ "subfield": "Quantum Chemistry"
134
+ },
135
+ {
136
+ "theorem": "Hammond Postulate",
137
+ "description": "The postulate, which George Hammond first proposed in 1955, states that if two states, such as a transition state and an unstable intermediate, occur consecutively during a reaction process and have nearly the same energy content, their interconversion will result in only a minor reorganisation of molecular structures.",
138
+ "difficulty": "Hard",
139
+ "remark": "",
140
+ "subfield": "Physical Chemistry"
141
+ }
142
+ ]
data/thb_hard/comp_sci.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "Evidence lower bound",
4
+ "description": "The evidence lower bound (ELBO) is a lower bound on the log-evidence of a model, which is a measure of how well the model fits the data.",
5
+ "difficulty": "Hard",
6
+ "remark": "",
7
+ "subfield": "Machine Learning"
8
+ },
9
+ {
10
+ "theorem": "Viterbi Algorithm",
11
+ "description": "The Viterbi Algorithm is a dynamic programming algorithm used for finding the most likely sequence of hidden states, known as the Viterbi path, in a Hidden Markov Model (HMM). It is named after its inventor, Andrew Viterbi, and is widely used in various applications such as speech recognition, natural language processing, and bioinformatics.\n\nA Hidden Markov Model (HMM) is a statistical model that represents a stochastic process involving a sequence of observable events and hidden states. In an HMM, the observable events are generated by the hidden states, which follow a Markov chain. The Markov chain is characterized by the transition probabilities between hidden states, and the emission probabilities of observable events given the hidden states.\n\nThe Viterbi Algorithm works by finding the most probable path of hidden states that generates the observed sequence of events. It does this by iteratively computing the maximum probability of reaching each state at each time step, considering all possible paths that lead to that state. The algorithm uses dynamic programming to efficiently compute these probabilities and store them in a trellis structure.\n\nHere's a high-level description of the Viterbi Algorithm:\n\n1. Initialization: Set the initial probabilities for each hidden state, considering the initial state probabilities and the emission probabilities for the first observed event.\n\n2. Recursion: For each subsequent observed event, compute the maximum probability of reaching each hidden state, considering all possible previous states and their transition probabilities. Update the emission probabilities for the current observed event.\n\n3. Termination: Identify the hidden state with the highest probability at the last time step.\n\n4. Traceback: Starting from the identified state in the termination step, backtrack through the trellis to find the most probable path of hidden states that generated the observed sequence.\n\nThe Viterbi Algorithm is an efficient and widely used method for decoding the hidden states in a Hidden Markov Model, providing valuable insights into the underlying structure of the stochastic process.",
12
+ "difficulty": "Hard",
13
+ "remark": "",
14
+ "subfield": "Dynamic Programming"
15
+ },
16
+ {
17
+ "theorem": "Fano's inequality",
18
+ "description": "In information theory, Fano's inequality relates the average information lost in a noisy channel to the probability of the categorization error.",
19
+ "difficulty": "Hard",
20
+ "remark": "",
21
+ "subfield": "Information Theory"
22
+ },
23
+ {
24
+ "theorem": "Message Passing algorithm",
25
+ "description": "Message passing algorithm is an iterative decoding algorithm factorizes the global function of many variables into product of simpler local functions, whose arguments are the subset of variables.",
26
+ "difficulty": "Hard",
27
+ "remark": "",
28
+ "subfield": "Machine Learning"
29
+ },
30
+ {
31
+ "theorem": "Maximal Planar Graph",
32
+ "description": "A maximal planar graph is a graph which can be embedded in the plane such that every face of the graph is a triangle.",
33
+ "difficulty": "Hard",
34
+ "remark": "",
35
+ "subfield": "Graph Theory"
36
+ },
37
+ {
38
+ "theorem": "Cayley's formula",
39
+ "description": "This formula tells how many trees can be constructed with N vertices.",
40
+ "difficulty": "Hard",
41
+ "remark": "",
42
+ "subfield": "Graph Theory"
43
+ },
44
+ {
45
+ "theorem": "Floyd's Cycle Finding Algorithm",
46
+ "description": "Also known as the tortoise and the hare algorithm, it is a pointer algorithm that uses two pointers which move at different speeds to find a cycle in a sequence.",
47
+ "difficulty": "Hard",
48
+ "remark": "",
49
+ "subfield": "Algorithms"
50
+ },
51
+ {
52
+ "theorem": "Sigma-Delta Modulation",
53
+ "description": "A sigma delta modulator converts this shunt voltage across the resistor, into high-frequency one-bit digital bitstream using oversampling and noise shaping.",
54
+ "difficulty": "Hard",
55
+ "remark": "",
56
+ "subfield": "Digital Signal Processing"
57
+ },
58
+ {
59
+ "theorem": "Kruskal's algorithm",
60
+ "description": "greedy algorithm that sorts the list of edges in the graph by weight.",
61
+ "difficulty": "Hard",
62
+ "remark": "A fundamental algorithm in graph theory. It's used in network design, spanning tree construction, and various optimization problems. Requires understanding of graph theory and greedy algorithms.",
63
+ "subfield": "Graph Theory"
64
+ },
65
+ {
66
+ "theorem": "Prim's algorithm",
67
+ "description": "greedy algorithm that maintains a priority queue of vertices in the graph ordered by connecting edge weight",
68
+ "difficulty": "Hard",
69
+ "remark": "",
70
+ "subfield": "Graph Theory"
71
+ },
72
+ {
73
+ "theorem": "Region growing by pixel aggregation",
74
+ "description": "Region growing by pixel aggregation is a technique used in image processing to segment an image into regions based on the similarity of pixel values.",
75
+ "difficulty": "Hard",
76
+ "remark": "",
77
+ "subfield": "Image Processing"
78
+ },
79
+ {
80
+ "theorem": "Arithmetic coding",
81
+ "description": "Arithmetic coding is a lossless data compression technique that assigns a unique code to each symbol in a message based on its probability of occurrence.",
82
+ "difficulty": "Hard",
83
+ "remark": "",
84
+ "subfield": "Data Compression"
85
+ },
86
+ {
87
+ "theorem": "Expectation–maximization (EM) algorithm",
88
+ "description": "an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.",
89
+ "difficulty": "Hard",
90
+ "remark": "",
91
+ "subfield": "Machine Learning"
92
+ },
93
+ {
94
+ "theorem": "Differential entropy",
95
+ "description": "Differential entropy, also known as continuous entropy, is a concept in information theory that extends the idea of entropy from discrete random variables to continuous random variables. Entropy, in general, is a measure of the uncertainty or randomness associated with a random variable. In the context of information theory, it quantifies the average amount of information required to describe the outcome of a random variable.\n\nFor discrete random variables, entropy is well-defined using the Shannon entropy formula, which sums the product of the probability of each outcome and the logarithm of its reciprocal probability. However, for continuous random variables, the probability of any specific outcome is zero, making the Shannon entropy formula inapplicable.\n\nDifferential entropy addresses this issue by considering the probability density function (pdf) of a continuous random variable instead of the probabilities of individual outcomes. The differential entropy H(X) of a continuous random variable X with a probability density function f(x) is defined as:\n\nH(X) = - \u222b f(x) * log(f(x)) dx\n\nwhere the integral is taken over the entire range of the random variable X, and log is the logarithm base 2 (or any other base, depending on the desired unit of measurement for entropy).\n\nDifferential entropy can be interpreted as the average amount of information required to describe the outcome of a continuous random variable with a given probability density function. However, unlike the entropy of discrete random variables, differential entropy can be negative, which occurs when the probability density function is highly concentrated around certain values.\n\nIt is important to note that differential entropy is not a direct extension of discrete entropy, and some properties of discrete entropy do not hold for differential entropy. For example, differential entropy is not invariant under changes of variables or coordinate transformations, whereas discrete entropy is invariant under permutations of the outcomes.",
96
+ "difficulty": "Hard",
97
+ "remark": "",
98
+ "subfield": "Information Theory"
99
+ },
100
+ {
101
+ "theorem": "Kullback–Leibler divergence",
102
+ "description": "a type of statistical distance: a measure of how much a model probability distribution Q is different from a true probability distribution P.",
103
+ "difficulty": "Hard",
104
+ "remark": "",
105
+ "subfield": "Information Theory"
106
+ },
107
+ {
108
+ "theorem": "Principal component analysis",
109
+ "description": "Principal component analysis (PCA) is a statistical method that reduces the dimensions of a dataset to a smaller set of components.",
110
+ "difficulty": "Hard",
111
+ "remark": "",
112
+ "subfield": "Machine Learning"
113
+ },
114
+ {
115
+ "theorem": "Self-attention",
116
+ "description": "Self-attention is a mechanism in neural networks that allows the model to focus on different parts of the input sequence when making predictions.",
117
+ "difficulty": "Hard",
118
+ "remark": "",
119
+ "subfield": "Machine Learning"
120
+ },
121
+ {
122
+ "theorem": "Adversarial training",
123
+ "description": "Adversarial Training is a machine learning technique that is primarily used for improving the robustness of models. It's a process where models are trained with malicious inputs (adversarial examples) alongside the genuine data.",
124
+ "difficulty": "Hard",
125
+ "remark": "",
126
+ "subfield": "Machine Learning"
127
+ },
128
+ {
129
+ "theorem": "Forward-Backward Algorithm",
130
+ "description": "The Forward-Backward Algorithm is a dynamic programming algorithm used in Hidden Markov Models (HMMs) to compute the posterior probabilities of hidden states given a sequence of observations. It is a stochastic process that combines both the forward and backward algorithms to efficiently compute these probabilities.\n\nThe algorithm consists of two main steps:\n\n1. Forward Algorithm:\nThe forward algorithm computes the probability of observing a particular sequence of observations up to a certain time step, given the hidden state at that time step. It calculates the forward probabilities, which are the joint probabilities of the observed sequence and the hidden state at each time step. The forward algorithm uses a recursive approach, where the forward probability at each time step is calculated based on the forward probabilities of the previous time step.\n\n2. Backward Algorithm:\nThe backward algorithm computes the probability of observing the remaining sequence of observations from a certain time step onwards, given the hidden state at that time step. It calculates the backward probabilities, which are the conditional probabilities of the future observations given the hidden state at each time step. Similar to the forward algorithm, the backward algorithm also uses a recursive approach, where the backward probability at each time step is calculated based on the backward probabilities of the next time step.\n\nAfter computing the forward and backward probabilities, the Forward-Backward Algorithm combines these probabilities to calculate the posterior probabilities of the hidden states at each time step. The posterior probability of a hidden state at a particular time step is the probability of that state given the entire sequence of observations. This is computed by multiplying the forward probability and the backward probability for that state at that time step and then normalizing the result.\n\nThe Forward-Backward Algorithm is widely used in various applications, such as speech recognition, natural language processing, and bioinformatics, where the goal is to infer the most likely sequence of hidden states given a sequence of observations.",
131
+ "difficulty": "Hard",
132
+ "remark": "",
133
+ "subfield": "Dynamic Programming"
134
+ },
135
+ {
136
+ "theorem": "Cook-Levin Theorem",
137
+ "description": "In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete.",
138
+ "difficulty": "Hard",
139
+ "remark": "",
140
+ "subfield": "Computational Complexity"
141
+ }
142
+ ]
data/thb_hard/math.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "Taylor's theorem",
4
+ "description": "Taylor's theorem gives an approximation of a k-times differentiable function around a given point by a polynomial of degree k, called the k-th order Taylor polynomial.",
5
+ "difficulty": "Hard",
6
+ "remark": "",
7
+ "subfield": "Calculus"
8
+ },
9
+ {
10
+ "theorem": "Simpson's rule",
11
+ "description": "In numerical integration, Simpson's rules are several approximations for definite integrals, named after Thomas Simpson.",
12
+ "difficulty": "Hard",
13
+ "remark": "",
14
+ "subfield": "Numerical Analysis"
15
+ },
16
+ {
17
+ "theorem": "Velocity vector",
18
+ "description": "Velocity is the speed in combination with the direction of motion of an object.",
19
+ "difficulty": "Hard",
20
+ "remark": "",
21
+ "subfield": "Vector Calculus"
22
+ },
23
+ {
24
+ "theorem": "Double Riemann sum",
25
+ "description": "A double Riemann sum is a mathematical method used to approximate the value of a double integral over a two-dimensional region.",
26
+ "difficulty": "Hard",
27
+ "remark": "",
28
+ "subfield": "Multivariable Calculus"
29
+ },
30
+ {
31
+ "theorem": "Fubini's theorem",
32
+ "description": "Fubini's Theorem is a fundamental result in calculus that allows the evaluation of a double integral as an iterated integral, provided certain conditions are met. It simplifies the computation of double integrals over a rectangular or general region by breaking them into two single integrals.",
33
+ "difficulty": "Hard",
34
+ "remark": "",
35
+ "subfield": "Multivariable Calculus"
36
+ },
37
+ {
38
+ "theorem": "Jacobian matrix and determinant",
39
+ "description": "In vector calculus, the Jacobian matrix of a vector-valued function of several variables is the matrix of all its first-order partial derivatives.",
40
+ "difficulty": "Hard",
41
+ "remark": "",
42
+ "subfield": "Vector Calculus"
43
+ },
44
+ {
45
+ "theorem": "Green's theorem",
46
+ "description": "Green's theorem is used to integrate the derivatives in a particular plane.",
47
+ "difficulty": "Hard",
48
+ "remark": "",
49
+ "subfield": "Vector Calculus"
50
+ },
51
+ {
52
+ "theorem": "Stokes' theorem",
53
+ "description": "relates the flux integral over a surface S to a line integral around the boundary C of the surface S",
54
+ "difficulty": "Hard",
55
+ "remark": "",
56
+ "subfield": "Vector Calculus"
57
+ },
58
+ {
59
+ "theorem": "Burnside's Lemma",
60
+ "description": "Burnside's Lemma, also known as the Cauchy-Frobenius Lemma or the Orbit-Counting Theorem, is a fundamental result in combinatorics that deals with counting the number of distinct elements in a set under the action of a group. It is particularly useful in counting problems involving symmetries and permutations.\n\nThe lemma is named after the British mathematician William Burnside, who contributed significantly to the development of group theory.\n\nStatement of Burnside's Lemma:\n\nLet G be a finite group that acts on a finite set X. Then the number of distinct orbits of X under the action of G is given by:\n\n(1/|G|) * \u03a3 |Fix(g)|\n\nwhere |G| is the order of the group (i.e., the number of elements in G), the sum is taken over all elements g in G, and |Fix(g)| is the number of elements in X that are fixed by the action of g (i.e., the number of elements x in X such that g(x) = x).\n\nIn simpler terms, Burnside's Lemma states that the number of distinct orbits (or equivalence classes) in a set under the action of a group can be found by averaging the number of fixed points of each group element.\n\nBurnside's Lemma is often used in combinatorial problems where we need to count the number of distinct configurations of an object, taking into account its symmetries. By applying the lemma, we can avoid overcounting configurations that are equivalent under a given symmetry operation.",
61
+ "difficulty": "Hard",
62
+ "remark": "",
63
+ "subfield": "Group Theory"
64
+ },
65
+ {
66
+ "theorem": "Lah Number",
67
+ "description": "In mathematics, the (signed and unsigned) Lah numbers are coefficients expressing rising factorials in terms of falling factorials and vice versa.",
68
+ "difficulty": "Hard",
69
+ "remark": "",
70
+ "subfield": "Combinatorics"
71
+ },
72
+ {
73
+ "theorem": "Ramsey's theorem",
74
+ "description": "Ramsey's theorem essentially states that if a structure (such as a graph or a set of numbers) is large enough, then some kind of order or regularity will always emerge, no matter how it is arranged or colored.",
75
+ "difficulty": "Hard",
76
+ "remark": "",
77
+ "subfield": "Combinatorics"
78
+ },
79
+ {
80
+ "theorem": "Schwarz Lemma theorem",
81
+ "description": "Schwarz Lemma is a fundamental result in complex analysis that provides a bound on the behavior of holomorphic functions (i.e., complex-differentiable functions) in the unit disk. It is named after the German mathematician Hermann Schwarz.\n\nStatement of Schwarz Lemma:\n\nLet f be a holomorphic function on the open unit disk D = {z \u2208 \u2102 : |z| < 1} such that f(0) = 0 and |f(z)| \u2264 1 for all z \u2208 D. Then, for all z \u2208 D, the following inequalities hold:\n\n1. |f(z)| \u2264 |z|\n2. |f'(0)| \u2264 1\n\nMoreover, if equality holds for some z \u2260 0 (i.e., |f(z)| = |z|) or |f'(0)| = 1, then f is a rotation, i.e., f(z) = e^(i\u03b8)z for some real \u03b8.\n\nThe Schwarz Lemma has several important consequences and generalizations in complex analysis, such as the Riemann Mapping Theorem and the Pick's Lemma. It is a powerful tool for understanding the behavior of holomorphic functions in the unit disk and provides a way to compare the size of their derivatives at the origin.",
82
+ "difficulty": "Hard",
83
+ "remark": "",
84
+ "subfield": "Complex Analysis"
85
+ },
86
+ {
87
+ "theorem": "Cauchy Riemann Theorem",
88
+ "description": "The Cauchy-Riemann Theorem is a fundamental result in complex analysis, a branch of mathematics that studies functions of complex variables. It provides necessary and sufficient conditions for a complex function to be holomorphic (complex differentiable) in a given domain.",
89
+ "difficulty": "Hard",
90
+ "remark": "",
91
+ "subfield": "Complex Analysis"
92
+ },
93
+ {
94
+ "theorem": "Morera's Theorem",
95
+ "description": "Morera's theorem, named after Giacinto Morera, gives an important criterion for proving that a function is holomorphic.",
96
+ "difficulty": "Hard",
97
+ "remark": "",
98
+ "subfield": "Complex Analysis"
99
+ },
100
+ {
101
+ "theorem": "Catalan-Mingantu Number",
102
+ "description": "The Catalan numbers are a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. ",
103
+ "difficulty": "Hard",
104
+ "remark": "",
105
+ "subfield": "Combinatorics"
106
+ },
107
+ {
108
+ "theorem": "Liouville's theorem",
109
+ "description": "Liouville's theorem states that: The density of states in an ensemble of many identical states with different initial conditions is constant along every trajectory in phase space. It states that if one constructs an ensemble of paths, the probability density along the trajectory remains constant.",
110
+ "difficulty": "Hard",
111
+ "remark": "",
112
+ "subfield": "Complex Analysis"
113
+ },
114
+ {
115
+ "theorem": "Derangement Formula",
116
+ "description": "In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position.",
117
+ "difficulty": "Hard",
118
+ "remark": "",
119
+ "subfield": "Combinatorics"
120
+ },
121
+ {
122
+ "theorem": "Delian problem",
123
+ "description": "Doubling the cube, also known as the Delian problem, is an ancient geometric problem. Given the edge of a cube, the problem requires the construction of the edge of a second cube whose volume is double that of the first.",
124
+ "difficulty": "Hard",
125
+ "remark": "",
126
+ "subfield": "Geometry"
127
+ },
128
+ {
129
+ "theorem": "Polya's Enumeration Theorem",
130
+ "description": "Pólya's Enumeration Theorem, also known as Pólya's Counting Theorem, is a powerful result in combinatorics used to count distinct arrangements or configurations of objects that are invariant under a group of symmetries.",
131
+ "difficulty": "Hard",
132
+ "remark": "",
133
+ "subfield": "Combinatorics"
134
+ },
135
+ {
136
+ "theorem": "Cauchy's theorem",
137
+ "description": "Cauchy's Theorem is a fundamental result in group theory, a branch of abstract algebra. It provides a condition under which a finite group contains an element of a specific order. It is named after the French mathematician Augustin-Louis Cauchy.",
138
+ "difficulty": "Hard",
139
+ "remark": "",
140
+ "subfield": "Group Theory"
141
+ }
142
+ ]
data/thb_hard/physics.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "Boltzmann machine",
4
+ "description": "It is a statistical physics technique applied in the context of cognitive science. It is also classified as a Markov random field.",
5
+ "difficulty": "Hard",
6
+ "remark": "",
7
+ "subfield": "Statistical Physics"
8
+ },
9
+ {
10
+ "theorem": "Geometric Brownian Motion",
11
+ "description": "A geometric Brownian motion (GBM) (also known as exponential Brownian motion) is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion (also called a Wiener process) with drift.",
12
+ "difficulty": "Hard",
13
+ "remark": "",
14
+ "subfield": "Statistical Physics"
15
+ },
16
+ {
17
+ "theorem": "Fermat's Principle",
18
+ "description": "Fermat's principle states that light travels between two points along the path that requires the least time, as compared to other nearby paths.",
19
+ "difficulty": "Hard",
20
+ "remark": "",
21
+ "subfield": "Optics"
22
+ },
23
+ {
24
+ "theorem": "Huygens's Principle",
25
+ "description": "The Huygens–Fresnel principle states that every point on a wavefront is itself the source of spherical wavelets, and the secondary wavelets emanating from different points mutually interfere. The sum of these spherical wavelets forms a new wavefront.",
26
+ "difficulty": "Hard",
27
+ "remark": "",
28
+ "subfield": "Optics"
29
+ },
30
+ {
31
+ "theorem": "Virial Theorem",
32
+ "description": "In mechanics, the virial theorem provides a general equation that relates the average over time of the total kinetic energy of a stable system of discrete particles, bound by a conservative force, with that of the total potential energy of the system.",
33
+ "difficulty": "Hard",
34
+ "remark": "",
35
+ "subfield": "Classical Mechanics"
36
+ },
37
+ {
38
+ "theorem": "Poynting Theorem",
39
+ "description": "It states that in a given volume, the stored energy changes at a rate given by the work done on the charges within the volume, minus the rate at which energy leaves the volume.",
40
+ "difficulty": "Hard",
41
+ "remark": "",
42
+ "subfield": "Electromagnetism"
43
+ },
44
+ {
45
+ "theorem": "Fresnel transmission equations",
46
+ "description": "Fresnel's equations describe the reflection and transmission of electromagnetic waves at an interface.",
47
+ "difficulty": "Hard",
48
+ "remark": "",
49
+ "subfield": "Optics"
50
+ },
51
+ {
52
+ "theorem": "Fourier Heat Conduction Law",
53
+ "description": "Fourier's law states that the negative gradient of temperature and the time rate of heat transfer is proportional to the area at right angles of that gradient through which the heat flows.",
54
+ "difficulty": "Hard",
55
+ "remark": "",
56
+ "subfield": "Thermodynamics"
57
+ },
58
+ {
59
+ "theorem": "Ampère's circuital law",
60
+ "description": "Ampere's circuital law states that the line integral of the magnetic field surrounding closed-loop equals to the number of times the algebraic sum of currents passing through the loop.",
61
+ "difficulty": "Hard",
62
+ "remark": "",
63
+ "subfield": "Electromagnetism"
64
+ },
65
+ {
66
+ "theorem": "Malus's Law",
67
+ "description": "Malus law states that the intensity of a plane-polarised light that passes through an analyser is directly proportional to the square of the cosine of the angle between the plane of the polariser and the transmission axis of the analyser.",
68
+ "difficulty": "Hard",
69
+ "remark": "",
70
+ "subfield": "Optics"
71
+ },
72
+ {
73
+ "theorem": "Van der Waals Equation",
74
+ "description": "The van der Waals equation is a mathematical formula that describes the behavior of real gases. It is an equation of state that relates the pressure, temperature, and molar volume in a fluid.",
75
+ "difficulty": "Hard",
76
+ "remark": "",
77
+ "subfield": "Thermodynamics"
78
+ },
79
+ {
80
+ "theorem": "Rayleigh Criterion",
81
+ "description": "The Rayleigh criterion is the generally accepted criterion for the minimum resolvable detail - the imaging process is said to be diffraction-limited when the first diffraction minimum of the image of one source point coincides with the maximum of another.",
82
+ "difficulty": "Hard",
83
+ "remark": "",
84
+ "subfield": "Optics"
85
+ },
86
+ {
87
+ "theorem": "Paschen Curve",
88
+ "description": "Paschen's law is an equation that gives the breakdown voltage, that is, the voltage necessary to start a discharge or electric arc, between two electrodes in a gas as a function of pressure and gap length.",
89
+ "difficulty": "Hard",
90
+ "remark": "",
91
+ "subfield": "Electromagnetism"
92
+ },
93
+ {
94
+ "theorem": "Chandrasekhar Limit",
95
+ "description": "The Chandrasekhar limit is the maximum mass that a star can have and still be a stable white dwarf.",
96
+ "difficulty": "Hard",
97
+ "remark": "",
98
+ "subfield": "Astrophysics"
99
+ },
100
+ {
101
+ "theorem": "Landau Damping",
102
+ "description": "Landau damping is a phenomena observed in plasma wherein there is an ex- ponential decay in the oscillations of the number density of electrons in a plasma (also referred to as Langmuir waves) and so stability is achieved in some area of the phase-space.",
103
+ "difficulty": "Hard",
104
+ "remark": "",
105
+ "subfield": "Plasma Physics"
106
+ },
107
+ {
108
+ "theorem": "Schwarzschild radius",
109
+ "description": "The Schwarzschild radius is the critical distance from the center of a massive body where the gravitational pull becomes so strong that not even light can escape, defining the boundary of a black hole.",
110
+ "difficulty": "Hard",
111
+ "remark": "",
112
+ "subfield": "Astrophysics"
113
+ },
114
+ {
115
+ "theorem": "Babinet's Principle",
116
+ "description": "In physics, Babinet's principle states that the diffraction pattern from an opaque body is identical to that from a hole of the same size and shape except for the overall forward beam intensity.",
117
+ "difficulty": "Hard",
118
+ "remark": "",
119
+ "subfield": "Optics"
120
+ },
121
+ {
122
+ "theorem": "Schrödinger's Cat",
123
+ "description": "Schrödinger's cat is a thought experiment in quantum mechanics that illustrates the paradoxical nature of quantum superposition and wave function collapse.",
124
+ "difficulty": "Hard",
125
+ "remark": "",
126
+ "subfield": "Quantum Mechanics"
127
+ },
128
+ {
129
+ "theorem": "Rayleigh Criterion for Resolution",
130
+ "description": "For a circular aperture, lens, or mirror, the Rayleigh criterion states that two images are just resolvable when the center of the diffraction pattern of one is directly over the first minimum of the diffraction pattern of the other.",
131
+ "difficulty": "Hard",
132
+ "remark": "",
133
+ "subfield": "Optics"
134
+ },
135
+ {
136
+ "theorem": "Navier-Stokes Equations",
137
+ "description": "In fluid mechanics, the Navier-Stokes equations are partial differential equations that express the flow of viscous fluids.",
138
+ "difficulty": "Hard",
139
+ "remark": "",
140
+ "subfield": "Fluid Mechanics"
141
+ }
142
+ ]
data/thb_medium/chemistry.json ADDED
@@ -0,0 +1,142 @@
1
+ [
2
+ {
3
+ "theorem": "Le Chatelier's Principle",
4
+ "description": "When a system at equilibrium is subjected to a change in condition (such as temperature, pressure, or concentration), the system will shift in a direction that relieves the stress and a new equilibrium will be established. This principle helps predict how equilibrium will shift in response to external changes.",
5
+ "difficulty": "Medium",
6
+ "remark": "Essential for understanding chemical equilibrium and its practical applications in industrial processes.",
7
+ "subfield": "Chemical Equilibrium"
8
+ },
9
+ {
10
+ "theorem": "The Pauli Exclusion Principle",
11
+ "description": "No two electrons in the same atom can have the same set of four quantum numbers (n, l, ml, ms). This limits the number of electrons that can occupy an orbital, which is max two electrons, with opposite spins (+1/2 and -1/2). This explains electronic configuration in atoms.",
12
+ "difficulty": "Medium",
13
+ "remark": "Essential for understanding electronic structure and the basis for chemical bonding.",
14
+ "subfield": "Quantum Chemistry"
15
+ },
16
+ {
17
+ "theorem": "Raoult's Law",
18
+ "description": "The partial vapor pressure of a component in an ideal solution is equal to the vapor pressure of the pure component multiplied by its mole fraction in the solution: P_A = P_A* X_A. This helps to predict vapor pressure of ideal solutions and is a basis for colligative properties",
19
+ "difficulty": "Medium",
20
+ "remark": "Describes vapor pressure of solutions, useful in understanding boiling point elevation and freezing point depression.",
21
+ "subfield": "Physical Chemistry"
22
+ },
23
+ {
24
+ "theorem": "Beer-Lambert Law",
25
+ "description": "The absorbance of a solution is directly proportional to the concentration of the analyte and the path length of the light beam through the solution: A = \u03b5bc, where \u03b5 is molar absorptivity, b is path length, and c is the concentration. Useful in analytical chemistry for determining the concentration of a substance by measuring the light it absorbs.",
26
+ "difficulty": "Medium",
27
+ "remark": "Important in spectrophotometry for quantitative analysis of solutions.",
28
+ "subfield": "Analytical Chemistry"
29
+ },
30
+ {
31
+ "theorem": "Phase diagram",
32
+ "description": "Phase diagram is a graphical representation of the physical states of a substance under different conditions of temperature and pressure.",
33
+ "difficulty": "Medium",
34
+ "remark": "Useful in understanding the phase transitions of substances.",
35
+ "subfield": "Physical Chemistry"
36
+ },
37
+ {
38
+ "theorem": "Boyle's Law",
39
+ "description": "Raoult's law is a relation of physical chemistry, with implications in thermodynamics.",
40
+ "difficulty": "Medium",
41
+ "remark": "",
42
+ "subfield": "Physical Chemistry"
43
+ },
44
+ {
45
+ "theorem": "Graham's Law of Effusion",
46
+ "description": "Graham's law of effusion was formulated by Scottish physical chemist Thomas Graham in 1848. Graham found experimentally that the rate of effusion of a gas is inversely proportional to the square root of the molar mass of its particles.",
47
+ "difficulty": "Medium",
48
+ "remark": "",
49
+ "subfield": "Physical Chemistry"
50
+ },
51
+ {
52
+ "theorem": "Arrhenius Equation",
53
+ "description": "In physical chemistry, the Arrhenius equation is a formula for the temperature dependence of reaction rates.",
54
+ "difficulty": "Medium",
55
+ "remark": "",
56
+ "subfield": "Chemical Kinetics"
57
+ },
58
+ {
59
+ "theorem": "Henry's law",
60
+ "description": "the proportional relationship between the concentration of dissolved gas in a solution and the partial pressure of the gas in contact with the solution",
61
+ "difficulty": "Medium",
62
+ "remark": "",
63
+ "subfield": "Physical Chemistry"
64
+ },
65
+ {
66
+ "theorem": "Lewis Acid-Base Theory",
67
+ "description": "In the Lewis theory of acid-base reactions, bases donate pairs of electrons and acids accept pairs of electrons.",
68
+ "difficulty": "Medium",
69
+ "remark": "",
70
+ "subfield": "Acid-Base Chemistry"
71
+ },
72
+ {
73
+ "theorem": "Clausius-Clapeyron Equation",
74
+ "description": "allows us to estimate the vapor pressure at another temperature.",
75
+ "difficulty": "Medium",
76
+ "remark": "",
77
+ "subfield": "Thermodynamics"
78
+ },
79
+ {
80
+ "theorem": "Michaelis-Menten Kinetics",
81
+ "description": "In biochemistry, Michaelis–Menten kinetics, named after Leonor Michaelis and Maud Menten, is the simplest case of enzyme kinetics, applied to enzyme-catalysed reactions of one substrate and one product.",
82
+ "difficulty": "Medium",
83
+ "remark": "",
84
+ "subfield": "Chemical Kinetics"
85
+ },
86
+ {
87
+ "theorem": "Gibbs Free Energy Equation",
88
+ "description": "The change in free energy, ΔG, is equal to the sum of the enthalpy plus the product of the temperature and entropy of the system.",
89
+ "difficulty": "Medium",
90
+ "remark": "",
91
+ "subfield": "Thermodynamics"
92
+ },
93
+ {
94
+ "theorem": "Transition State Theory",
95
+ "description": "In chemistry, transition state theory (TST) explains the reaction rates of elementary chemical reactions.",
96
+ "difficulty": "Medium",
97
+ "remark": "",
98
+ "subfield": "Chemical Kinetics"
99
+ },
100
+ {
101
+ "theorem": "Koopman's Theorem",
102
+ "description": "Koopmans' theorem states that the first ionization energy of a molecule is equal to the negative of the energy of the highest occupied molecular orbital (HOMO).",
103
+ "difficulty": "Medium",
104
+ "remark": "",
105
+ "subfield": "Quantum Chemistry"
106
+ },
107
+ {
108
+ "theorem": "Recrystallization",
109
+ "description": "Recrystallization, also known as fractional crystallization, is a procedure for purifying an impure compound in a solvent.",
110
+ "difficulty": "Medium",
111
+ "remark": "",
112
+ "subfield": "Analytical Chemistry"
113
+ },
114
+ {
115
+ "theorem": "Electrogravimetry",
116
+ "description": "Electrogravimetry is a method used to separate and quantify ions of a substance, usually a metal. In this process, the analyte solution is electrolyzed.",
117
+ "difficulty": "Medium",
118
+ "remark": "",
119
+ "subfield": "Analytical Chemistry"
120
+ },
121
+ {
122
+ "theorem": "Kjeldahl Method",
123
+ "description": "The Kjeldahl method is a laboratory technique used to measure the amount of nitrogen in a sample. ",
124
+ "difficulty": "Medium",
125
+ "remark": "",
126
+ "subfield": "Analytical Chemistry"
127
+ },
128
+ {
129
+ "theorem": "Liquid-Liquid Extraction",
130
+ "description": "Liquid–liquid extraction, also known as solvent extraction and partitioning, is a method to separate compounds or metal complexes, based on their relative solubilities in two different immiscible liquids, usually water (polar) and an organic solvent (non-polar).",
131
+ "difficulty": "Medium",
132
+ "remark": "",
133
+ "subfield": "Analytical Chemistry"
134
+ },
135
+ {
136
+ "theorem": "Reflux",
137
+ "description": "Reflux is a laboratory technique where a reaction mixture is heated to boil and the vapors are condensed back into the reaction flask, allowing continuous heating without loss of volatile components.",
138
+ "difficulty": "Medium",
139
+ "remark": "",
140
+ "subfield": "Laboratory Techniques"
141
+ }
142
+ ]
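Each of these data files is a flat JSON array of entries with the fields `theorem`, `description`, `difficulty`, `remark`, and `subfield`. A minimal loading sketch (illustrative only, not part of the uploaded files; the path follows the directory layout above):

```python
# Load one difficulty split of the theorem benchmark and list its entries.
import json

with open("data/thb_medium/chemistry.json", "r", encoding="utf-8") as f:
    theorems = json.load(f)

for entry in theorems:
    print(f"{entry['theorem']} ({entry['subfield']}, {entry['difficulty']})")
```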
data/thb_medium/comp_sci.json ADDED
@@ -0,0 +1,142 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "theorem": "The Halting Problem (Undecidability)",
4
+ "description": "There is no general algorithm (or program) that can determine, for any arbitrary computer program and its input, whether the program will eventually halt (stop) or run forever.",
5
+ "difficulty": "Medium",
6
+ "remark": "A core concept in theoretical computer science. Introduces the idea of limits of computation. Understanding the proof (usually using diagonalization) is key to grasp the concept. Usually taught in discrete math or Theory of Computation.",
7
+ "subfield": "Theory of Computation"
8
+ },
9
+ {
10
+ "theorem": "The Time Complexity of Binary Search",
11
+ "description": "In the worst case, searching for an element in a sorted array using binary search requires O(log n) time, where n is the number of elements in the array. This efficiency arises from repeatedly dividing the search interval in half.",
12
+ "difficulty": "Medium",
13
+ "remark": "Highlights the power of divide-and-conquer algorithms. Illustrates why sorted data structures are often essential. Requires understanding of logarithms",
14
+ "subfield": "Algorithms"
15
+ },
16
+ {
17
+ "theorem": "The Correctness of Simple Sorting Algorithm (e.g. Bubble Sort)",
18
+ "description": "Bubble sort repeatedly compares adjacent elements and swaps them if they are in the wrong order. We can formally prove that after n-1 passes, the array will be sorted. Proving it involves demonstrating that the largest element is 'bubbled' to the end of the array in each pass, by using loop invariants.",
19
+ "difficulty": "Medium",
20
+ "remark": "Demonstrates how to formally analyze simple algorithms for their correctness and requires some understanding of loop invariants. Useful for introduction to proofs in algorithm.",
21
+ "subfield": "Algorithms"
22
+ },
23
+ {
24
+ "theorem": "The Church-Turing Thesis",
25
+ "description": "All models of computation that we know can compute what is Turing computable. In other words, if an effective method (algorithm) for solving a problem exists at all, then a Turing machine can also compute a solution, and vice versa.",
26
+ "difficulty": "Medium",
27
+ "remark": "A fundamental principle in theoretical computer science. It defines the limit of computability. It links different computational models to a single class. Requires an understanding of the Turing Machine.",
28
+ "subfield": "Theory of Computation"
29
+ },
30
+ {
31
+ "theorem": "The Relationship between Recursion and Induction",
32
+ "description": "Recursive functions can be proven correct and analyzed with mathematical induction. The base case of induction matches the base case in the recursive function. The induction step corresponds to the recursive step.",
33
+ "difficulty": "Medium",
34
+ "remark": "Connects two key concepts in Computer Science. Illustrates how induction can be used to prove correctness of recursive algorithms and mathematical induction can be used to define recursive functions. Important for formal analysis.",
35
+ "subfield": "Programming Fundamentals"
36
+ },
37
+ {
38
+ "theorem": "Chroma Subsampling",
39
+ "description": "Chroma subsampling is a technique used in digital image processing to reduce the amount of data required to represent an image. It involves reducing the number of color channels or samples per pixel in an image, typically by using fewer bits for chroma (color) information.",
40
+ "difficulty": "Medium",
41
+ "remark": "",
42
+ "subfield": "Image Processing"
43
+ },
44
+ {
45
+ "theorem": "Median filtering",
46
+ "description": "Median filtering is a non-linear digital filtering technique that is used to remove noise from an image or signal. It works by replacing each pixel with the median value of the pixels in its neighborhood.",
47
+ "difficulty": "Medium",
48
+ "remark": "",
49
+ "subfield": "Image Processing"
50
+ },
51
+ {
52
+ "theorem": "Shannon Lower bound",
53
+ "description": "The Shannon Lower Bound refers to a theoretical limit in information theory that represents the minimum entropy or information required to encode a random source. It is tied to the Shannon Entropy, which quantifies the average information content of a random variable. Here's a breakdown of what it means:",
54
+ "difficulty": "Medium",
55
+ "remark": "",
56
+ "subfield": "Information Theory"
57
+ },
58
+ {
59
+ "theorem": "Dijkstra's algorithm",
60
+ "description": "maintains a priority queue of vertices in the graph ordered by distance from the start and repeatedly selects the next shortest path to an unconnected part of the graph",
61
+ "difficulty": "Medium",
62
+ "remark": "",
63
+ "subfield": "Graph Theory"
64
+ },
65
+ {
66
+ "theorem": "K-means clustering",
67
+ "description": "K-means clustering is a method of clustering that partitions the dataset into K clusters, where each cluster is represented by its centroid or center point.",
68
+ "difficulty": "Medium",
69
+ "remark": "",
70
+ "subfield": "Machine Learning"
71
+ },
72
+ {
73
+ "theorem": "K-nearest neighbors",
74
+ "description": "K-nearest neighbors (KNN) is a simple and effective classification algorithm that works by finding the K closest data points in the training set to a new data point and then assigning the class label based on the majority class of these neighbors.",
75
+ "difficulty": "Medium",
76
+ "remark": "",
77
+ "subfield": "Machine Learning"
78
+ },
79
+ {
80
+ "theorem": "Gradient descent",
81
+ "description": "Common optimization algorithm used in machine learning to minimize a loss function.",
82
+ "difficulty": "Medium",
83
+ "remark": "",
84
+ "subfield": "Machine Learning"
85
+ },
86
+ {
87
+ "theorem": "Markov Decision Processes",
88
+ "description": "A Markov decision process (MDP) refers to a stochastic decision-making process that uses a mathematical framework to model the decision-making of a dynamic system.",
89
+ "difficulty": "Medium",
90
+ "remark": "",
91
+ "subfield": "Machine Learning"
92
+ },
93
+ {
94
+ "theorem": "ALOHA network",
95
+ "description": "ALOHA is basically a multiple access protocol which describes how all the terminals can access a medium without interfering at all with one another or even colliding. It operates at the data-link layer.",
96
+ "difficulty": "Medium",
97
+ "remark": "",
98
+ "subfield": "Computer Networks"
99
+ },
100
+ {
101
+ "theorem": "Discrete Cosine Transform",
102
+ "description": "A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.",
103
+ "difficulty": "Medium",
104
+ "remark": "",
105
+ "subfield": "Digital Signal Processing"
106
+ },
107
+ {
108
+ "theorem": "Master Theorem",
109
+ "description": "The master theorem is used in calculating the time complexity of recurrence relations (divide and conquer algorithms) in a simple and quick way.",
110
+ "difficulty": "Medium",
111
+ "remark": "",
112
+ "subfield": "Algorithms"
113
+ },
114
+ {
115
+ "theorem": "Fast Fourier Transform",
116
+ "description": "A fast Fourier transform (FFT) is an algorithm that computes the Discrete Fourier Transform (DFT) of a sequence, or its inverse (IDFT).",
117
+ "difficulty": "Medium",
118
+ "remark": "",
119
+ "subfield": "Digital Signal Processing"
120
+ },
121
+ {
122
+ "theorem": "SR latch",
123
+ "description": "S-R latches i.e., Set-Reset latches are the simplest form of latches and are implemented using two inputs: S (Set) and R (Reset).",
124
+ "difficulty": "Medium",
125
+ "remark": "",
126
+ "subfield": "Digital Logic"
127
+ },
128
+ {
129
+ "theorem": "TCP Reno",
130
+ "description": "TCP Reno is a classic congestion control algorithm that was introduced in the early 1990s. It uses a mechanism called additive increase multiplicative decrease (AIMD) to adjust the TCP window size, which is the amount of data that can be sent without waiting for an acknowledgment.",
131
+ "difficulty": "Medium",
132
+ "remark": "",
133
+ "subfield": "Computer Networks"
134
+ },
135
+ {
136
+ "theorem": "Chord P2P Network and finger table",
137
+ "description": "Chord addresses peer addressability and peer findability and message routability challenges by organizing all peers in the P2P network into a single virtual ring.",
138
+ "difficulty": "Medium",
139
+ "remark": "",
140
+ "subfield": "Computer Networks"
141
+ }
142
+ ]
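As a companion to the binary search entry in this file, here is a minimal Python sketch of the O(log n) search it describes (illustrative only, not part of the dataset):

```python
# Iterative binary search over a sorted list; the interval is halved each step,
# giving the O(log n) worst case described in the entry above.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

assert binary_search([1, 3, 5, 7, 9], 7) == 3
```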
data/thb_medium/math.json ADDED
@@ -0,0 +1,142 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "theorem": "The Factor Theorem",
4
+ "description": "A polynomial f(x) has a factor (x - a) if and only if f(a) = 0. This theorem helps in finding roots and factors of polynomials.",
5
+ "difficulty": "Medium",
6
+ "remark": "Crucial for solving polynomial equations and understanding polynomial behavior.",
7
+ "subfield": "Algebra"
8
+ },
9
+ {
10
+ "theorem": "The Law of Sines",
11
+ "description": "In any triangle, the ratio of the length of a side to the sine of its opposite angle is constant. If a, b, and c are the side lengths, and A, B, and C are the opposite angles, then a/sin(A) = b/sin(B) = c/sin(C).",
12
+ "difficulty": "Medium",
13
+ "remark": "Useful for solving triangles when you have angle-side relationships.",
14
+ "subfield": "Trigonometry"
15
+ },
16
+ {
17
+ "theorem": "The Binomial Theorem",
18
+ "description": "For any non-negative integer n and real numbers a and b, (a + b)^n = Σ(k=0 to n) [n choose k] a^(n-k) b^k, where [n choose k] is the binomial coefficient, also written as nCk. It gives a formula for expanding powers of binomials.",
19
+ "difficulty": "Medium",
20
+ "remark": "Important in algebra, combinatorics, and probability.",
21
+ "subfield": "Algebra"
22
+ },
23
+ {
24
+ "theorem": "The Intermediate Value Theorem",
25
+ "description": "If f(x) is a continuous function on a closed interval [a, b] and k is any number between f(a) and f(b), then there exists at least one number c in the interval [a, b] such that f(c) = k. This theorem helps to find roots and demonstrate the behavior of continuous functions.",
26
+ "difficulty": "Medium",
27
+ "remark": "Fundamental for understand continuous functions in calculus",
28
+ "subfield": "Calculus"
29
+ },
30
+ {
31
+ "theorem": "The Cosine Rule",
32
+ "description": "In any triangle, the square of the length of one side is equal to the sum of the squares of the lengths of the other two sides minus twice the product of the lengths of those two sides multiplied by the cosine of the angle between them. For a triangle with side lengths a, b, c, and opposite angles A, B, C: a² = b² + c² - 2bc*cos(A). Similar formulas are valid for b² and c².",
33
+ "difficulty": "Medium",
34
+ "remark": "Used in any triangle to solve for sides and/or angles",
35
+ "subfield": "Trigonometry"
36
+ },
37
+ {
38
+ "theorem": "The Divergence Test",
39
+ "description": "If lim (n→∞) aₙ ≠ 0 or doesn't exist, then the series ∑aₙ diverges. It is a simple test to identify divergent series but will not be able to determine if the series is convergent.",
40
+ "difficulty": "Medium",
41
+ "remark": "An important initial check when examining series convergence.",
42
+ "subfield": "Calculus"
43
+ },
44
+ {
45
+ "theorem": "The Squeeze Theorem (or Sandwich Theorem)",
46
+ "description": "If g(x) ≤ f(x) ≤ h(x) for all x near a (except possibly at a), and if lim(x→a) g(x) = L and lim(x→a) h(x) = L, then lim(x→a) f(x) = L. Useful for evaluating limits when direct calculation is difficult, by bounding a function between two simpler functions.",
47
+ "difficulty": "Medium",
48
+ "remark": "Commonly used in calculus for finding challenging limits.",
49
+ "subfield": "Calculus"
50
+ },
51
+ {
52
+ "theorem": "The Chain Rule",
53
+ "description": "The chain rule is a formula for finding the derivative of a composite function. It states that the derivative of a function composed of two functions is the product of the derivative of the outer function and the derivative of the inner function.",
54
+ "difficulty": "Medium",
55
+ "remark": "Commonly used in calculus for finding the derivative of composite functions.",
56
+ "subfield": "Calculus"
57
+ },
58
+ {
59
+ "theorem": "Product Rule",
60
+ "description": "The product rule is a formula for finding the derivative of a product of two functions. It states that the derivative of a product of two functions is the sum of the product of the first function and the derivative of the second function, and the product of the second function and the derivative of the first function.",
61
+ "difficulty": "Medium",
62
+ "remark": "Commonly used in calculus for finding the derivative of products of functions.",
63
+ "subfield": "Calculus"
64
+ },
65
+ {
66
+ "theorem": "Quotient Rule",
67
+ "description": "The quotient rule is a formula for finding the derivative of a quotient of two functions. It states that the derivative of a quotient of two functions is the quotient of the derivative of the numerator and the denominator, minus the product of the numerator and the derivative of the denominator, all divided by the square of the denominator.",
68
+ "difficulty": "Medium",
69
+ "remark": "Commonly used in calculus for finding the derivative of quotients of functions.",
70
+ "subfield": "Calculus"
71
+ },
72
+ {
73
+ "theorem": "Power Rule",
74
+ "description": "The power rule is a formula for finding the derivative of a power of a function. It states that the derivative of a power of a function is the product of the power and the derivative of the function.",
75
+ "difficulty": "Medium",
76
+ "remark": "Commonly used in calculus for finding the derivative of powers of functions.",
77
+ "subfield": "Calculus"
78
+ },
79
+ {
80
+ "theorem": "Integration by Substitution",
81
+ "description": "Integration by substitution is a technique used to simplify the integration of a function by substituting a new variable for the original variable.",
82
+ "difficulty": "Medium",
83
+ "remark": "Commonly used in calculus for finding the integral of functions.",
84
+ "subfield": "Calculus"
85
+ },
86
+ {
87
+ "theorem": "Disk & Washer Method",
88
+ "description": "The washer method formula is used to find the volume of two functions that are rotated around the x-axis.",
89
+ "difficulty": "Medium",
90
+ "remark": "",
91
+ "subfield": "Calculus"
92
+ },
93
+ {
94
+ "theorem": "Extreme value theorem",
95
+ "description": "if 𝑓 is a continuous function over a finite, closed interval, then 𝑓 has an absolute maximum and an absolute minimum",
96
+ "difficulty": "Medium",
97
+ "remark": "",
98
+ "subfield": "Calculus"
99
+ },
100
+ {
101
+ "theorem": "Fermat's theorem",
102
+ "description": "if 𝑓 has a local extremum at 𝑐, then 𝑐 is a critical point of 𝑓",
103
+ "difficulty": "Medium",
104
+ "remark": "",
105
+ "subfield": "Calculus"
106
+ },
107
+ {
108
+ "theorem": "Mean Value Theorem",
109
+ "description": "Mean Value Theorem states that if a function f is continuous on the closed interval [a,b] and differentiable on the open interval (a,b), then there exists a point c in the interval (a,b) such that f'(c) is equal to the function's average rate of change over [a,b].",
110
+ "difficulty": "Medium",
111
+ "remark": "",
112
+ "subfield": "Calculus"
113
+ },
114
+ {
115
+ "theorem": "Newton-Raphson method",
116
+ "description": "The Newton-Raphson method, also known as the Newton's method, is a widely used iterative numerical technique for finding the approximate roots of a real-valued function. It is named after Sir Isaac Newton and Joseph Raphson, who independently developed the method in the 17th century.\n\nThe method is based on the idea of linear approximation, where a function is approximated by its tangent line at a given point. The intersection of this tangent line with the x-axis provides a better approximation of the root than the initial point. This process is then repeated iteratively until the desired level of accuracy is achieved.\n\nGiven a function f(x) and an initial guess x0 for the root, the Newton-Raphson method can be described by the following iterative formula:\n\nx1 = x0 - f(x0) / f'(x0)\n\nHere, f'(x0) is the derivative of the function f(x) evaluated at the point x0. The new approximation x1 is then used as the starting point for the next iteration, and the process is repeated until the difference between successive approximations is smaller than a predefined tolerance level or a maximum number of iterations is reached.\n\nThe Newton-Raphson method converges rapidly when the initial guess is close to the actual root and the function is well-behaved. However, the method may fail to converge or converge to a wrong root if the initial guess is not close enough to the actual root, or if the function has multiple roots, or if the derivative of the function is zero or nearly zero at the root.\n\nDespite these limitations, the Newton-Raphson method is widely used in various fields of science and engineering due to its simplicity and fast convergence properties when applied to well-behaved functions.",
117
+ "difficulty": "Medium",
118
+ "remark": "",
119
+ "subfield": "Numerical Analysis"
120
+ },
121
+ {
122
+ "theorem": "Rolle's theorem",
123
+ "description": "Rolle's theorem or Rolle's lemma essentially states that any real-valued differentiable function that attains equal values at two distinct points must have at least one point, somewhere between them, at which the slope of the tangent line is zero.",
124
+ "difficulty": "Medium",
125
+ "remark": "",
126
+ "subfield": "Calculus"
127
+ },
128
+ {
129
+ "theorem": "Second derivative test",
130
+ "description": "The second partial derivatives test classifies the point as a local maximum or local minimum.",
131
+ "difficulty": "Medium",
132
+ "remark": "",
133
+ "subfield": "Calculus"
134
+ },
135
+ {
136
+ "theorem": "Pappus's Theorem",
137
+ "description": "Pappus's centroid theorem is either of two related theorems dealing with the surface areas and volumes of surfaces and solids of revolution.",
138
+ "difficulty": "Medium",
139
+ "remark": "",
140
+ "subfield": "Geometry"
141
+ }
142
+ ]
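The Newton-Raphson entry above spells out the iteration x₁ = x₀ − f(x₀)/f'(x₀); a minimal Python sketch of that iteration (illustrative only, not part of the dataset):

```python
# Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n), applied to
# f(x) = x^2 - 2 to approximate sqrt(2) from the initial guess x0 = 1.
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # ~1.41421356
```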
data/thb_medium/physics.json ADDED
@@ -0,0 +1,142 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "theorem": "The Work-Energy Theorem",
4
+ "description": "The net work done on an object is equal to the change in its kinetic energy. Mathematically, this is expressed as W_net = \u0394KE, where W_net is the net work and \u0394KE is the change in kinetic energy.",
5
+ "difficulty": "Medium",
6
+ "remark": "This theorem connects force, displacement, and energy. It's crucial for analyzing motion when forces are not constant or when the detailed time evolution is not needed. It's often used to solve problems involving motion and energy transfer.",
7
+ "subfield": "Classical Mechanics"
8
+ },
9
+ {
10
+ "theorem": "The Law of Conservation of Energy",
11
+ "description": "In a closed system, the total energy remains constant; it can transform from one form to another (e.g., potential to kinetic) but cannot be created or destroyed. Mathematically, E_total_initial = E_total_final.",
12
+ "difficulty": "Medium",
13
+ "remark": "This is a fundamental principle in physics, applicable to a wide range of scenarios from mechanics to thermodynamics. It simplifies problem-solving by focusing on energy balance rather than detailed force interactions.",
14
+ "subfield": "Classical Mechanics"
15
+ },
16
+ {
17
+ "theorem": "The Law of Universal Gravitation",
18
+ "description": "Any two objects with mass attract each other with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between their centers. F = G(m\u2081m\u2082)/r\u00b2, where G is the gravitational constant.",
19
+ "difficulty": "Medium",
20
+ "remark": "This law describes the gravitational force that governs the motions of celestial bodies and explains why things fall towards the earth. Its mathematical form shows the distance dependence of the gravitational force.",
21
+ "subfield": "Gravitation"
22
+ },
23
+ {
24
+ "theorem": "Archimedes' Principle",
25
+ "description": "An object immersed in a fluid experiences an upward buoyant force equal to the weight of the fluid displaced by the object. This principle explains buoyancy and is crucial for understanding why objects float or sink.",
26
+ "difficulty": "Medium",
27
+ "remark": "Connects the density of a fluid, the volume of displaced fluid, and the buoyant force. It's used to design boats and determine densities through buoyancy measurements.",
28
+ "subfield": "Fluid Mechanics"
29
+ },
30
+ {
31
+ "theorem": "The Doppler Effect",
32
+ "description": "Describes the change in frequency of a wave (sound or light) when the source and the observer are moving relative to each other. The perceived frequency shifts higher when the source and observer move closer and lower when they move apart. The mathematical formulation differs for sound and light.",
33
+ "difficulty": "Medium",
34
+ "remark": "Has applications in areas like radar speed guns, medical imaging, astronomy for finding the recession velocity of galaxies. It's crucial in understanding wave phenomena in a dynamic context.",
35
+ "subfield": "Wave Physics"
36
+ },
37
+ {
38
+ "theorem": "The Principle of Superposition of Waves",
39
+ "description": "When two or more waves overlap in a medium, the resultant displacement at any point is the vector sum of the displacements of the individual waves at that point. This principle governs wave interference and diffraction phenomena.",
40
+ "difficulty": "Medium",
41
+ "remark": "Explains how waves combine with each other. Its application can create both constructive and destructive interference effects. Essential in understanding the behavior of light and sound, diffraction gratings.",
42
+ "subfield": "Wave Physics"
43
+ },
44
+ {
45
+ "theorem": "Kepler's laws of planetary motion",
46
+ "description": "These laws describe the motion of planets around the sun. Kepler's First Law states that planets orbit in elliptical paths with the sun at one of the two foci. Kepler's Second Law states that a line drawn from the sun to a planet sweeps out equal areas in equal times. Kepler's Third Law relates the orbital period of a planet to its average distance from the sun.",
47
+ "difficulty": "Medium",
48
+ "remark": "These laws are crucial for understanding the motion of planets and are used in astronomy and space science.",
49
+ "subfield": "Astrophysics"
50
+ },
51
+ {
52
+ "theorem": "Gauss's law",
53
+ "description": "Gauss's law states that the electric flux through any closed surface is equal to the charge enclosed by the surface divided by the permittivity of free space.",
54
+ "difficulty": "Medium",
55
+ "remark": "This law is fundamental to understanding the relationship between electric fields and charges. It's used in electrostatics and electromagnetism to calculate electric fields around charged objects.",
56
+ "subfield": "Electromagnetism"
57
+ },
58
+ {
59
+ "theorem": "Stokes' law",
60
+ "description": "Stokes' Law describes the force of viscous drag on a small spherical object moving through a viscous fluid.",
61
+ "difficulty": "Medium",
62
+ "remark": "",
63
+ "subfield": "Fluid Mechanics"
64
+ },
65
+ {
66
+ "theorem": "Bernoulli's principle",
67
+ "description": "Bernoulli's principle is a key concept in fluid dynamics that relates pressure, density, speed and height. Bernoulli's principle states that an increase in the speed of a parcel of fluid occurs simultaneously with a decrease in either the pressure or the height above a datum.",
68
+ "difficulty": "Medium",
69
+ "remark": "",
70
+ "subfield": "Fluid Mechanics"
71
+ },
72
+ {
73
+ "theorem": "Poiseuille's law",
74
+ "description": "the rate of laminar flow of an incompressible fluid in a tube.",
75
+ "difficulty": "Medium",
76
+ "remark": "",
77
+ "subfield": "Fluid Mechanics"
78
+ },
79
+ {
80
+ "theorem": "Stefan-Boltzmann Law of Radiation",
81
+ "description": "The Stefan–Boltzmann law, also known as Stefan's law, describes the intensity of the thermal radiation emitted by matter in terms of that matter's temperature. It is named for Josef Stefan, who empirically derived the relationship, and Ludwig Boltzmann who derived the law theoretically.",
82
+ "difficulty": "Medium",
83
+ "remark": "",
84
+ "subfield": "Thermodynamics"
85
+ },
86
+ {
87
+ "theorem": "Carnot cycle",
88
+ "description": "A Carnot cycle is an ideal thermodynamic cycle proposed by French physicist Sadi Carnot in 1824.",
89
+ "difficulty": "Medium",
90
+ "remark": "",
91
+ "subfield": "Thermodynamics"
92
+ },
93
+ {
94
+ "theorem": "Electromagnetic spectrum",
95
+ "description": "The electromagnetic spectrum is the full range of electromagnetic radiation, organized by frequency or wavelength. The spectrum is divided into separate bands, with different names for the electromagnetic waves within each band.",
96
+ "difficulty": "Easy",
97
+ "remark": "",
98
+ "subfield": "Electromagnetism"
99
+ },
100
+ {
101
+ "theorem": "Ampere's law",
102
+ "description": "In classical electromagnetism, Ampère's circuital law relates the circulation of a magnetic field around a closed loop to the electric current passing through the loop.",
103
+ "difficulty": "Medium",
104
+ "remark": "",
105
+ "subfield": "Electromagnetism"
106
+ },
107
+ {
108
+ "theorem": "Brewster's law",
109
+ "description": "Brewster's law is a relationship of light waves at the maximum polarization angle of light.",
110
+ "difficulty": "Medium",
111
+ "remark": "",
112
+ "subfield": "Optics"
113
+ },
114
+ {
115
+ "theorem": "Brownian motion",
116
+ "description": "Brownian motion is the seemingly random motion of particles within a liquid or gas that emerges from constant collisions and redirection from impacting the atoms or molecules within the fluid. All matter is in constant motion which results in Brownian motion.",
117
+ "difficulty": "Medium",
118
+ "remark": "",
119
+ "subfield": "Statistical Physics"
120
+ },
121
+ {
122
+ "theorem": "Hubble's law",
123
+ "description": "Hubble's law, also known as the Hubble–Lemaître law, is the observation in physical cosmology that galaxies are moving away from Earth at speeds proportional to their distance. In other words, the farther a galaxy is from the Earth, the faster it moves away.",
124
+ "difficulty": "Medium",
125
+ "remark": "",
126
+ "subfield": "Astrophysics"
127
+ },
128
+ {
129
+ "theorem": "Tsiolkovsky rocket equation",
130
+ "description": "It is a mathematical equation that describes the motion of a rocket in a vacuum and is used to calculate the velocity, acceleration, and thrust of the rocket.",
131
+ "difficulty": "Medium",
132
+ "remark": "",
133
+ "subfield": "Classical Mechanics"
134
+ },
135
+ {
136
+ "theorem": "Hall Effect",
137
+ "description": "Hall effect is a process in which a transverse electric field is developed in a solid material when the material carrying an electric current is placed in a magnetic field that is perpendicular to the current.",
138
+ "difficulty": "Medium",
139
+ "remark": "",
140
+ "subfield": "Electromagnetism"
141
+ }
142
+ ]
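As a companion to the Tsiolkovsky rocket equation entry, a small worked example of Δv = vₑ·ln(m₀/m_f); the numbers below are assumptions chosen only for illustration:

```python
# Tsiolkovsky rocket equation: delta_v = v_e * ln(m0 / mf).
import math

v_exhaust = 3000.0    # effective exhaust velocity, m/s (assumed)
m_initial = 500000.0  # initial mass, kg (assumed)
m_final = 150000.0    # final mass after burn, kg (assumed)

delta_v = v_exhaust * math.log(m_initial / m_final)
print(f"delta-v = {delta_v:.0f} m/s")  # ~3612 m/s
```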
eval_suite/__init__.py ADDED
File without changes
eval_suite/image_utils.py ADDED
@@ -0,0 +1,104 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import tempfile
3
+
4
+ import numpy as np
5
+ from PIL import Image, ImageOps
6
+ from moviepy import VideoFileClip
7
+
8
+ from eval_suite.prompts_raw import _image_eval
9
+ from eval_suite.utils import extract_json, convert_score_fields, calculate_geometric_mean
10
+ from mllm_tools.utils import _prepare_text_image_inputs
11
+ from src.core.parse_video import image_with_most_non_black_space
12
+
13
+ def extract_key_frames(video_path, output_dir, num_chunks):
14
+ """Extract key frames from a video by dividing it into chunks and selecting representative frames.
15
+
16
+ Args:
17
+ video_path (str): Path to the input video file
18
+ output_dir (str): Directory where extracted frames will be saved
19
+ num_chunks (int): Number of chunks to divide the video into
20
+
21
+ Returns:
22
+ list: List of paths to the extracted key frames
23
+ """
24
+ # Create output directory if it doesn't exist
25
+ os.makedirs(output_dir, exist_ok=True)
26
+
27
+ # Extract all frames from the video
28
+ clip = VideoFileClip(video_path)
29
+ frames = list(clip.iter_frames(fps=1)) # one frame every second
30
+
31
+ total_frames = len(frames)
32
+ if total_frames == 0:
33
+ print("No frames extracted from the video.")
34
+ return []
35
+
36
+ # Determine the number of frames per chunk
37
+ frames_per_chunk = max(1, total_frames // num_chunks)  # at least 1, to avoid division by zero when there are fewer frames than chunks
38
+ num_chunks = min(num_chunks, (total_frames + frames_per_chunk - 1) // frames_per_chunk)
39
+
40
+ key_frames = []
41
+
42
+ # Process each chunk of frames
43
+ for i in range(num_chunks):
44
+ start_idx = i * frames_per_chunk
45
+ end_idx = min((i + 1) * frames_per_chunk, total_frames)
46
+ chunk_frames = frames[start_idx:end_idx]
47
+
48
+ if chunk_frames:
49
+ # Save the frame with most non-black space
50
+ output_path = os.path.join(output_dir, f"key_frame_{i+1}.jpg")
51
+ result = image_with_most_non_black_space(chunk_frames, output_path)
52
+ else:
53
+ print(f"No frames in chunk {i+1}. Skipping.")
54
+ result = None
55
+
56
+ if result is not None:
57
+ key_frames.append(output_path)
58
+ clip.close()
59
+
60
+ return key_frames
61
+
62
+
63
+ def evaluate_sampled_images(model, video_path, description="No description provided", num_chunks=10, output_folder=None):
64
+ """Evaluate sampled frames from a video using an image evaluation model.
65
+
66
+ Args:
67
+ model: The image evaluation model to use
68
+ video_path (str): Path to the input video file
69
+ description (str, optional): Description of the video content. Defaults to "No description provided"
70
+ num_chunks (int, optional): Number of chunks to divide the video into. Defaults to 10
71
+ output_folder (str, optional): Directory for temporary files. Defaults to None
72
+
73
+ Returns:
74
+ dict: Dictionary containing evaluation scores and individual frame assessments with keys:
75
+ - evaluation: Dictionary of averaged scores for each criterion
76
+ - image_chunks: List of individual frame evaluation results
77
+ """
78
+ with tempfile.TemporaryDirectory(dir=output_folder) as temp_dir:
79
+ key_frames = extract_key_frames(video_path, temp_dir, num_chunks)
80
+
81
+ prompt = _image_eval.format(description=description)
82
+
83
+ responses = []
84
+ for key_frame in key_frames:
85
+ inputs = _prepare_text_image_inputs(prompt, key_frame)
86
+ response = model(inputs)
87
+ response_json = extract_json(response)
88
+ response_json = convert_score_fields(response_json)
89
+ responses.append(response_json)
90
+
91
+ criteria = list(responses[0]["evaluation"].keys())
92
+ scores_dict = {c: [] for c in criteria}
93
+ for response in responses:
94
+ for key, val in response["evaluation"].items():
95
+ scores_dict[key].append(val["score"])
96
+
97
+ res_score = {}
98
+ for key, scores in scores_dict.items():
99
+ res_score[key] = {"score": calculate_geometric_mean(scores)}
100
+
101
+ return {
102
+ "evaluation": res_score,
103
+ "image_chunks": responses
104
+ }
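A hypothetical usage sketch for this module (not part of the repository): the video path is made up and the `GeminiWrapper` constructor arguments are an assumption; only the `evaluate_sampled_images` signature and result keys are taken from the code above.

```python
from mllm_tools.gemini import GeminiWrapper
from eval_suite.image_utils import evaluate_sampled_images

model = GeminiWrapper()  # constructor arguments are an assumption
result = evaluate_sampled_images(
    model,
    video_path="output/pythagorean_theorem/video.mp4",  # hypothetical path
    description="Explains the Pythagorean theorem",
    num_chunks=10,
)
print(result["evaluation"])         # geometric-mean score per criterion
print(len(result["image_chunks"]))  # number of per-frame evaluations
```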
eval_suite/parse_prompt.py ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from tqdm import tqdm
3
+
4
+
5
+ def call_parse_prompt():
6
+ """
7
+ Locates the prompts_raw directory and generates an __init__.py file containing prompt texts.
8
+
9
+ Searches for prompts_raw directory in current and parent directories. Once found, calls
10
+ create_python_file_with_texts() to generate the __init__.py file.
11
+ """
12
+ current_file_path = os.path.abspath(__file__)
13
+ current_folder_path = os.path.dirname(current_file_path)
14
+ folder_path = os.path.join(current_folder_path, "prompts_raw")
15
+
16
+ # If prompts_raw not found in current directory, search parent directories
17
+ if not os.path.exists(folder_path):
18
+ parent_dir = current_folder_path
19
+ while parent_dir != os.path.dirname(parent_dir): # Stop at root directory
20
+ parent_dir = os.path.dirname(parent_dir)
21
+ test_path = os.path.join(parent_dir, "prompts_raw")
22
+ if os.path.exists(test_path):
23
+ folder_path = test_path
24
+ break
25
+
26
+ output_file = os.path.join(folder_path, "__init__.py")
27
+ create_python_file_with_texts(folder_path, output_file)
28
+
29
+
30
+ def create_python_file_with_texts(folder_path, output_file):
31
+ """
32
+ Creates a Python file containing prompt texts from .txt files.
33
+
34
+ Args:
35
+ folder_path (str): Path to directory containing prompt .txt files
36
+ output_file (str): Path where the output __init__.py file will be created
37
+
38
+ The function reads all .txt files in the given folder, converts their contents into
39
+ Python variables, and writes them to the output file. Variable names are derived from
40
+ file paths with special characters replaced.
41
+ """
42
+ with open(output_file, 'w', encoding='utf-8') as out_file:
43
+ out_file.write("# This file is generated automatically through parse_prompt.py\n\n")
44
+ txt_files = [file for root, dirs, files in os.walk(folder_path) for file in files if file.endswith(".txt")]
45
+ for file in tqdm(txt_files, desc="Processing files"):
46
+ file_path = os.path.join(folder_path, file)
47
+ var_name = "_" + file_path.replace(folder_path, "").replace(os.sep, "_").replace(".txt", "").strip("_")
48
+ with open(file_path, 'r', encoding='utf-8') as f:
49
+ content = f.read().replace('"""', '\\"\\"\\"')  # escape any triple quotes so the generated string literal stays valid
50
+ out_file.write(f'{var_name} = """{content}"""\n\n')
51
+
52
+
53
+ if __name__ == "__main__":
54
+ call_parse_prompt()
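Because the module is also runnable as a script, the prompt variables in `eval_suite/prompts_raw/__init__.py` can be regenerated after editing any `.txt` prompt file; a minimal sketch of the programmatic equivalent:

```python
# Regenerate eval_suite/prompts_raw/__init__.py from the .txt prompt files;
# equivalent to running the module as a script.
from eval_suite.parse_prompt import call_parse_prompt

call_parse_prompt()
```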
eval_suite/prompts_raw/__init__.py ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # This file is generated automatically through parse_prompt.py
2
+
3
+ _video_eval_new = """# Task: Video Frame Quality Evaluation
4
+
5
+ You are tasked with analyzing and scoring a chunk of a theorem explanation video. Note that you may not have the full context of the video. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
6
+
7
+ ## Evaluation Criteria
8
+
9
+ 1. **Visual Consistency**
10
+ - Style Consistency: Does the visual style remain consistent across frames?
11
+ - Smoothness: Are the motions and transitions smooth?
12
+
13
+ ## Scoring Instructions
14
+ 1. Assign a score from **1 to 5** for each dimension:
15
+ - **1**: Very poor quality, completely fails to meet the criteria.
16
+ - **2**: Below average, significant issues present.
17
+ - **3**: Acceptable, meets the basic criteria with minor issues.
18
+ - **4**: Good, performs well with no major issues.
19
+ - **5**: Excellent, fully meets or exceeds expectations.
20
+ 2. Provide a comprehensive evaluation for each dimension.
21
+ 3. Format your output in **JSON**
22
+
23
+ ### JSON Output Format
24
+ ```json
25
+ {{
26
+ "overall_analysis": "[Provide a general assessment of the video's quality]",
27
+ "evaluation": {{
28
+ "visual_consistency": {{
29
+ "comprehensive_evaluation": "[Analysis of visual consistency]",
30
+ "score": [1-5]
31
+ }}
32
+ }}
33
+ }}
34
+ ```
35
+
36
+ Description of the theorem:
37
+ {description}
38
+
39
+ Video chunk:"""
40
+
41
+ _text_eval_new = """You are a specialist in evaluating theorem explanation videos, known for giving clear and objective feedback. You will be given the transcript of a video. Your task is to evaluate and score the content of the video in several dimensions.
42
+
43
+ ### Task Objective
44
+ 1. Perform an overall analysis of the video.
45
+ * Identify the topic of the video.
46
+ * Note your general thoughts and impression of the video, and any findings and observations.
47
+ 2. Conduct a comprehensive evaluation and score each criterion in the given dimensions.
48
+ * Analyze how well or poorly the video meets each criterion.
49
+ * Assign a score from **1 to 5** for each dimension:
50
+ - **1**: Very poor quality, completely fails to meet the criteria.
51
+ - **2**: Below average, significant issues present.
52
+ - **3**: Acceptable, meets the basic criteria with minor issues.
53
+ - **4**: Good, performs well with no major issues.
54
+ - **5**: Excellent, fully meets or exceeds expectations.
55
+ 3. Output the results in the specified JSON format.
56
+
57
+ ### Evaluation Criteria
58
+ 1. **Accuracy and Depth**
59
+ - Does the narration explain the theorem accurately?
60
+ - Does the video provide intuitive and/or rigorous explanations for why the theorem holds?
61
+ 2. **Logical Flow**
62
+ - Does the video follow a clear and logical structure?
63
+ - Does the video present a coherent buildup of ideas?
64
+
65
+ ### Notes
66
+ * You do not have access to the visual portion of the video as you are given only the textual portion. Do not reference or commentate on the visuals as they will be evaluated separately - just assume that there are reasonable visuals (e.g., geometric objects, graphs of functions, and calculations) to accompany the narration.
67
+ * The evaluation criteria are intended to be independent of each other. Do not restate the same violation in multiple criteria; only consider it in the most relevant criterion.
68
+
69
+ ### Output Format
70
+ ```json
71
+ {{
72
+ "overall_analysis": "[Overall analysis]",
73
+ "evaluation": {{
74
+ "accuracy_and_depth": {{
75
+ "comprehensive_evaluation": "[Analysis of accuracy and depth]",
76
+ "score": [1-5]
77
+ }},
78
+ "logical_flow": {{
79
+ "comprehensive_evaluation": "[Analysis of logical flow]",
80
+ "score": [1-5]
81
+ }}
82
+ }}
83
+ }}
84
+ ```
85
+
86
+ The transcript of the video is as follows:
87
+ {transcript}
88
+ """
89
+
90
+ _fix_transcript = """You are an expert in YouTube video transcripts. There is a transcript that was automatically generated through YouTube, so it lacks proper capitalization and punctuation. Your task is to fix the transcript so that there is proper punctuation, capitalization, and spacing. Do not make other modifications (e.g., keep the original word choice).
91
+
92
+ You should enclose the fixed transcript with a <SCRIPT></SCRIPT> block, i.e.:
93
+ <SCRIPT>
94
+ (Fixed transcript here)
95
+ </SCRIPT>
96
+
97
+ Original transcript: {transcript}
98
+ """
99
+
100
+ _image_eval = """# Task: Video Frame Quality Evaluation
101
+
102
+ You are tasked with analyzing and scoring a frame taken from a theorem explanation video. Note that you may not have the context of the video, so the captured frame may be a frame where some motion of visual elements is taking place. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
103
+
104
+ ## Evaluation Criteria
105
+
106
+ 1. **Visual Relevance**
107
+ - Does the video frame align with the theorem's concepts and derivations?
108
+
109
+ 2. **Element Layout**
110
+ - Placement and Size: Are the visual elements well-placed and appropriately sized within the frame?
111
+ - Overlap: Are the visual elements free of unintentional overlap?
112
+ - Clarity: Is the visual information conveyed in the frame clear and easy to understand?
113
+
114
+ ## Scoring Instructions
115
+ 1. Assign a score from **1 to 5** for each dimension:
116
+ - **1**: Very poor quality, completely fails to meet the criteria.
117
+ - **2**: Below average, significant issues present.
118
+ - **3**: Acceptable, meets the basic criteria with minor issues.
119
+ - **4**: Good, performs well with no major issues.
120
+ - **5**: Excellent, fully meets or exceeds expectations.
121
+ 2. Provide a comprehensive evaluation for each dimension.
122
+ 3. Format your output in **JSON**
123
+
124
+ ### JSON Output Format
125
+ ```json
126
+ {{
127
+ "overall_analysis": "[Provide a general assessment of the image's quality]",
128
+ "evaluation": {{
129
+ "visual_relevance": {{
130
+ "comprehensive_evaluation": "[Analysis of visual relevance]",
131
+ "score": [1-5]
132
+ }},
133
+ "element_layout": {{
134
+ "comprehensive_evaluation": "[Analysis of element layout]",
135
+ "score": [1-5]
136
+ }}
137
+ }}
138
+ }}
139
+ ```
140
+
141
+ Description of the theorem:
142
+ {description}
143
+
144
+ Image:"""
145
+
eval_suite/prompts_raw/fix_transcript.txt ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ You are an expert in YouTube video transcripts. There is a transcript that was automatically generated through YouTube, so it lacks proper capitalization and punctuation. Your task is to fix the transcript so that there is proper punctuation, capitalization, and spacing. Do not make other modifications (e.g., keep the original word choice).
2
+
3
+ You should enclose the fixed transcript with a <SCRIPT></SCRIPT> block, i.e.:
4
+ <SCRIPT>
5
+ (Fixed transcript here)
6
+ </SCRIPT>
7
+
8
+ Original transcript: {transcript}
eval_suite/prompts_raw/image_eval.txt ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Task: Video Frame Quality Evaluation
2
+
3
+ You are tasked with analyzing and scoring a frame taken from a theorem explanation video. Note that you may not have the context of the video, so the captured frame may be a frame where some motion of visual elements is taking place. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
4
+
5
+ ## Evaluation Criteria
6
+
7
+ 1. **Visual Relevance**
8
+ - Does the video frame align with the theorem's concepts and derivations?
9
+
10
+ 2. **Element Layout**
11
+ - Placement and Size: Are the visual elements well-placed and appropriately sized within the frame?
12
+ - Overlap: Are the visual elements free of unintentional overlap?
13
+ - Clarity: Is the visual information conveyed in the frame clear and easy to understand?
14
+
15
+ ## Scoring Instructions
16
+ 1. Assign a score from **1 to 5** for each dimension:
17
+ - **1**: Very poor quality, completely fails to meet the criteria.
18
+ - **2**: Below average, significant issues present.
19
+ - **3**: Acceptable, meets the basic criteria with minor issues.
20
+ - **4**: Good, performs well with no major issues.
21
+ - **5**: Excellent, fully meets or exceeds expectations.
22
+ 2. Provide a comprehensive evaluation for each dimension.
23
+ 3. Format your output in **JSON**
24
+
25
+ ### JSON Output Format
26
+ ```json
27
+ {{
28
+ "overall_analysis": "[Provide a general assessment of the image's quality]",
29
+ "evaluation": {{
30
+ "visual_relevance": {{
31
+ "comprehensive_evaluation": "[Analysis of visual relevance]",
32
+ "score": [1-5]
33
+ }},
34
+ "element_layout": {{
35
+ "comprehensive_evaluation": "[Analysis of element layout]",
36
+ "score": [1-5]
37
+ }}
38
+ }}
39
+ }}
40
+ ```
41
+
42
+ Description of the theorem:
43
+ {description}
44
+
45
+ Image:
eval_suite/prompts_raw/text_eval_new.txt ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ You are a specialist in evaluating theorem explanation videos, known for giving clear and objective feedback. You will be given the transcript of a video. Your task is to evaluate and score the content of the video in several dimensions.
2
+
3
+ ### Task Objective
4
+ 1. Perform an overall analysis of the video.
5
+ * Identify the topic of the video.
6
+ * Note your general thoughts and impression of the video, and any findings and observations.
7
+ 2. Conduct a comprehensive evaluation and score each criterion in the given dimensions.
8
+ * Analyze how well or poorly the video meets each criterion.
9
+ * Assign a score from **1 to 5** for each dimension:
10
+ - **1**: Very poor quality, completely fails to meet the criteria.
11
+ - **2**: Below average, significant issues present.
12
+ - **3**: Acceptable, meets the basic criteria with minor issues.
13
+ - **4**: Good, performs well with no major issues.
14
+ - **5**: Excellent, fully meets or exceeds expectations.
15
+ 3. Output the results in the specified JSON format.
16
+
17
+ ### Evaluation Criteria
18
+ 1. **Accuracy and Depth**
19
+ - Does the narration explain the theorem accurately?
20
+ - Does the video provide intuitive and/or rigorous explanations for why the theorem holds?
21
+ 2. **Logical Flow**
22
+ - Does the video follow a clear and logical structure?
23
+ - Does the video present a coherent buildup of ideas?
24
+
25
+ ### Notes
26
+ * You do not have access to the visual portion of the video as you are given only the textual portion. Do not reference or commentate on the visuals as they will be evaluated separately - just assume that there are reasonable visuals (e.g., geometric objects, graphs of functions, and calculations) to accompany the narration.
27
+ * The evaluation criteria are intended to be independent of each other. Do not restate the same violation in multiple criteria; only consider it in the most relevant criterion.
28
+
29
+ ### Output Format
30
+ ```json
31
+ {{
32
+ "overall_analysis": "[Overall analysis]",
33
+ "evaluation": {{
34
+ "accuracy_and_depth": {{
35
+ "comprehensive_evaluation": "[Analysis of accuracy and depth]",
36
+ "score": [1-5]
37
+ }},
38
+ "logical_flow": {{
39
+ "comprehensive_evaluation": "[Analysis of logical flow]",
40
+ "score": [1-5]
41
+ }}
42
+ }}
43
+ }}
44
+ ```
45
+
46
+ The transcript of the video is as follows:
47
+ {transcript}
eval_suite/prompts_raw/video_eval_new.txt ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Task: Video Frame Quality Evaluation
2
+
3
+ You are tasked with analyzing and scoring a chunk of a theorem explanation video. Note that you may not have the full context of the video. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
4
+
5
+ ## Evaluation Criteria
6
+
7
+ 1. **Visual Consistency**
8
+ - Style Consistency: Does the visual style remain consistent across frames?
9
+ - Smoothness: Are the motions and transitions smooth?
10
+
11
+ ## Scoring Instructions
12
+ 1. Assign a score from **1 to 5** for each dimension:
13
+ - **1**: Very poor quality, completely fails to meet the criteria.
14
+ - **2**: Below average, significant issues present.
15
+ - **3**: Acceptable, meets the basic criteria with minor issues.
16
+ - **4**: Good, performs well with no major issues.
17
+ - **5**: Excellent, fully meets or exceeds expectations.
18
+ 2. Provide a comprehensive evaluation for each dimension.
19
+ 3. Format your output in **JSON**
20
+
21
+ ### JSON Output Format
22
+ ```json
23
+ {{
24
+ "overall_analysis": "[Provide a general assessment of the video's quality]",
25
+ "evaluation": {{
26
+ "visual_consistency": {{
27
+ "comprehensive_evaluation": "[Analysis of visual consistency]",
28
+ "score": [1-5]
29
+ }}
30
+ }}
31
+ }}
32
+ ```
33
+
34
+ Description of the theorem:
35
+ {description}
36
+
37
+ Video chunk:
eval_suite/text_utils.py ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Union
2
+
3
+ import pysrt
4
+
5
+ from mllm_tools.litellm import LiteLLMWrapper
6
+ from mllm_tools.gemini import GeminiWrapper
7
+ from mllm_tools.utils import _prepare_text_inputs
8
+ from eval_suite.prompts_raw import _fix_transcript, _text_eval_new
9
+ from eval_suite.utils import extract_json, convert_score_fields
10
+
11
+
12
+ def parse_srt_to_text(srt_path) -> str:
13
+ """
14
+ Parse an SRT subtitle file into plain text.
15
+
16
+ Args:
17
+ srt_path: Path to the SRT subtitle file.
18
+
19
+ Returns:
20
+ str: The subtitle text with duplicates removed and ellipses replaced.
21
+ """
22
+ subs = pysrt.open(srt_path)
23
+ full_text = []
24
+ for sub in subs:
25
+ sub.text = sub.text.replace("...", ".")
26
+ for line in sub.text.splitlines():
27
+ # .srt can contain repeated lines
28
+ if full_text and full_text[-1] == line:
29
+ continue
30
+ full_text.append(line)
31
+ return "\n".join(full_text)
32
+
33
+
34
+ def fix_transcript(text_eval_model: Union[LiteLLMWrapper, GeminiWrapper], transcript: str) -> str:
35
+ """
36
+ Fix and clean up a transcript using an LLM model.
37
+
38
+ Args:
39
+ text_eval_model: The LLM model wrapper to use for fixing the transcript.
40
+ transcript: The input transcript text to fix.
41
+
42
+ Returns:
43
+ str: The fixed and cleaned transcript text.
44
+ """
45
+ print("Fixing transcript...")
46
+
47
+ prompt = _fix_transcript.format(transcript=transcript)
48
+ response = text_eval_model(_prepare_text_inputs(prompt))
49
+ fixed_script = response.split("<SCRIPT>", maxsplit=1)[1].split("</SCRIPT>")[0]
50
+
51
+ return fixed_script
52
+
53
+
54
+ def evaluate_text(text_eval_model: LiteLLMWrapper, transcript: str, retry_limit: int) -> dict:
55
+ """
56
+ Evaluate transcript text using an LLM model with retry logic.
57
+
58
+ Args:
59
+ text_eval_model: The LLM model wrapper to use for evaluation.
60
+ transcript: The transcript text to evaluate.
61
+ retry_limit: Maximum number of retry attempts on failure.
62
+
63
+ Returns:
64
+ dict: The evaluation results as a JSON object.
65
+
66
+ Raises:
67
+ ValueError: If all retry attempts fail.
68
+ """
69
+ # prompt = _text_eval.format(transcript=transcript)
70
+ prompt = _text_eval_new.format(transcript=transcript)
71
+ for attempt in range(retry_limit):
72
+ try:
73
+ evaluation = text_eval_model(_prepare_text_inputs(prompt))
74
+ evaluation_json = extract_json(evaluation)
75
+ evaluation_json = convert_score_fields(evaluation_json)
76
+ return evaluation_json
77
+ except Exception as e:
78
+ print(f"Attempt {attempt + 1} failed: {e.__class__.__name__}: {e}")
79
+ if attempt + 1 == retry_limit:
80
+ raise ValueError("Reached maximum retry limit. Evaluation failed.") from None
eval_suite/utils.py ADDED
@@ -0,0 +1,81 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import re
3
+ from math import prod
4
+ from typing import List
5
+
6
+ def extract_json(response: str) -> dict:
7
+ """
8
+ Extract JSON content from a string response.
9
+
10
+ Args:
11
+ response (str): String containing JSON content, possibly within code blocks.
12
+
13
+ Returns:
14
+ dict: Extracted and parsed JSON content.
15
+
16
+ Raises:
17
+ ValueError: If no valid JSON content could be extracted.
18
+ """
19
+ try:
20
+ evaluation_json = json.loads(response)
21
+ except json.JSONDecodeError:
22
+ # If JSON parsing fails, try to extract the content between ```json and ```
23
+ match = re.search(r'```json\n(.*?)\n```', response, re.DOTALL)
24
+ if not match:
25
+ # If no match for ```json, try to extract content between ``` and ```
26
+ match = re.search(r'```\n(.*?)\n```', response, re.DOTALL)
27
+
28
+ if match:
29
+ evaluation_content = match.group(1)
30
+ evaluation_json = json.loads(evaluation_content)
31
+ else:
32
+ raise ValueError("Failed to extract valid JSON content")
33
+ return evaluation_json
34
+
35
+
36
+ def convert_score_fields(data: dict) -> dict:
37
+ """
38
+ Convert score fields in a dictionary to integers recursively.
39
+
40
+ Args:
41
+ data (dict): Dictionary containing score fields to convert.
42
+
43
+ Returns:
44
+ dict: Dictionary with score fields converted to integers.
45
+
46
+ Raises:
47
+ ValueError: If a score value cannot be converted to integer.
48
+ """
49
+ # Create a new dictionary with the converted values
50
+ converted_data = {}
51
+ for key, value in data.items():
52
+ if key == "score":
53
+ if isinstance(value, int):
54
+ converted_data[key] = value
55
+ elif isinstance(value, str) and value.isdigit():
56
+ converted_data[key] = int(value)
57
+ else:
58
+ raise ValueError(f"Invalid score value: {value!r}")
59
+ elif isinstance(value, dict):
60
+ converted_data[key] = convert_score_fields(value)
61
+ else:
62
+ converted_data[key] = value
63
+ return converted_data
64
+
65
+
66
+ def calculate_geometric_mean(scores: List[int]) -> float:
67
+ """
68
+ Calculate the geometric mean of a list of scores.
69
+
70
+ Args:
71
+ scores (List[int]): List of integer scores, may contain None values.
72
+
73
+ Returns:
74
+ float: Geometric mean of non-None scores. Returns 0.0 if list is empty
75
+ or contains only None values.
76
+ """
77
+ scores = [s for s in scores if s is not None]
78
+ if not scores:
79
+ return 0.0
80
+ product = prod(scores)
81
+ return product ** (1 / len(scores))
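A small usage sketch for these helpers (illustrative only, not part of the repository), showing a fenced JSON response being parsed, a score field coerced to an integer, and a geometric mean computed:

```python
from eval_suite.utils import extract_json, convert_score_fields, calculate_geometric_mean

raw = '```json\n{"evaluation": {"visual_relevance": {"score": "4"}}}\n```'
parsed = convert_score_fields(extract_json(raw))
print(parsed["evaluation"]["visual_relevance"]["score"])  # 4 (as int)
print(calculate_geometric_mean([4, 5, 3]))                # ~3.91
```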
eval_suite/video_utils.py ADDED
@@ -0,0 +1,167 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import cv2
3
+ import tempfile
4
+
5
+ from dotenv import load_dotenv
6
+
7
+ from mllm_tools.utils import _prepare_text_video_inputs
8
+ from eval_suite.prompts_raw import _video_eval_new
9
+ from eval_suite.utils import extract_json, convert_score_fields
10
+
11
+ load_dotenv()
12
+
13
+
14
+ def reduce_video_framerate(input_path, target_fps=1, output_path=None):
15
+ """
16
+ Reduces the frame rate of a video by only keeping frames at the target interval.
17
+
18
+ Args:
19
+ input_path (str): Path to the input video
20
+ target_fps (int): Target frames per second (default: 1)
21
+ output_path (str, optional): Path to save the processed video. If None, uses a temporary file.
22
+
23
+ Returns:
24
+ str: Path to the processed video
25
+
26
+ Raises:
27
+ ValueError: If input video cannot be opened or has invalid FPS
28
+ RuntimeError: If video writer initialization fails or output video creation fails
29
+ """
30
+ cap = cv2.VideoCapture(input_path)
31
+ if not cap.isOpened():
32
+ raise ValueError(f"Could not open input video: {input_path}")
33
+
34
+ original_fps = cap.get(cv2.CAP_PROP_FPS)
35
+ if original_fps <= 0:
36
+ raise ValueError(f"Invalid FPS ({original_fps}) detected in input video")
37
+
38
+ frame_interval = max(1, int(original_fps / target_fps))  # at least 1, to avoid modulo-by-zero when original_fps < target_fps
39
+
40
+ # Get video properties
41
+ width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
42
+ height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
43
+
44
+ # Use provided output path or create temporary file
45
+ if output_path is None:
46
+ temp_output = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
47
+ output_path = temp_output.name
48
+
49
+ # Ensure output directory exists
50
+ os.makedirs(os.path.dirname(output_path), exist_ok=True)
51
+
52
+ # Try different codecs in order of preference
53
+ codecs = [
54
+ ('avc1', '.mp4'), # H.264 codec
55
+ ('mp4v', '.mp4'), # MP4V codec
56
+ ('XVID', '.avi'), # XVID codec
57
+ ('MJPG', '.avi'), # Motion JPEG codec
58
+ ]
59
+
60
+ success = False
61
+ for codec, ext in codecs:
62
+ if output_path.endswith('.mp4') and not ext.endswith('.mp4'):
63
+ # If we're switching to AVI format, change the extension
64
+ output_path = output_path[:-4] + ext
65
+
66
+ fourcc = cv2.VideoWriter_fourcc(*codec)
67
+ out = cv2.VideoWriter(output_path, fourcc, target_fps, (width, height))
68
+
69
+ if out.isOpened():
70
+ success = True
71
+ print(f"Successfully initialized video writer with codec: {codec}")
72
+ break
73
+ else:
74
+ out.release()
75
+ if os.path.exists(output_path):
76
+ os.remove(output_path)
77
+
78
+ if not success:
79
+ raise RuntimeError("Could not initialize video writer with any available codec")
80
+
81
+ frame_count = 0
82
+ frames_written = 0
83
+ while cap.isOpened():
84
+ ret, frame = cap.read()
85
+ if not ret:
86
+ break
87
+
88
+ # Only write frames at the specified interval
89
+ if frame_count % frame_interval == 0:
90
+ out.write(frame)
91
+ frames_written += 1
92
+ frame_count += 1
93
+
94
+ cap.release()
95
+ out.release()
96
+
97
+ # Verify the output
98
+ verify_cap = cv2.VideoCapture(output_path)
99
+ if not verify_cap.isOpened():
100
+ raise RuntimeError(f"Failed to create output video at {output_path}")
101
+
102
+ actual_fps = verify_cap.get(cv2.CAP_PROP_FPS)
103
+ total_frames = verify_cap.get(cv2.CAP_PROP_FRAME_COUNT)
104
+ verify_cap.release()
105
+
106
+ if actual_fps <= 0:
107
+ print("Warning: Output video reports invalid FPS. This might be a codec issue.")
108
+ actual_fps = target_fps # Use target FPS for duration calculation
109
+
110
+ print(f"Created video with {frames_written} frames at {actual_fps} FPS")
111
+ print(f"Total duration: {total_frames/actual_fps:.2f} seconds")
112
+ print(f"Video saved to: {output_path}")
113
+
114
+ return output_path
115
+
116
+
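A hedged usage sketch of reduce_video_framerate (paths are placeholders): downsampling a rendered chunk to 1 FPS before sending it to the evaluator keeps the payload small.

    reduced = reduce_video_framerate(
        "output/scene1/chunk_1.mp4",                       # placeholder input path
        target_fps=1,
        output_path="output/processed/chunk_1_1fps.mp4",   # placeholder output path
    )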
117
+ def evaluate_video_chunk_new(model, video_path, transcript="No transcript provided", description="No description provided",
118
+ save_processed_video=None, target_fps=None, retry_limit=5):
119
+ """
120
+ Evaluate a single video chunk using a multimodal model.
121
+
122
+ Args:
123
+ model: The multimodal model to use for evaluation
124
+ video_path (str): Path to the video file to evaluate
125
+ transcript (str, optional): Video transcript text. Defaults to "No transcript provided"
126
+ description (str, optional): Video description text. Defaults to "No description provided"
127
+ save_processed_video (str, optional): Path to save processed video. If None, uses temporary file
128
+ target_fps (int, optional): Target frames per second for video processing. If None, no processing
129
+ retry_limit (int, optional): Maximum number of retry attempts. Defaults to 5
130
+
131
+ Returns:
132
+ dict: Evaluation results as a JSON object with scores converted to integers
133
+
134
+ Raises:
135
+ FileNotFoundError: If video file does not exist
136
+ Exception: If evaluation fails after all retry attempts
137
+ """
138
+ if not os.path.exists(video_path):
139
+ raise FileNotFoundError(f"Video file not found: {video_path}")
140
+
141
+ # Only process video if target_fps is specified
142
+ if target_fps is not None:
143
+ processed_video_path = reduce_video_framerate(video_path, target_fps=target_fps, output_path=save_processed_video)
144
+ video_to_use = processed_video_path
145
+ else:
146
+ video_to_use = video_path
147
+
148
+ prompt = _video_eval_new.format(description=description)
149
+ inputs = _prepare_text_video_inputs(prompt, video_to_use)
150
+
151
+ try:
152
+ for attempt in range(retry_limit):
153
+ try:
154
+ response = model(inputs)
155
+ response_json = extract_json(response)
156
+ response_json = convert_score_fields(response_json)
157
+
158
+ return response_json
159
+ except Exception as e:
160
+ print(f"Attempt {attempt + 1} failed: {e}")
161
+ if attempt + 1 == retry_limit:
162
+ print("Reached maximum retry limit. Evaluation failed.")
163
+ raise
164
+ finally:
165
+ # Clean up the temporary processed video if we created one
166
+ if target_fps is not None and save_processed_video is None and os.path.exists(processed_video_path):
167
+ os.unlink(processed_video_path)
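A sketch of how the chunk evaluator above might be invoked, assuming a GeminiWrapper-style model as used in evaluate.py and assuming the model returns the expected JSON schema (paths and the description are placeholders):

    from mllm_tools.gemini import GeminiWrapper

    model = GeminiWrapper(model_name="gemini/gemini-1.5-pro-002", temperature=0.0)
    result = evaluate_video_chunk_new(
        model,
        "output/chunks/chunk_1.mp4",          # placeholder chunk path
        description="Pythagorean Theorem",
        target_fps=1,
    )
    print(result["evaluation"])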
evaluate.py ADDED
@@ -0,0 +1,474 @@
1
+ import os
2
+ import json
3
+ import argparse
4
+ import tempfile
5
+ from typing import Dict, List, Union
6
+ from datetime import datetime
7
+
8
+ from dotenv import load_dotenv
9
+ from moviepy import VideoFileClip
10
+
11
+ from mllm_tools.litellm import LiteLLMWrapper
12
+ from mllm_tools.gemini import GeminiWrapper
13
+ from eval_suite.utils import calculate_geometric_mean
14
+ from eval_suite.text_utils import parse_srt_to_text, fix_transcript, evaluate_text
15
+ from eval_suite.video_utils import evaluate_video_chunk_new
16
+ from eval_suite.image_utils import evaluate_sampled_images
17
+
18
+ load_dotenv()
19
+
20
+ with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "src", "utils", "allowed_models.json")) as f:
21
+ ALLOWED_MODELS = json.load(f)["allowed_models"]
22
+
23
+
24
+ def combine_results(output_folder: str, combined_file: str, results: Dict[str, Dict]) -> None:
25
+ """
26
+ Combine all evaluation results into a single file.
27
+
28
+ Args:
29
+ output_folder (str): Directory to store the combined file.
30
+ combined_file (str): Name of the combined file.
31
+ results (Dict[str, Dict]): Dictionary of evaluation results with file names as keys.
32
+
33
+ Returns:
34
+ None
35
+ """
36
+ combined_path = os.path.join(output_folder, combined_file)
37
+ with open(combined_path, 'w') as output_file:
38
+ json.dump(results, output_file, indent=4)
39
+
40
+
41
+ def save_individual_result(output_folder: str, file_name: str, result: Dict) -> None:
42
+ """
43
+ Save individual evaluation result to a file.
44
+
45
+ Args:
46
+ output_folder (str): Directory to store the evaluation file.
47
+ file_name (str): Name of the file.
48
+ result (Dict): Evaluation result.
49
+
50
+ Returns:
51
+ None
52
+ """
53
+ current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
54
+ result_file = f"evaluation_{file_name}_{current_time}.json"
55
+ os.makedirs(output_folder, exist_ok=True)
56
+ result_path = os.path.join(output_folder, result_file)
57
+ with open(result_path, 'w') as output_file:
58
+ json.dump(result, output_file, indent=4)
59
+
60
+
61
+ def evaluate_text_file(model, transcript_path, retry_limit):
62
+ """
63
+ Evaluate a text file using the provided model.
64
+
65
+ Args:
66
+ model: The model to use for evaluation.
67
+ transcript_path (str): Path to the transcript file (.srt or .txt).
68
+ retry_limit (int): Number of retry attempts for evaluation.
69
+
70
+ Returns:
71
+ Dict or None: Evaluation results if successful, None if file format unsupported.
72
+ """
73
+ if not transcript_path.endswith(('.srt', '.txt')):
74
+ print(f"Skipping {transcript_path}: Unsupported file format for text evaluation.")
75
+ return None
76
+
77
+ if transcript_path.endswith(".srt"):
78
+ transcript = parse_srt_to_text(transcript_path)
79
+ elif transcript_path.endswith(".txt"):
80
+ with open(transcript_path) as f:
81
+ transcript = f.read().strip()
82
+ else:
83
+ raise ValueError("Unrecognized transcript file format.")
84
+
85
+ alpha_count = sum(1 for c in transcript if c.isalpha())
+ capital_letter_proportion = (sum(1 for c in transcript if c.isupper()) / alpha_count) if alpha_count else 1.0  # avoid ZeroDivisionError on letter-free transcripts
86
+ if capital_letter_proportion < 0.01:
87
+ transcript = fix_transcript(model, transcript)
88
+
89
+ print(f"Performing text evaluation: {os.path.basename(transcript_path)}")
90
+ result = evaluate_text(model, transcript, retry_limit)
91
+ return result
92
+
93
+
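The capitalization check in evaluate_text_file only triggers fix_transcript when the transcript is essentially free of uppercase letters; a small sketch of the heuristic (the transcript string is illustrative):

    t = "the pythagorean theorem relates the sides of a right triangle"
    upper = sum(1 for c in t if c.isupper())
    alpha = sum(1 for c in t if c.isalpha())
    print(upper / alpha < 0.01)   # True, so this transcript would be re-cased by fix_transcript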
94
+ def evaluate_video_file(model, video_path, transcript_path, description_path, target_fps=None, output_folder=None):
95
+ """
96
+ Evaluate a video file using the provided model.
97
+
98
+ Args:
99
+ model: The model to use for evaluation.
100
+ video_path (str): Path to the video file.
101
+ transcript_path (str): Path to the transcript file.
102
+ description_path (str): Path to the description file.
103
+ target_fps (int, optional): Target frames per second for video processing.
104
+ output_folder (str, optional): Directory to store output files.
105
+
106
+ Returns:
107
+ Dict or None: Evaluation results if successful, None if file format unsupported.
108
+ """
109
+ if not video_path.endswith(('.mp4', '.mkv')):
110
+ print(f"Skipping {video_path}: Unsupported file format for video evaluation.")
111
+ return None
112
+
113
+ moviepy_temp_dir = os.path.join(output_folder or os.getcwd(), "moviepy_temp")
+ os.makedirs(moviepy_temp_dir, exist_ok=True)
114
+
115
+ # Chunking
116
+ num_chunks = 10
117
+ with VideoFileClip(video_path) as clip:
118
+ duration = clip.duration
119
+ chunk_duration = duration / num_chunks
120
+ results = []
121
+
122
+ # Create a temporary directory in the output_folder
123
+ temp_dir_parent = output_folder or os.getcwd()
124
+ with tempfile.TemporaryDirectory(dir=temp_dir_parent) as temp_dir:
125
+ for i in range(num_chunks):
126
+ start = i * chunk_duration
127
+ end = min(start + chunk_duration, duration)
128
+ chunk = clip.subclipped(start, end)
129
+ chunk_path = os.path.join(temp_dir, f"chunk_{i+1}.mp4")
130
+ # Explicitly set the temp_audiofile path with matching codec
131
+ temp_audiofile = os.path.join(moviepy_temp_dir, f"temp_audio_chunk_{i+1}.m4a")
132
+ chunk.write_videofile(
133
+ chunk_path,
134
+ codec="libx264",
135
+ audio_codec="aac",
136
+ temp_audiofile=temp_audiofile,
137
+ audio_bitrate="192k",
138
+ preset="ultrafast", # Speed up encoding
139
+ logger=None
140
+ )
141
+ # Create processed videos folder inside output_folder
142
+ processed_videos_dir = os.path.join(output_folder, "processed_videos")
143
+ save_path = os.path.join(processed_videos_dir, f"processed_chunk_{i+1}.mp4")
144
+ result = evaluate_video_chunk_new(
145
+ model,
146
+ chunk_path,
147
+ transcript_path,
148
+ description_path,
149
+ target_fps=target_fps,
150
+ save_processed_video=save_path
151
+ )
152
+ results.append(result)
153
+
154
+ score_dict = {}
155
+ for key in results[0]["evaluation"].keys():
156
+ score_dict[key] = []
157
+ for result in results:
158
+ score_dict[key].append(result["evaluation"][key]["score"])
159
+
160
+ evaluation = {}
161
+ for key, scores in score_dict.items():
162
+ evaluation[key] = {"score": calculate_geometric_mean(scores)}
163
+
164
+ result_json = {
165
+ "evaluation": evaluation,
166
+ "video_chunks": results
167
+ }
168
+ return result_json
169
+
170
+
171
+ def extract_scores(data: Union[Dict, List]) -> List[int]:
172
+ """
173
+ Extract all score values from a nested dictionary or list structure.
174
+
175
+ Args:
176
+ data (Union[Dict, List]): The data structure to extract scores from.
177
+
178
+ Returns:
179
+ List[int]: List of extracted score values.
180
+ """
181
+ scores = []
182
+ if isinstance(data, dict):
183
+ for key, value in data.items():
184
+ if "chunks" in key:
185
+ continue
186
+ elif isinstance(value, dict) or isinstance(value, list):
187
+ scores.extend(extract_scores(value))
188
+ elif key == 'score':
189
+ scores.append(value)
190
+ elif isinstance(data, list):
191
+ for item in data:
192
+ scores.extend(extract_scores(item))
193
+ return scores
194
+
195
+
196
+ def calculate_overall_score(result: Dict) -> float:
197
+ """
198
+ Calculate the overall score from evaluation results.
199
+
200
+ Args:
201
+ result (Dict): Dictionary containing evaluation results.
202
+
203
+ Returns:
204
+ float: The calculated overall score.
205
+ """
206
+ scores = extract_scores(result)
207
+ overall_score = calculate_geometric_mean(scores)
208
+ return overall_score
209
+
210
+
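An illustrative sketch of how scores are collected (keys are illustrative; anything under a key containing "chunks" is skipped, presumably so per-chunk scores are not double counted):

    result = {
        "evaluation": {"visual_consistency": {"score": 4}},
        "video_chunks": [{"evaluation": {"visual_consistency": {"score": 2}}}],
    }
    print(extract_scores(result))            # [4]
    print(calculate_overall_score(result))   # 4.0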
211
+ def process_topic_name(topic_name: str) -> str:
212
+ """
213
+ Process a topic name by capitalizing words and handling special characters.
214
+
215
+ Args:
216
+ topic_name (str): The topic name to process.
217
+
218
+ Returns:
219
+ str: The processed topic name.
220
+ """
221
+ words = topic_name.replace("_s_", "'s_").split("_")
222
+ return " ".join([word.capitalize() for word in words])
223
+
224
+
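For example (the topic string is illustrative):

    process_topic_name("pythagorean_s_theorem")   # -> "Pythagorean's Theorem"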
225
+ def merge_dicts(dict1: dict, dict2: dict) -> dict:
226
+ """
227
+ Recursively merge two dictionaries.
228
+
229
+ Args:
230
+ dict1 (dict): First dictionary.
231
+ dict2 (dict): Second dictionary.
232
+
233
+ Returns:
234
+ dict: Merged dictionary.
235
+ """
236
+ merged = dict1.copy()
237
+ for key, value in dict2.items():
238
+ if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
239
+ merged[key] = merge_dicts(merged[key], value)
240
+ else:
241
+ merged[key] = value
242
+ return merged
243
+
244
+
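A quick sketch of the recursive merge used below when combining text, video, and image results (keys are illustrative):

    a = {"evaluation": {"text_quality": {"score": 4}}}
    b = {"evaluation": {"visual_quality": {"score": 3}}}
    merge_dicts(a, b)   # {"evaluation": {"text_quality": {"score": 4}, "visual_quality": {"score": 3}}}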
245
+ def process_theorem(models, file_path: str, eval_type: str, retry_limit: int,
246
+ target_fps: int = None, use_parent_folder_as_topic: bool = False,
247
+ output_folder: str = None) -> tuple[str, dict]:
248
+ """
249
+ Process a theorem file or directory for evaluation.
250
+
251
+ Args:
252
+ models: Dictionary of models for different evaluation types.
253
+ file_path (str): Path to the file or directory to evaluate.
254
+ eval_type (str): Type of evaluation to perform.
255
+ retry_limit (int): Number of retry attempts.
256
+ target_fps (int, optional): Target frames per second for video processing.
257
+ use_parent_folder_as_topic (bool, optional): Use parent folder name as topic.
258
+ output_folder (str, optional): Directory to store output files.
259
+
260
+ Returns:
261
+ tuple[str, dict]: Tuple of file name and evaluation results.
262
+ """
263
+ ext_map = {
264
+ 'text': ('.txt', '.srt'),
265
+ 'video': ('.mp4', '.mkv')
266
+ }
267
+
268
+ # Handle single file evaluation
269
+ if os.path.isfile(file_path):
270
+ file_ext = os.path.splitext(file_path)[1].lower()
271
+ file_name = os.path.basename(file_path)
272
+
273
+ if eval_type == "text" and file_ext in ext_map['text']:
274
+ return file_name, evaluate_text_file(models['text'], file_path, retry_limit)
275
+ elif eval_type == "video" and file_ext in ext_map['video']:
276
+ if use_parent_folder_as_topic:
277
+ topic_name = os.path.basename(os.path.dirname(file_path))
278
+ else:
279
+ topic_name = None
280
+ topic_name = process_topic_name(topic_name) if topic_name else None
281
+ return file_name, evaluate_video_file(models['video'], file_path, None, topic_name, target_fps, output_folder)
282
+ elif eval_type == "image" and file_ext in ext_map['video']:
283
+ if use_parent_folder_as_topic:
284
+ topic_name = os.path.basename(os.path.dirname(file_path))
285
+ else:
286
+ topic_name = None
287
+ topic_name = process_topic_name(topic_name) if topic_name else None
288
+ return file_name, evaluate_sampled_images(models['image'], file_path, topic_name, num_chunks=10, output_folder=output_folder)
289
+ elif eval_type == "all":
290
+ raise ValueError("Evaluation type 'all' is not supported for a single file. Try passing a folder with both a video and a subtitle file.")
291
+ else:
292
+ raise ValueError(f"File type of {file_path} does not match evaluation type {eval_type!r}")
293
+
294
+ # Handle directory evaluation
295
+ theorem_dir = file_path
296
+ all_files = os.listdir(theorem_dir)
297
+
298
+ # Look for transcript files, prioritizing .srt over .txt if both exist
299
+ transcript_file_candidates = [f for f in all_files if f.endswith(ext_map['text']) and not f.endswith('_scene_outline.txt')]
300
+ srt_files = [f for f in transcript_file_candidates if f.endswith('.srt')]
301
+ txt_files = [f for f in transcript_file_candidates if f.endswith('.txt')]
302
+
303
+ transcript_path = None
304
+ if srt_files:
305
+ transcript_path = os.path.join(theorem_dir, srt_files[0])
306
+ elif txt_files:
307
+ transcript_path = os.path.join(theorem_dir, txt_files[0])
308
+
309
+ video_file_candidates = [f for f in all_files if f.endswith(ext_map['video'])]
310
+ video_path = os.path.join(theorem_dir, video_file_candidates[0]) if len(video_file_candidates) == 1 else None
311
+
312
+ topic_name = os.path.basename(theorem_dir)
313
+ topic_name = process_topic_name(topic_name)
314
+
315
+ if not video_path:
316
+ print(f"Skipping {theorem_dir}: No video file found")
317
+ return None, None
318
+
319
+ text_result = video_result = image_result = None
320
+ if eval_type == "text" or eval_type == "all":
321
+ if transcript_path is None:
322
+ print(f"Warning: No suitable transcript file found in {theorem_dir}")
323
+ else:
324
+ text_result = evaluate_text_file(models['text'], transcript_path, retry_limit)
325
+ if eval_type == "video" or eval_type == "all":
326
+ assert video_path is not None, f"Expected 1 video file, got {len(video_file_candidates)} for {theorem_dir}"
327
+ video_result = evaluate_video_file(models['video'], video_path, transcript_path, topic_name, target_fps, output_folder)
328
+ if eval_type == "image" or eval_type == "all":
329
+ assert video_path is not None, f"Expected 1 video file, got {len(video_file_candidates)} for {theorem_dir}"
330
+ image_result = evaluate_sampled_images(models['image'], video_path, topic_name, num_chunks=10, output_folder=output_folder)
331
+
332
+ if eval_type == "all":
333
+ result = {}
334
+ if text_result:
335
+ result = merge_dicts(result, text_result)
336
+ if video_result:
337
+ result = merge_dicts(result, video_result)
338
+ if image_result:
339
+ result = merge_dicts(result, image_result)
340
+ if result:
341
+ result["evaluation"]["overall_score"] = calculate_overall_score(result)
342
+ else:
343
+ result = text_result if eval_type == "text" else video_result if eval_type == "video" else image_result if eval_type == "image" else None
344
+
345
+ file_name = os.path.basename(theorem_dir)
346
+ return file_name, result
347
+
348
+
349
+ def main():
350
+ """
351
+ Main function to run the evaluation script.
352
+
353
+ Parses command line arguments and orchestrates the evaluation process
354
+ for text, video, and image content using specified AI models.
355
+ """
356
+ parser = argparse.ArgumentParser(description='Automatic evaluation of theorem explanation videos with LLMs')
357
+ parser.add_argument('--model_text', type=str,
358
+ choices=ALLOWED_MODELS,
359
+ default='azure/gpt-4o',
360
+ help='Select the AI model to use for text evaluation')
361
+ parser.add_argument('--model_video', type=str,
362
+ choices=['gemini/gemini-1.5-pro-002',
363
+ 'gemini/gemini-2.0-flash-exp',
364
+ 'gemini/gemini-2.0-pro-exp-02-05'],
365
+ default='gemini/gemini-1.5-pro-002',
366
+ help='Select the AI model to use for video evaluation')
367
+ parser.add_argument('--model_image', type=str,
368
+ choices=ALLOWED_MODELS,
369
+ default='azure/gpt-4o',
370
+ help='Select the AI model to use for image evaluation')
371
+ parser.add_argument('--eval_type', type=str, choices=['text', 'video', 'image', 'all'], default='all', help='Type of evaluation to perform')
372
+ parser.add_argument('--file_path', type=str, help='Path to a file or a theorem folder', required=True)
373
+ parser.add_argument('--output_folder', type=str, help='Directory to store the evaluation files', required=True)
374
+ parser.add_argument('--retry_limit', type=int, default=3, help='Number of retry attempts for each inference')
375
+ parser.add_argument('--combine', action='store_true', help='Combine all results into a single JSON file')
376
+ parser.add_argument('--bulk_evaluate', action='store_true', help='Evaluate a folder of theorems together', default=False)
377
+ parser.add_argument('--target_fps', type=int, help='Target FPS for video processing. If not set, original video FPS will be used', required=False)
378
+ parser.add_argument('--use_parent_folder_as_topic', action='store_true', help='Use parent folder name as topic name for single file evaluation', default=True)
379
+ parser.add_argument('--max_workers', type=int, default=4, help='Maximum number of concurrent workers for parallel processing')
380
+
381
+ args = parser.parse_args()
382
+
383
+ # Initialize separate models
384
+ text_model = LiteLLMWrapper(
385
+ model_name=args.model_text,
386
+ temperature=0.0,
387
+ )
388
+ video_model = GeminiWrapper(
389
+ model_name=args.model_video,
390
+ temperature=0.0,
391
+ )
392
+ image_model = LiteLLMWrapper(
393
+ model_name=args.model_image,
394
+ temperature=0.0,
395
+ )
396
+
397
+ models = {
398
+ 'text': text_model,
399
+ 'video': video_model,
400
+ 'image': image_model
401
+ }
402
+
403
+ theorem_dirs = []
404
+ if args.bulk_evaluate:
405
+ assert os.path.isdir(args.file_path), "File path must be a folder for --bulk_evaluate"
406
+ for root, dirnames, _ in os.walk(args.file_path):
407
+ if not any(f.endswith(".mp4") for f in os.listdir(root)):
408
+ continue
409
+
410
+ theorem_dirs.append(root)
411
+ elif os.path.isdir(args.file_path):
412
+ assert any(f.endswith(".mp4") for f in os.listdir(args.file_path)), "The provided folder must contain a video file"
413
+
414
+ theorem_dirs.append(args.file_path)
415
+
416
+ # Create output directory and its temp subdirectories if it doesn't exist
417
+ os.makedirs(args.output_folder, exist_ok=True)
418
+ moviepy_temp_dir = os.path.join(args.output_folder, "moviepy_temp")
419
+ os.makedirs(moviepy_temp_dir, exist_ok=True)
420
+ VideoFileClip.DEFAULT_TEMP_DIR = moviepy_temp_dir
421
+
422
+ processed_videos_dir = os.path.join(args.output_folder, "processed_videos")
423
+ os.makedirs(processed_videos_dir, exist_ok=True)
424
+
425
+ results = {}
426
+ if theorem_dirs:
427
+ for theorem_dir in theorem_dirs:
428
+ file_name, result = process_theorem(
429
+ models,
430
+ theorem_dir,
431
+ args.eval_type,
432
+ args.retry_limit,
433
+ args.target_fps,
434
+ args.use_parent_folder_as_topic,
435
+ args.output_folder
436
+ )
437
+
438
+ if result is not None:
439
+ results[file_name] = result
440
+
441
+ if not args.combine:
442
+ save_individual_result(args.output_folder, file_name, result)
443
+ else:
444
+ file_name, result = process_theorem(
445
+ models,
446
+ args.file_path,
447
+ args.eval_type,
448
+ args.retry_limit,
449
+ args.target_fps,
450
+ args.use_parent_folder_as_topic,
451
+ args.output_folder
452
+ )
453
+
454
+ if result is not None:
455
+ results[file_name] = result
456
+
457
+ if not args.combine:
458
+ save_individual_result(args.output_folder, file_name, result)
459
+
460
+ if args.combine:
461
+ if len(results) > 1:
462
+ current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
463
+ combined_file = f"evaluation_{current_time}.json"
464
+ combine_results(args.output_folder, combined_file, results)
465
+ print("Combining results completed.")
466
+ else:
467
+ for file_name, result in results.items():
468
+ save_individual_result(args.output_folder, file_name, result)
469
+
470
+ os.rmdir(moviepy_temp_dir)
471
+
472
+
473
+ if __name__ == "__main__":
474
+ main()
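A hedged command-line sketch using the flags defined above (paths are placeholders; the text and image models must appear in src/utils/allowed_models.json):

    python evaluate.py \
        --file_path output/pythagorean_theorem \
        --output_folder eval_results \
        --eval_type all \
        --model_text azure/gpt-4o \
        --model_video gemini/gemini-1.5-pro-002 \
        --target_fps 1 \
        --combine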
generate_video.py ADDED
@@ -0,0 +1,954 @@
1
+ import os
2
+ import json
3
+ import random
4
+ from typing import Union, List, Dict, Optional
5
+ import subprocess
6
+ import argparse
7
+ import glob
8
+ from PIL import Image
9
+ import re
10
+ from dotenv import load_dotenv
11
+ import asyncio
12
+ import uuid # Import uuid for generating trace_id
13
+
14
+ from mllm_tools.litellm import LiteLLMWrapper
15
+ from mllm_tools.utils import _prepare_text_inputs # Keep _prepare_text_inputs if still used directly in main
16
+
17
+ # Import new modules
18
+ from src.core.video_planner import VideoPlanner
19
+ from src.core.code_generator import CodeGenerator
20
+ from src.core.video_renderer import VideoRenderer
21
+ from src.utils.utils import _print_response, _extract_code, extract_xml # Import utility functions
22
+ from src.config.config import Config # Import Config class
23
+
24
+ # Video parsing
25
+ from src.core.parse_video import (
26
+ get_images_from_video,
27
+ image_with_most_non_black_space
28
+ )
29
+ from task_generator import get_banned_reasonings
30
+ from task_generator.prompts_raw import (_code_font_size, _code_disable, _code_limit, _prompt_manim_cheatsheet)
31
+
32
+ # Load allowed models list from JSON file
33
+ allowed_models_path = os.path.join(os.path.dirname(__file__), 'src', 'utils', 'allowed_models.json')
34
+ with open(allowed_models_path, 'r') as f:
35
+ allowed_models = json.load(f).get("allowed_models", [])
36
+
37
+ load_dotenv(override=True)
38
+
39
+ class VideoGenerator:
40
+ """
41
+ A class for generating manim videos using AI models.
42
+
43
+ This class coordinates the video generation pipeline by managing scene planning,
44
+ code generation, and video rendering. It supports concurrent scene processing,
45
+ visual code fixing, and RAG (Retrieval Augmented Generation).
46
+
47
+ Args:
48
+ planner_model: Model used for scene planning and high-level decisions
49
+ scene_model: Model used specifically for scene generation (defaults to planner_model)
50
+ helper_model: Helper model for additional tasks (defaults to planner_model)
51
+ output_dir (str): Directory to store generated files and videos
52
+ verbose (bool): Whether to print detailed output
53
+ use_rag (bool): Whether to use Retrieval Augmented Generation
54
+ use_context_learning (bool): Whether to use context learning with example code
55
+ context_learning_path (str): Path to context learning examples
56
+ chroma_db_path (str): Path to ChromaDB for RAG
57
+ manim_docs_path (str): Path to Manim documentation for RAG
58
+ embedding_model (str): Model to use for embeddings
59
+ use_visual_fix_code (bool): Whether to use visual feedback for code fixing
60
+ use_langfuse (bool): Whether to enable Langfuse logging
61
+ trace_id (str, optional): Trace ID for logging
62
+ max_scene_concurrency (int): Maximum number of scenes to process concurrently
63
+
64
+ Attributes:
65
+ output_dir (str): Directory for output files
66
+ verbose (bool): Verbosity flag
67
+ use_visual_fix_code (bool): Visual code fixing flag
68
+ session_id (str): Unique session identifier
69
+ scene_semaphore (asyncio.Semaphore): Controls concurrent scene processing
70
+ banned_reasonings (list): List of banned reasoning patterns
71
+ planner (VideoPlanner): Handles scene planning
72
+ code_generator (CodeGenerator): Handles code generation
73
+ video_renderer (VideoRenderer): Handles video rendering
74
+ """
75
+
76
+ def __init__(self,
77
+ planner_model,
78
+ scene_model=None,
79
+ helper_model=None,
80
+ output_dir="output",
81
+ verbose=False,
82
+ use_rag=False,
83
+ use_context_learning=False,
84
+ context_learning_path="data/context_learning",
85
+ chroma_db_path="data/rag/chroma_db",
86
+ manim_docs_path="data/rag/manim_docs",
87
+ embedding_model="azure/text-embedding-3-large",
88
+ use_visual_fix_code=False,
89
+ use_langfuse=True,
90
+ trace_id=None,
91
+ max_scene_concurrency: int = 5):
92
+ self.output_dir = output_dir
93
+ self.verbose = verbose
94
+ self.use_visual_fix_code = use_visual_fix_code
95
+ self.session_id = self._load_or_create_session_id() # Modified to load existing or create new
96
+ self.scene_semaphore = asyncio.Semaphore(max_scene_concurrency)
97
+ self.banned_reasonings = get_banned_reasonings()
98
+
99
+ # Initialize separate modules
100
+ self.planner = VideoPlanner(
101
+ planner_model=planner_model,
102
+ helper_model=helper_model,
103
+ output_dir=output_dir,
104
+ print_response=verbose,
105
+ use_context_learning=use_context_learning,
106
+ context_learning_path=context_learning_path,
107
+ use_rag=use_rag,
108
+ session_id=self.session_id,
109
+ chroma_db_path=chroma_db_path,
110
+ manim_docs_path=manim_docs_path,
111
+ embedding_model=embedding_model,
112
+ use_langfuse=use_langfuse
113
+ )
114
+ self.code_generator = CodeGenerator(
115
+ scene_model=scene_model if scene_model is not None else planner_model,
116
+ helper_model=helper_model if helper_model is not None else planner_model,
117
+ output_dir=output_dir,
118
+ print_response=verbose,
119
+ use_rag=use_rag,
120
+ use_context_learning=use_context_learning,
121
+ context_learning_path=context_learning_path,
122
+ chroma_db_path=chroma_db_path,
123
+ manim_docs_path=manim_docs_path,
124
+ embedding_model=embedding_model,
125
+ use_visual_fix_code=use_visual_fix_code,
126
+ use_langfuse=use_langfuse,
127
+ session_id=self.session_id
128
+ )
129
+ self.video_renderer = VideoRenderer(
130
+ output_dir=output_dir,
131
+ print_response=verbose,
132
+ use_visual_fix_code=use_visual_fix_code
133
+ )
134
+
135
+ def _load_or_create_session_id(self) -> str:
136
+ """
137
+ Load existing session ID from file or create a new one.
138
+
139
+ Returns:
140
+ str: The session ID either loaded from file or newly created.
141
+ """
142
+ session_file = os.path.join(self.output_dir, "session_id.txt")
143
+
144
+ if os.path.exists(session_file):
145
+ with open(session_file, 'r') as f:
146
+ session_id = f.read().strip()
147
+ print(f"Loaded existing session ID: {session_id}")
148
+ return session_id
149
+
150
+ # Create new session ID if none exists
151
+ session_id = str(uuid.uuid4())
152
+ os.makedirs(self.output_dir, exist_ok=True)
153
+ with open(session_file, 'w') as f:
154
+ f.write(session_id)
155
+ print(f"Created new session ID: {session_id}")
156
+ return session_id
157
+
158
+ def _save_topic_session_id(self, topic: str, session_id: str) -> None:
159
+ """
160
+ Save session ID for a specific topic.
161
+
162
+ Args:
163
+ topic (str): The topic to save the session ID for
164
+ session_id (str): The session ID to save
165
+ """
166
+ file_prefix = topic.lower()
167
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
168
+ topic_dir = os.path.join(self.output_dir, file_prefix)
169
+ os.makedirs(topic_dir, exist_ok=True)
170
+
171
+ session_file = os.path.join(topic_dir, "session_id.txt")
172
+ with open(session_file, 'w') as f:
173
+ f.write(session_id)
174
+
175
+ def _load_topic_session_id(self, topic: str) -> Optional[str]:
176
+ """
177
+ Load session ID for a specific topic if it exists.
178
+
179
+ Args:
180
+ topic (str): The topic to load the session ID for
181
+
182
+ Returns:
183
+ Optional[str]: The session ID if found, None otherwise
184
+ """
185
+ file_prefix = topic.lower()
186
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
187
+ session_file = os.path.join(self.output_dir, file_prefix, "session_id.txt")
188
+
189
+ if os.path.exists(session_file):
190
+ with open(session_file, 'r') as f:
191
+ return f.read().strip()
192
+ return None
193
+
194
+ def generate_scene_outline(self,
195
+ topic: str,
196
+ description: str,
197
+ session_id: str) -> str:
198
+ """
199
+ Generate scene outline using VideoPlanner.
200
+
201
+ Args:
202
+ topic (str): The topic of the video
203
+ description (str): Description of the video content
204
+ session_id (str): Session identifier for tracking
205
+
206
+ Returns:
207
+ str: Generated scene outline
208
+ """
209
+ return self.planner.generate_scene_outline(topic, description, session_id)
210
+
211
+ async def generate_scene_implementation(self,
212
+ topic: str,
213
+ description: str,
214
+ plan: str,
215
+ session_id: str) -> List[str]:
216
+ """
217
+ Generate scene implementations using VideoPlanner.
218
+
219
+ Args:
220
+ topic (str): The topic of the video
221
+ description (str): Description of the video content
222
+ plan (str): The scene plan to implement
223
+ session_id (str): Session identifier for tracking
224
+
225
+ Returns:
226
+ List[str]: List of generated scene implementations
227
+ """
228
+ return await self.planner.generate_scene_implementation(topic, description, plan, session_id)
229
+
230
+ async def generate_scene_implementation_concurrently(self,
231
+ topic: str,
232
+ description: str,
233
+ plan: str,
234
+ session_id: str) -> List[str]:
235
+ """
236
+ Generate scene implementations concurrently using VideoPlanner.
237
+
238
+ Args:
239
+ topic (str): The topic of the video
240
+ description (str): Description of the video content
241
+ plan (str): The scene plan to implement
242
+ session_id (str): Session identifier for tracking
243
+
244
+ Returns:
245
+ List[str]: List of generated scene implementations
246
+ """
247
+ return await self.planner.generate_scene_implementation_concurrently(topic, description, plan, session_id, self.scene_semaphore) # Pass semaphore
248
+
249
+ def load_implementation_plans(self, topic: str) -> Dict[int, Optional[str]]:
250
+ """
251
+ Load implementation plans for each scene.
252
+
253
+ Args:
254
+ topic (str): The topic to load implementation plans for
255
+
256
+ Returns:
257
+ Dict[int, Optional[str]]: Dictionary mapping scene numbers to their plans.
258
+ If a scene's plan is missing, its value will be None.
259
+ """
260
+ file_prefix = topic.lower()
261
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
262
+
263
+ # Load scene outline from file
264
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
265
+ if not os.path.exists(scene_outline_path):
266
+ return {}
267
+
268
+ with open(scene_outline_path, "r") as f:
269
+ scene_outline = f.read()
270
+
271
+ # Extract scene outline to get number of scenes
272
+ scene_outline_content = extract_xml(scene_outline)
273
+ scene_number = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline_content))
274
+ print(f"Number of scenes: {scene_number}")
275
+
276
+ implementation_plans = {}
277
+
278
+ # Check each scene's implementation plan
279
+ for i in range(1, scene_number + 1):
280
+ plan_path = os.path.join(self.output_dir, file_prefix, f"scene{i}", f"{file_prefix}_scene{i}_implementation_plan.txt")
281
+ if os.path.exists(plan_path):
282
+ with open(plan_path, "r") as f:
283
+ implementation_plans[i] = f.read()
284
+ print(f"Found existing implementation plan for scene {i}")
285
+ else:
286
+ implementation_plans[i] = None
287
+ print(f"Missing implementation plan for scene {i}")
288
+
289
+ return implementation_plans
290
+
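A small sketch of the scene-counting regex used above (the outline snippet is illustrative):

    import re
    outline = "<SCENE_1>\nIntro\n</SCENE_1>\n<SCENE_2>\nProof\n</SCENE_2>"
    len(re.findall(r'<SCENE_(\d+)>[^<]', outline))   # 2 (closing tags do not match)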
291
+ async def render_video_fix_code(self,
292
+ topic: str,
293
+ description: str,
294
+ scene_outline: str,
295
+ implementation_plans: List,
296
+ max_retries=3,
297
+ session_id: str = None) -> None:
298
+ """
299
+ Render the video for all scenes with code fixing capability.
300
+
301
+ Args:
302
+ topic (str): The topic of the video
303
+ description (str): Description of the video content
304
+ scene_outline (str): The overall scene outline
305
+ implementation_plans (List): List of implementation plans for each scene
306
+ max_retries (int, optional): Maximum number of code fix attempts. Defaults to 3.
307
+ session_id (str, optional): Session identifier for tracking
308
+ """
309
+ file_prefix = topic.lower()
310
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
311
+
312
+ # Create tasks for each scene
313
+ tasks = []
314
+ for i, implementation_plan in enumerate(implementation_plans):
315
+ # Try to load scene trace id, or generate new one if it doesn't exist
316
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i+1}")
317
+ subplan_dir = os.path.join(scene_dir, "subplans")
318
+ os.makedirs(subplan_dir, exist_ok=True) # Create directories if they don't exist
319
+
320
+ scene_trace_id_path = os.path.join(subplan_dir, "scene_trace_id.txt")
321
+ try:
322
+ with open(scene_trace_id_path, 'r') as f:
323
+ scene_trace_id = f.read().strip()
324
+ except FileNotFoundError:
325
+ scene_trace_id = str(uuid.uuid4())
326
+ with open(scene_trace_id_path, 'w') as f:
327
+ f.write(scene_trace_id)
328
+
329
+ task = self.process_scene(i, scene_outline, implementation_plan, topic, description, max_retries, file_prefix, session_id, scene_trace_id)
330
+ tasks.append(task)
331
+
332
+ # Execute all tasks concurrently
333
+ await asyncio.gather(*tasks)
334
+
335
+ async def process_scene(self, i: int, scene_outline: str, scene_implementation: str, topic: str, description: str, max_retries: int, file_prefix: str, session_id: str, scene_trace_id: str): # added scene_trace_id
336
+ """
337
+ Process a single scene using CodeGenerator and VideoRenderer.
338
+
339
+ Args:
340
+ i (int): Scene index
341
+ scene_outline (str): Overall scene outline
342
+ scene_implementation (str): Implementation plan for this scene
343
+ topic (str): The topic of the video
344
+ description (str): Description of the video content
345
+ max_retries (int): Maximum number of code fix attempts
346
+ file_prefix (str): Prefix for file naming
347
+ session_id (str): Session identifier for tracking
348
+ scene_trace_id (str): Trace identifier for this scene
349
+ """
350
+ curr_scene = i + 1
351
+ curr_version = 0
352
+ # scene_trace_id = str(uuid.uuid4()) # Remove uuid generation
353
+ rag_queries_cache = {} # Initialize RAG queries cache
354
+
355
+ # Create necessary directories
356
+ code_dir = os.path.join(self.output_dir, file_prefix, f"scene{curr_scene}", "code")
357
+ os.makedirs(code_dir, exist_ok=True)
358
+ media_dir = os.path.join(self.output_dir, file_prefix, "media") # Define media_dir here
359
+
360
+ async with self.scene_semaphore:
361
+ # Step 3A: Generate initial manim code
362
+ code, log = self.code_generator.generate_manim_code(
363
+ topic=topic,
364
+ description=description,
365
+ scene_outline=scene_outline,
366
+ scene_implementation=scene_implementation,
367
+ scene_number=curr_scene,
368
+ additional_context=[_prompt_manim_cheatsheet, _code_font_size, _code_limit, _code_disable],
369
+ scene_trace_id=scene_trace_id, # Use passed scene_trace_id
370
+ session_id=session_id,
371
+ rag_queries_cache=rag_queries_cache # Pass the cache
372
+ )
373
+
374
+ # Save initial code and log (file operations can be offloaded if needed)
375
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_init_log.txt"), "w") as f:
376
+ f.write(log)
377
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
378
+ f.write(code)
379
+ print(f"Code saved to {code_dir}/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
380
+
381
+ # Step 3B: Compile and fix code if needed
382
+ error_message = None
383
+ while True: # Retry loop controlled by break statements
384
+ code, error_message = await self.video_renderer.render_scene(
385
+ code=code,
386
+ file_prefix=file_prefix,
387
+ curr_scene=curr_scene,
388
+ curr_version=curr_version,
389
+ code_dir=code_dir,
390
+ media_dir=media_dir,
391
+ max_retries=max_retries, # Pass max_retries here if needed in render_scene
392
+ use_visual_fix_code=self.use_visual_fix_code,
393
+ visual_self_reflection_func=self.code_generator.visual_self_reflection, # Pass visual_self_reflection function
394
+ banned_reasonings=self.banned_reasonings, # Pass banned reasonings
395
+ scene_trace_id=scene_trace_id,
396
+ topic=topic,
397
+ session_id=session_id
398
+ )
399
+ if error_message is None: # Render success if error_message is None
400
+ break
401
+
402
+ if curr_version >= max_retries: # Max retries reached
403
+ print(f"Max retries reached for scene {curr_scene}, error: {error_message}")
404
+ break # Exit retry loop
405
+
406
+ curr_version += 1
407
+ # Reaching this point means the render failed, so ask the code generator to fix the errors
408
+ code, log = self.code_generator.fix_code_errors(
409
+ implementation_plan=scene_implementation,
410
+ code=code,
411
+ error=error_message,
412
+ scene_trace_id=scene_trace_id,
413
+ topic=topic,
414
+ scene_number=curr_scene,
415
+ session_id=session_id,
416
+ rag_queries_cache=rag_queries_cache
417
+ )
418
+
419
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_fix_log.txt"), "w") as f:
420
+ f.write(log)
421
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
422
+ f.write(code)
423
+
424
+ print(f"Code saved to {code_dir}/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
425
+
426
+ def run_manim_process(self,
427
+ topic: str):
428
+ """
429
+ Run manim on all generated manim code for a specific topic using VideoRenderer.
430
+
431
+ Args:
432
+ topic (str): The topic to render videos for
433
+ """
434
+ return self.video_renderer.run_manim_process(topic)
435
+
436
+ def create_snapshot_scene(self, topic: str, scene_number: int, version_number: int, return_type: str = "image"):
437
+ """
438
+ Create a snapshot of the video for a specific topic and scene using VideoRenderer.
439
+
440
+ Args:
441
+ topic (str): The topic of the video
442
+ scene_number (int): Scene number to snapshot
443
+ version_number (int): Version number to snapshot
444
+ return_type (str, optional): Type of snapshot to return. Defaults to "image".
445
+
446
+ Returns:
447
+ The snapshot in the specified format
448
+ """
449
+ return self.video_renderer.create_snapshot_scene(topic, scene_number, version_number, return_type)
450
+
451
+ def combine_videos(self, topic: str):
452
+ """
453
+ Combine all videos and subtitle files for a specific topic using VideoRenderer.
454
+
455
+ Args:
456
+ topic (str): The topic to combine videos for
457
+ """
458
+ self.video_renderer.combine_videos(topic)
459
+
460
+ async def _generate_scene_implementation_single(self, topic: str, description: str, scene_outline_i: str, i: int, file_prefix: str, session_id: str, scene_trace_id: str) -> str:
461
+ """
462
+ Generate detailed implementation plan for a single scene using VideoPlanner.
463
+
464
+ Args:
465
+ topic (str): The topic of the video
466
+ description (str): Description of the video content
467
+ scene_outline_i (str): Outline for this specific scene
468
+ i (int): Scene index
469
+ file_prefix (str): Prefix for file naming
470
+ session_id (str): Session identifier for tracking
471
+ scene_trace_id (str): Trace identifier for this scene
472
+
473
+ Returns:
474
+ str: Generated implementation plan
475
+ """
476
+ return await self.planner._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id)
477
+
478
+ async def generate_video_pipeline(self, topic: str, description: str, max_retries: int, only_plan: bool = False, specific_scenes: List[int] = None):
479
+ """
480
+ Run the generation pipeline, handling partially completed scenes, with an option to only generate plans for specific scenes.
481
+
482
+ Args:
483
+ topic (str): The topic of the video
484
+ description (str): Description of the video content
485
+ max_retries (int): Maximum number of code fix attempts
486
+ only_plan (bool, optional): Whether to only generate plans without rendering. Defaults to False.
487
+ specific_scenes (List[int], optional): List of specific scenes to process. Defaults to None.
488
+ """
489
+ session_id = self._load_or_create_session_id()
490
+ self._save_topic_session_id(topic, session_id)
491
+
492
+ file_prefix = topic.lower()
493
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
494
+
495
+ # Load or generate scene outline
496
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
497
+ if os.path.exists(scene_outline_path):
498
+ with open(scene_outline_path, "r") as f:
499
+ scene_outline = f.read()
500
+ print(f"Loaded existing scene outline for topic: {topic}")
501
+ if self.planner.use_rag:
502
+ self.planner.relevant_plugins = self.planner.rag_integration.detect_relevant_plugins(topic, description) or []
503
+ self.planner.rag_integration.set_relevant_plugins(self.planner.relevant_plugins)
504
+ print(f"Detected relevant plugins: {self.planner.relevant_plugins}")
505
+ else:
506
+ print(f"Generating new scene outline for topic: {topic}")
507
+ scene_outline = self.planner.generate_scene_outline(topic, description, session_id)
508
+ os.makedirs(os.path.join(self.output_dir, file_prefix), exist_ok=True)
509
+ with open(scene_outline_path, "w") as f:
510
+ f.write(scene_outline)
511
+
512
+ # Load or generate implementation plans
513
+ implementation_plans_dict = self.load_implementation_plans(topic)
514
+ if not implementation_plans_dict:
515
+ scene_outline_content = extract_xml(scene_outline)
516
+ scene_numbers = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline_content))
517
+ implementation_plans_dict = {i: None for i in range(1, scene_numbers + 1)}
518
+
519
+ # Generate missing implementation plans for specified scenes or all missing scenes
520
+ missing_scenes = []
521
+ for scene_num, plan in implementation_plans_dict.items():
522
+ if plan is None and (specific_scenes is None or scene_num in specific_scenes):
523
+ missing_scenes.append(scene_num)
524
+
525
+ if missing_scenes:
526
+ print(f"Generating implementation plans for missing scenes: {missing_scenes}")
527
+ for scene_num in missing_scenes:
528
+ scene_outline_content = extract_xml(scene_outline)
529
+ scene_match = re.search(f'<SCENE_{scene_num}>(.*?)</SCENE_{scene_num}>', scene_outline_content, re.DOTALL)
530
+ if scene_match:
531
+ scene_outline_i = scene_match.group(1)
532
+ scene_trace_id = str(uuid.uuid4())
533
+ implementation_plan = await self._generate_scene_implementation_single(
534
+ topic, description, scene_outline_i, scene_num, file_prefix, session_id, scene_trace_id)
535
+ implementation_plans_dict[scene_num] = implementation_plan
536
+
537
+ if only_plan:
538
+ print(f"Only generating plans - skipping code generation and video rendering for topic: {topic}")
539
+ return
540
+
541
+ # Convert dictionary to list maintaining scene order
542
+ sorted_scene_numbers = sorted(implementation_plans_dict.keys())
543
+ implementation_plans = [implementation_plans_dict[i] for i in sorted_scene_numbers]
544
+
545
+ # Render scenes
546
+ print(f"Starting video rendering for topic: {topic}")
547
+
548
+ # Check which scenes need processing
549
+ scenes_to_process = []
550
+ for i, implementation_plan in enumerate(implementation_plans):
551
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i+1}")
552
+ code_dir = os.path.join(scene_dir, "code")
553
+
554
+ # Check if scene has any code files
555
+ has_code = False
556
+ if os.path.exists(code_dir):
557
+ if any(f.endswith('.py') for f in os.listdir(code_dir)):
558
+ has_code = True
559
+
560
+ # For only_render mode, only process scenes without code
561
+ if args.only_render:  # note: relies on the module-level `args` parsed in the __main__ block
562
+ if not has_code:
563
+ scenes_to_process.append((i+1, implementation_plan))
564
+ print(f"Scene {i+1} has no code, will process")
565
+ else:
566
+ print(f"Scene {i+1} already has code, skipping")
567
+ # For normal mode, process scenes that haven't been successfully rendered
568
+ elif not os.path.exists(os.path.join(scene_dir, "succ_rendered.txt")):
569
+ scenes_to_process.append((i+1, implementation_plan))
570
+
571
+ if not scenes_to_process:
572
+ print(f"No scenes need processing for topic '{topic}'.")
573
+ else:
574
+ print(f"Rendering {len(scenes_to_process)} scenes that need processing...")
575
+ # Create a list of tuples with scene numbers and plans
576
+ scene_plans = [(scene_num, plan) for scene_num, plan in scenes_to_process]
577
+ # Sort by scene number to ensure correct order
578
+ scene_plans.sort(key=lambda x: x[0])
579
+ # Extract just the plans in the correct order
580
+ filtered_implementation_plans = [plan for _, plan in scene_plans]
581
+ await self.render_video_fix_code(topic, description, scene_outline, filtered_implementation_plans,
582
+ max_retries=max_retries, session_id=session_id)
583
+
584
+ if not args.only_render: # Skip video combination in only_render mode
585
+ print(f"Video rendering completed for topic '{topic}'.")
586
+
587
+ def check_theorem_status(self, theorem: Dict) -> Dict[str, bool]:
588
+ """
589
+ Check if a theorem has its plan, code files, and rendered videos with detailed scene status.
590
+
591
+ Args:
592
+ theorem (Dict): Dictionary containing theorem information
593
+
594
+ Returns:
595
+ Dict[str, bool]: Dictionary containing status information for the theorem
596
+ """
597
+ topic = theorem['theorem']
598
+ file_prefix = topic.lower()
599
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
600
+
601
+ # Check scene outline
602
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
603
+ has_scene_outline = os.path.exists(scene_outline_path)
604
+
605
+ # Get number of scenes if outline exists
606
+ num_scenes = 0
607
+ if has_scene_outline:
608
+ with open(scene_outline_path, "r") as f:
609
+ scene_outline = f.read()
610
+ scene_outline_content = extract_xml(scene_outline)
611
+ num_scenes = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline_content))
612
+
613
+ # Check implementation plans, code files, and rendered videos
614
+ implementation_plans = 0
615
+ code_files = 0
616
+ rendered_scenes = 0
617
+
618
+ # Track status of individual scenes
619
+ scene_status = []
620
+ for i in range(1, num_scenes + 1):
621
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i}")
622
+
623
+ # Check implementation plan
624
+ plan_path = os.path.join(scene_dir, f"{file_prefix}_scene{i}_implementation_plan.txt")
625
+ has_plan = os.path.exists(plan_path)
626
+ if has_plan:
627
+ implementation_plans += 1
628
+
629
+ # Check code files
630
+ code_dir = os.path.join(scene_dir, "code")
631
+ has_code = False
632
+ if os.path.exists(code_dir):
633
+ if any(f.endswith('.py') for f in os.listdir(code_dir)):
634
+ has_code = True
635
+ code_files += 1
636
+
637
+ # Check rendered scene video
638
+ has_render = False
639
+ if os.path.exists(scene_dir):
640
+ succ_rendered_path = os.path.join(scene_dir, "succ_rendered.txt")
641
+ if os.path.exists(succ_rendered_path):
642
+ has_render = True
643
+ rendered_scenes += 1
644
+
645
+ scene_status.append({
646
+ 'scene_number': i,
647
+ 'has_plan': has_plan,
648
+ 'has_code': has_code,
649
+ 'has_render': has_render
650
+ })
651
+
652
+ # Check combined video
653
+ combined_video_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_combined.mp4")
654
+ has_combined_video = os.path.exists(combined_video_path)
655
+
656
+ return {
657
+ 'topic': topic,
658
+ 'has_scene_outline': has_scene_outline,
659
+ 'total_scenes': num_scenes,
660
+ 'implementation_plans': implementation_plans,
661
+ 'code_files': code_files,
662
+ 'rendered_scenes': rendered_scenes,
663
+ 'has_combined_video': has_combined_video,
664
+ 'scene_status': scene_status
665
+ }
666
+
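A minimal usage sketch of the class above, assuming a LiteLLM-backed planner model as in the __main__ block below (the model name and topic are placeholders; check_theorem_status reads previously saved outputs under the output directory):

    planner = LiteLLMWrapper(model_name="gemini/gemini-1.5-pro-002", temperature=0.7)
    generator = VideoGenerator(planner_model=planner, output_dir="output")
    outline = generator.generate_scene_outline(
        "Pythagorean Theorem", "Explain the theorem visually", generator.session_id)
    status = generator.check_theorem_status({"theorem": "Pythagorean Theorem"})
    print(f"{status['rendered_scenes']}/{status['total_scenes']} scenes rendered")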
667
+ if __name__ == "__main__":
668
+ parser = argparse.ArgumentParser(description='Generate Manim videos using AI')
669
+ parser.add_argument('--model', type=str, choices=allowed_models,
670
+ default='gemini/gemini-1.5-pro-002', help='Select the AI model to use')
671
+ parser.add_argument('--topic', type=str, default=None, help='Topic to generate videos for')
672
+ parser.add_argument('--context', type=str, default=None, help='Context of the topic')
673
+ parser.add_argument('--helper_model', type=str, choices=allowed_models,
674
+ default=None, help='Select the helper model to use')
675
+ parser.add_argument('--only_gen_vid', action='store_true', help='Only generate videos to existing plans')
676
+ parser.add_argument('--only_combine', action='store_true', help='Only combine videos')
677
+ parser.add_argument('--peek_existing_videos', '--peek', action='store_true', help='Peek at existing videos')
678
+ parser.add_argument('--output_dir', type=str, default=Config.OUTPUT_DIR, help='Output directory') # Use Config
679
+ parser.add_argument('--theorems_path', type=str, default=None, help='Path to theorems json file')
680
+ parser.add_argument('--sample_size', '--sample', type=int, default=None, help='Number of theorems to sample')
681
+ parser.add_argument('--verbose', action='store_true', help='Print verbose output')
682
+ parser.add_argument('--max_retries', type=int, default=5, help='Maximum number of retries for code generation')
683
+ parser.add_argument('--use_rag', '--rag', action='store_true', help='Use Retrieval Augmented Generation')
684
+ parser.add_argument('--use_visual_fix_code','--visual_fix_code', action='store_true', help='Use VLM to fix code with rendered visuals')
685
+ parser.add_argument('--chroma_db_path', type=str, default=Config.CHROMA_DB_PATH, help="Path to Chroma DB") # Use Config
686
+ parser.add_argument('--manim_docs_path', type=str, default=Config.MANIM_DOCS_PATH, help="Path to manim docs") # Use Config
687
+ parser.add_argument('--embedding_model', type=str,
688
+ default=Config.EMBEDDING_MODEL, # Use Config
689
+ choices=["azure/text-embedding-3-large", "vertex_ai/text-embedding-005"],
690
+ help='Select the embedding model to use')
691
+ parser.add_argument('--use_context_learning', action='store_true',
692
+ help='Use context learning with example Manim code')
693
+ parser.add_argument('--context_learning_path', type=str,
694
+ default=Config.CONTEXT_LEARNING_PATH, # Use Config
695
+ help='Path to context learning examples')
696
+ parser.add_argument('--use_langfuse', action='store_true',
697
+ help='Enable Langfuse logging')
698
+ parser.add_argument('--max_scene_concurrency', type=int, default=1, help='Maximum number of scenes to process concurrently')
699
+ parser.add_argument('--max_topic_concurrency', type=int, default=1,
700
+ help='Maximum number of topics to process concurrently')
701
+ parser.add_argument('--debug_combine_topic', type=str, help='Debug combine videos', default=None)
702
+ parser.add_argument('--only_plan', action='store_true', help='Only generate scene outline and implementation plans')
703
+ parser.add_argument('--check_status', action='store_true',
704
+ help='Check planning and code status for all theorems')
705
+ parser.add_argument('--only_render', action='store_true', help='Only render scenes without combining videos')
706
+ parser.add_argument('--scenes', nargs='+', type=int, help='Specific scenes to process (if theorems_path is provided)')
707
+ args = parser.parse_args()
708
+
709
+ # Initialize planner model using LiteLLM
710
+ if args.verbose:
711
+ verbose = True
712
+ else:
713
+ verbose = False
714
+ planner_model = LiteLLMWrapper(
715
+ model_name=args.model,
716
+ temperature=0.7,
717
+ print_cost=True,
718
+ verbose=verbose,
719
+ use_langfuse=args.use_langfuse
720
+ )
721
+ helper_model = LiteLLMWrapper(
722
+ model_name=args.helper_model if args.helper_model else args.model, # Use helper_model if provided, else planner_model
723
+ temperature=0.7,
724
+ print_cost=True,
725
+ verbose=verbose,
726
+ use_langfuse=args.use_langfuse
727
+ )
728
+ scene_model = LiteLLMWrapper( # Initialize scene_model separately
729
+ model_name=args.model,
730
+ temperature=0.7,
731
+ print_cost=True,
732
+ verbose=verbose,
733
+ use_langfuse=args.use_langfuse
734
+ )
735
+ print(f"Planner model: {args.model}, Helper model: {args.helper_model if args.helper_model else args.model}, Scene model: {args.model}") # Print all models
736
+
737
+
738
+ if args.theorems_path:
739
+ # Load the sample theorems
740
+ with open(args.theorems_path, "r") as f:
741
+ theorems = json.load(f)
742
+
743
+ if args.sample_size:
744
+ theorems = theorems[:args.sample_size]
745
+
746
+ if args.peek_existing_videos:
747
+ print(f"Here's the results of checking whether videos are rendered successfully in {args.output_dir}:")
748
+ # In output_dir, count topic folders and how many of them contain a rendered {topic}_combined.mp4
749
+ successful_rendered_videos = 0
750
+ total_folders = 0
751
+ for item in os.listdir(args.output_dir):
752
+ if os.path.isdir(os.path.join(args.output_dir, item)):
753
+ total_folders += 1
754
+ if os.path.exists(os.path.join(args.output_dir, item, f"{item}_combined.mp4")):
755
+ successful_rendered_videos += 1
756
+ print(f"Number of successful rendered videos: {successful_rendered_videos}/{total_folders}")
757
+
758
+ # Also check for succ_rendered.txt in each scene folder and count how many scenes rendered successfully
759
+ successful_rendered_videos = 0
760
+ total_scenes = 0
761
+ for item in os.listdir(args.output_dir):
762
+ if os.path.isdir(os.path.join(args.output_dir, item)):
763
+ for scene_folder in os.listdir(os.path.join(args.output_dir, item)):
764
+ if "scene" in scene_folder and os.path.isdir(os.path.join(args.output_dir, item, scene_folder)):
765
+ total_scenes += 1
766
+ if os.path.exists(os.path.join(args.output_dir, item, scene_folder, "succ_rendered.txt")):
767
+ successful_rendered_videos += 1
768
+ print(f"Number of successful rendered scenes: {successful_rendered_videos}/{total_scenes}")
769
+ exit()
770
+
771
+ video_generator = VideoGenerator(
772
+ planner_model=planner_model,
773
+ scene_model=scene_model, # Pass scene_model
774
+ helper_model=helper_model, # Pass helper_model
775
+ output_dir=args.output_dir,
776
+ verbose=args.verbose,
777
+ use_rag=args.use_rag,
778
+ use_context_learning=args.use_context_learning,
779
+ context_learning_path=args.context_learning_path,
780
+ chroma_db_path=args.chroma_db_path,
781
+ manim_docs_path=args.manim_docs_path,
782
+ embedding_model=args.embedding_model,
783
+ use_visual_fix_code=args.use_visual_fix_code,
784
+ use_langfuse=args.use_langfuse,
785
+ max_scene_concurrency=args.max_scene_concurrency
786
+ )
787
+
788
+ if args.debug_combine_topic is not None:
789
+ video_generator.combine_videos(args.debug_combine_topic)
790
+ exit()
791
+
792
+ if args.only_gen_vid:
793
+ # Generate videos for existing plans
794
+ print("Generating videos for existing plans...")
795
+
796
+ async def process_theorem(theorem, topic_semaphore):
797
+ async with topic_semaphore:
798
+ topic = theorem['theorem']
799
+ print(f"Processing topic: {topic}")
800
+ await video_generator.render_video_fix_code(topic, theorem['description'], max_retries=args.max_retries)
801
+
802
+ async def main():
803
+ # Use the command-line argument for topic concurrency
804
+ topic_semaphore = asyncio.Semaphore(args.max_topic_concurrency)
805
+ tasks = [process_theorem(theorem, topic_semaphore) for theorem in theorems]
806
+ await asyncio.gather(*tasks)
807
+
808
+ asyncio.run(main())
809
+
810
+ elif args.check_status:
811
+ print("\nChecking theorem status...")
812
+ video_generator = VideoGenerator(
813
+ planner_model=planner_model,
814
+ scene_model=scene_model,
815
+ helper_model=helper_model,
816
+ output_dir=args.output_dir,
817
+ verbose=args.verbose,
818
+ use_rag=args.use_rag,
819
+ use_context_learning=args.use_context_learning,
820
+ context_learning_path=args.context_learning_path,
821
+ chroma_db_path=args.chroma_db_path,
822
+ manim_docs_path=args.manim_docs_path,
823
+ embedding_model=args.embedding_model,
824
+ use_visual_fix_code=args.use_visual_fix_code,
825
+ use_langfuse=args.use_langfuse,
826
+ max_scene_concurrency=args.max_scene_concurrency
827
+ )
828
+
829
+ all_statuses = [video_generator.check_theorem_status(theorem) for theorem in theorems]
830
+
831
+ # Print combined status table
832
+ print("\nTheorem Status:")
833
+ print("-" * 160)
834
+ print(f"{'Topic':<40} {'Outline':<8} {'Total':<8} {'Status (Plan/Code/Render)':<50} {'Combined':<10} {'Missing Components':<40}")
835
+ print("-" * 160)
836
+ for status in all_statuses:
837
+ # Create status string showing plan/code/render completion for each scene
838
+ scene_status_str = ""
839
+ for scene in status['scene_status']:
840
+ scene_str = (
841
+ ("P" if scene['has_plan'] else "-") +
842
+ ("C" if scene['has_code'] else "-") +
843
+ ("R" if scene['has_render'] else "-") + " "
844
+ )
845
+ scene_status_str += scene_str
846
+
847
+ # Collect missing components
848
+ missing_plans = []
849
+ missing_code = []
850
+ missing_renders = []
851
+ for scene in status['scene_status']:
852
+ if not scene['has_plan']:
853
+ missing_plans.append(str(scene['scene_number']))
854
+ if not scene['has_code']:
855
+ missing_code.append(str(scene['scene_number']))
856
+ if not scene['has_render']:
857
+ missing_renders.append(str(scene['scene_number']))
858
+
859
+ # Format missing components string
860
+ missing_str = []
861
+ if missing_plans:
862
+ missing_str.append(f"P:{','.join(missing_plans)}")
863
+ if missing_code:
864
+ missing_str.append(f"C:{','.join(missing_code)}")
865
+ if missing_renders:
866
+ missing_str.append(f"R:{','.join(missing_renders)}")
867
+ missing_str = ' '.join(missing_str)
868
+
869
+ print(f"{status['topic'][:37]+'...' if len(status['topic'])>37 else status['topic']:<40} "
870
+ f"{'✓' if status['has_scene_outline'] else '✗':<8} "
871
+ f"{status['total_scenes']:<8} "
872
+ f"{scene_status_str[:47]+'...' if len(scene_status_str)>47 else scene_status_str:<50} "
873
+ f"{'✓' if status['has_combined_video'] else '✗':<10} "
874
+ f"{missing_str[:37]+'...' if len(missing_str)>37 else missing_str:<40}")
875
+
876
+ # Print summary
877
+ print("\nSummary:")
878
+ print(f"Total theorems: {len(theorems)}")
879
+ print(f"Total scenes: {sum(status['total_scenes'] for status in all_statuses)}")
880
+ print(f"Scene completion status:")
881
+ print(f" Plans: {sum(status['implementation_plans'] for status in all_statuses)} scenes")
882
+ print(f" Code: {sum(status['code_files'] for status in all_statuses)} scenes")
883
+ print(f" Renders: {sum(status['rendered_scenes'] for status in all_statuses)} scenes")
884
+ print(f"Combined videos: {sum(1 for status in all_statuses if status['has_combined_video'])}/{len(theorems)}")
885
+ exit()
886
+
887
+ else:
888
+ # Generate video pipeline from scratch
889
+ print("Generating video pipeline from scratch...")
890
+
891
+ async def process_theorem(theorem, topic_semaphore):
892
+ async with topic_semaphore:
893
+ topic = theorem['theorem']
894
+ description = theorem['description']
895
+ print(f"Processing topic: {topic}")
896
+ if args.only_combine:
897
+ video_generator.combine_videos(topic)
898
+ else:
899
+ await video_generator.generate_video_pipeline(
900
+ topic,
901
+ description,
902
+ max_retries=args.max_retries,
903
+ only_plan=args.only_plan,
904
+ specific_scenes=args.scenes
905
+ )
906
+ if not args.only_plan and not args.only_render: # Add condition for only_render
907
+ video_generator.combine_videos(topic)
908
+
909
+ async def main():
910
+ # Use the command-line argument for topic concurrency
911
+ topic_semaphore = asyncio.Semaphore(args.max_topic_concurrency)
912
+ tasks = [process_theorem(theorem, topic_semaphore) for theorem in theorems]
913
+ await asyncio.gather(*tasks)
914
+
915
+ asyncio.run(main())
916
+
917
+ elif args.topic and args.context:
918
+ video_generator = VideoGenerator(
919
+ planner_model=planner_model,
920
+ scene_model=scene_model, # Pass scene_model
921
+ helper_model=helper_model, # Pass helper_model
922
+ output_dir=args.output_dir,
923
+ verbose=args.verbose,
924
+ use_rag=args.use_rag,
925
+ use_context_learning=args.use_context_learning,
926
+ context_learning_path=args.context_learning_path,
927
+ chroma_db_path=args.chroma_db_path,
928
+ manim_docs_path=args.manim_docs_path,
929
+ embedding_model=args.embedding_model,
930
+ use_visual_fix_code=args.use_visual_fix_code,
931
+ use_langfuse=args.use_langfuse,
932
+ max_scene_concurrency=args.max_scene_concurrency
933
+ )
934
+ # Process single topic with context
935
+ print(f"Processing topic: {args.topic}")
936
+
937
+ if args.only_gen_vid:
938
+ asyncio.run(video_generator.render_video_fix_code(args.topic, args.context, max_retries=args.max_retries))
939
+ exit()
940
+
941
+ if args.only_combine:
942
+ video_generator.combine_videos(args.topic)
943
+ else:
944
+ asyncio.run(video_generator.generate_video_pipeline(
945
+ args.topic,
946
+ args.context,
947
+ max_retries=args.max_retries,
948
+ only_plan=args.only_plan,
949
+ ))
950
+ if not args.only_plan and not args.only_render:
951
+ video_generator.combine_videos(args.topic)
952
+ else:
953
+ print("Please provide either (--theorems_path) or (--topic and --context)")
954
+ exit()
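For orientation, a few hedged example invocations of this entry point; the script name and the --model/--topic/--context flags are assumed from surrounding code rather than shown in this hunk, and all paths and model names are placeholders:

# Single topic end to end:
#   python generate_video.py --model "gemini/gemini-1.5-pro-002" \
#       --topic "Pythagorean theorem" --context "Visual proof of the theorem" --use_rag
# Batch mode over a theorems file, two topics at a time:
#   python generate_video.py --theorems_path data/thb_easy/math.json --sample_size 2 --max_topic_concurrency 2
# Check planning/code/render progress for a batch:
#   python generate_video.py --theorems_path data/thb_easy/math.json --check_status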
mllm_tools/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Empty file to make this directory a Python package
mllm_tools/gemini.py ADDED
@@ -0,0 +1,176 @@
1
+ from typing import List, Dict, Any, Union, Optional
2
+ import io
3
+ import os
4
+ import base64
5
+ from PIL import Image
6
+ import mimetypes
7
+ import google.generativeai as genai
8
+ import tempfile
9
+ import time
10
+ from urllib.parse import urlparse
11
+ import requests
12
+ from io import BytesIO
13
+
14
+ class GeminiWrapper:
15
+ """Wrapper for Gemini to support multiple models and logging"""
16
+
17
+ def __init__(
18
+ self,
19
+ model_name: str = "gemini-1.5-pro-002",
20
+ temperature: float = 0.7,
21
+ print_cost: bool = False,
22
+ verbose: bool = False,
23
+ use_langfuse: bool = False
24
+ ):
25
+ """
26
+ Initialize the Gemini wrapper
27
+
28
+ Args:
29
+ model_name: Name of the model to use
30
+ temperature: Temperature for completion
31
+ print_cost: Whether to print the cost of the completion
32
+ verbose: Whether to print verbose output
33
+ use_langfuse: Whether to enable Langfuse logging
34
+ """
35
+ self.model_name = model_name.split('/')[-1] if '/' in model_name else model_name
36
+ self.temperature = temperature
37
+ self.print_cost = print_cost
38
+ self.verbose = verbose
39
+ self.accumulated_cost = 0
40
+
41
+ api_key = os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")
42
+ if not api_key:
43
+ raise ValueError("No API_KEY found. Please set the `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable.")
44
+ genai.configure(api_key=api_key)
45
+
46
+ generation_config = {
47
+ "temperature": self.temperature,
48
+ "top_p": 0.95,
49
+ "response_mime_type": "text/plain",
50
+ }
51
+ safety_settings = [
52
+ {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
53
+ {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
54
+ {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
55
+ {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
56
+ ]
57
+ self.model = genai.GenerativeModel(
58
+ model_name=self.model_name,
59
+ safety_settings=safety_settings,
60
+ generation_config=generation_config,
61
+ )
62
+
63
+ def _get_mime_type(self, file_path: str) -> str:
64
+ """
65
+ Get the MIME type of a file based on its extension
66
+
67
+ Args:
68
+ file_path: Path to the file
69
+
70
+ Returns:
71
+ MIME type as a string (e.g., "image/jpeg", "audio/mp3")
72
+ """
73
+ mime_type, _ = mimetypes.guess_type(file_path)
74
+ if mime_type is None:
75
+ raise ValueError(f"Unsupported file type: {file_path}")
76
+ return mime_type
77
+
78
+ def _download_file(self, url: str) -> str:
79
+ """
80
+ Download a file from a URL and save it as a temporary file
81
+
82
+ Args:
83
+ url: URL of the file to download
84
+
85
+ Returns:
86
+ Path to the temporary file
87
+ """
88
+ response = requests.get(url)
89
+ if response.status_code == 200:
90
+ temp_file = tempfile.NamedTemporaryFile(delete=False)
91
+ temp_file.write(response.content)
92
+ temp_file.close()
93
+ return temp_file.name
94
+ else:
95
+ raise ValueError(f"Failed to download file from URL: {url}")
96
+
97
+ def _save_image_to_temp(self, image: Image.Image) -> str:
98
+ """
99
+ Save a PIL Image to a temporary file
100
+
101
+ Args:
102
+ image: PIL Image object
103
+
104
+ Returns:
105
+ Path to the temporary file
106
+ """
107
+ temp_file = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
108
+ image.save(temp_file, format="PNG")
109
+ temp_file.close()
110
+ return temp_file.name
111
+
112
+ def _upload_to_gemini(self, file_path: str, mime_type: Optional[str] = None):
113
+ """
114
+ Uploads the given file to Gemini.
115
+
116
+ Args:
117
+ file_path: Path to the file
118
+ mime_type: MIME type of the file
119
+
120
+ Returns:
121
+ Uploaded file object
122
+ """
123
+ return genai.upload_file(file_path, mime_type=mime_type)
124
+
125
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
126
+ """
127
+ Process messages and return completion
128
+
129
+ Args:
130
+ messages: List of message dictionaries with 'type' and 'content' keys
131
+ metadata: Optional metadata to pass to Gemini completion
132
+
133
+ Returns:
134
+ Generated text response
135
+ """
136
+ contents = []
137
+ for msg in messages:
138
+ if msg["type"] == "text":
139
+ contents.append(msg["content"])
140
+ elif msg["type"] in ["image", "audio", "video"]:
141
+ if isinstance(msg["content"], Image.Image):
142
+ file_path = self._save_image_to_temp(msg["content"])
143
+ mime_type = "image/png"
144
+ elif isinstance(msg["content"], str):
145
+ if msg["content"].startswith("http"):
146
+ file_path = self._download_file(msg["content"])
147
+ mime_type = self._get_mime_type(msg["content"])
148
+ else:
149
+ file_path = msg["content"]
150
+ mime_type = self._get_mime_type(file_path)
151
+ else:
152
+ raise ValueError("Unsupported content type")
153
+
154
+ uploaded_file = self._upload_to_gemini(file_path, mime_type)
155
+
156
+ while uploaded_file.state.name == "PROCESSING":
157
+ print('.', end='')
158
+ time.sleep(3)
159
+ uploaded_file = genai.get_file(uploaded_file.name)
160
+ if uploaded_file.state.name == "FAILED":
161
+ raise ValueError(uploaded_file.state.name)
162
+ print("Upload successfully")
163
+ contents.append(uploaded_file)
164
+ else:
165
+ raise ValueError("Unsupported message type")
166
+
167
+ response = self.model.generate_content(contents, request_options={"timeout": 600})
168
+ try:
169
+ return response.text
170
+ except Exception as e:
171
+ print(e)
172
+ print(response.prompt_feedback)
173
+ return str(response.prompt_feedback)
174
+
175
+ if __name__ == "__main__":
176
+ pass
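A minimal usage sketch for the wrapper above (hedged: it assumes GEMINI_API_KEY is set and that the referenced image file exists; the message dicts follow the 'type'/'content' convention handled in __call__):

wrapper = GeminiWrapper(model_name="gemini-1.5-pro-002", temperature=0.7)
reply = wrapper([
    {"type": "text", "content": "Describe this image in one sentence."},
    {"type": "image", "content": "snapshot.png"},  # hypothetical local file
])
print(reply)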
mllm_tools/litellm.py ADDED
@@ -0,0 +1,193 @@
1
+ import json
2
+ import re
3
+ from typing import List, Dict, Any, Union, Optional
4
+ import io
5
+ import os
6
+ import base64
7
+ from PIL import Image
8
+ import mimetypes
9
+ import litellm
10
+ from litellm import completion, completion_cost
11
+ from dotenv import load_dotenv
12
+
13
+ load_dotenv()
14
+
15
+ class LiteLLMWrapper:
16
+ """Wrapper for LiteLLM to support multiple models and logging"""
17
+
18
+ def __init__(
19
+ self,
20
+ model_name: str = "gpt-4-vision-preview",
21
+ temperature: float = 0.7,
22
+ print_cost: bool = False,
23
+ verbose: bool = False,
24
+ use_langfuse: bool = True,
25
+ ):
26
+ """
27
+ Initialize the LiteLLM wrapper
28
+
29
+ Args:
30
+ model_name: Name of the model to use (e.g. "azure/gpt-4", "vertex_ai/gemini-pro")
31
+ temperature: Temperature for completion
32
+ print_cost: Whether to print the cost of the completion
33
+ verbose: Whether to print verbose output
34
+ use_langfuse: Whether to enable Langfuse logging
35
+ """
36
+ self.model_name = model_name
37
+ self.temperature = temperature
38
+ self.print_cost = print_cost
39
+ self.verbose = verbose
40
+ self.accumulated_cost = 0
41
+
42
+ if self.verbose:
43
+ os.environ['LITELLM_LOG'] = 'DEBUG'
44
+
45
+ # Set langfuse callback only if enabled
46
+ if use_langfuse:
47
+ litellm.success_callback = ["langfuse"]
48
+ litellm.failure_callback = ["langfuse"]
49
+
50
+ def _encode_file(self, file_path: Union[str, Image.Image]) -> str:
51
+ """
52
+ Encode local file or PIL Image to base64 string
53
+
54
+ Args:
55
+ file_path: Path to local file or PIL Image object
56
+
57
+ Returns:
58
+ Base64 encoded file string
59
+ """
60
+ if isinstance(file_path, Image.Image):
61
+ buffered = io.BytesIO()
62
+ file_path.save(buffered, format="PNG")
63
+ return base64.b64encode(buffered.getvalue()).decode("utf-8")
64
+ else:
65
+ with open(file_path, "rb") as file:
66
+ return base64.b64encode(file.read()).decode("utf-8")
67
+
68
+ def _get_mime_type(self, file_path: str) -> str:
69
+ """
70
+ Get the MIME type of a file based on its extension
71
+
72
+ Args:
73
+ file_path: Path to the file
74
+
75
+ Returns:
76
+ MIME type as a string (e.g., "image/jpeg", "audio/mp3")
77
+ """
78
+ mime_type, _ = mimetypes.guess_type(file_path)
79
+ if mime_type is None:
80
+ raise ValueError(f"Unsupported file type: {file_path}")
81
+ return mime_type
82
+
83
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
84
+ """
85
+ Process messages and return completion
86
+
87
+ Args:
88
+ messages: List of message dictionaries with 'type' and 'content' keys
89
+ metadata: Optional metadata to pass to litellm completion, e.g. for Langfuse tracking
90
+
91
+ Returns:
92
+ Generated text response
93
+ """
94
+ if metadata is None:
95
+ print("No metadata provided, using empty metadata")
96
+ metadata = {}
97
+ metadata["trace_name"] = f"litellm-completion-{self.model_name}"
98
+ # Convert messages to LiteLLM format
99
+ formatted_messages = []
100
+ for msg in messages:
101
+ if msg["type"] == "text":
102
+ formatted_messages.append({
103
+ "role": "user",
104
+ "content": [{"type": "text", "text": msg["content"]}]
105
+ })
106
+ elif msg["type"] in ["image", "audio", "video"]:
107
+ # Check if content is a local file path or PIL Image
108
+ if isinstance(msg["content"], Image.Image) or os.path.isfile(msg["content"]):
109
+ try:
110
+ if isinstance(msg["content"], Image.Image):
111
+ mime_type = "image/png"
112
+ else:
113
+ mime_type = self._get_mime_type(msg["content"])
114
+ base64_data = self._encode_file(msg["content"])
115
+ data_url = f"data:{mime_type};base64,{base64_data}"
116
+ except ValueError as e:
117
+ print(f"Error processing file {msg['content']}: {e}")
118
+ continue
119
+ else:
120
+ data_url = msg["content"]
121
+
122
+ # Append the formatted message based on the model
123
+ if "gemini" in self.model_name:
124
+ formatted_messages.append({
125
+ "role": "user",
126
+ "content": [
127
+ {
128
+ "type": "image_url",
129
+ "image_url": data_url
130
+ }
131
+ ]
132
+ })
133
+ elif "gpt" in self.model_name:
134
+ # GPT and other models expect a different format
135
+ if msg["type"] == "image":
136
+ # Default format for images and videos in GPT
137
+ formatted_messages.append({
138
+ "role": "user",
139
+ "content": [
140
+ {
141
+ "type": f"image_url",
142
+ f"{msg['type']}_url": {
143
+ "url": data_url,
144
+ "detail": "high"
145
+ }
146
+ }
147
+ ]
148
+ })
149
+ else:
150
+ raise ValueError("For GPT, only text and image inferencing are supported")
151
+ else:
152
+ raise ValueError("Only support Gemini and Gpt for Multimodal capability now")
153
+
154
+ try:
155
+ # if it's openai o series model, set temperature to None and reasoning_effort to "medium"
156
+ if (re.match(r"^o\d+.*$", self.model_name) or re.match(r"^openai/o.*$", self.model_name)):
157
+ self.temperature = None
158
+ self.reasoning_effort = "medium"
159
+ response = completion(
160
+ model=self.model_name,
161
+ messages=formatted_messages,
162
+ temperature=self.temperature,
163
+ reasoning_effort=self.reasoning_effort,
164
+ metadata=metadata,
165
+ max_retries=99
166
+ )
167
+ else:
168
+ response = completion(
169
+ model=self.model_name,
170
+ messages=formatted_messages,
171
+ temperature=self.temperature,
172
+ metadata=metadata,
173
+ max_retries=99
174
+ )
175
+ if self.print_cost:
176
+ # pass your response from completion to completion_cost
177
+ cost = completion_cost(completion_response=response)
178
+ formatted_string = f"Cost: ${float(cost):.10f}"
179
+ # print(formatted_string)
180
+ self.accumulated_cost += cost
181
+ print(f"Accumulated Cost: ${self.accumulated_cost:.10f}")
182
+
183
+ content = response.choices[0].message.content
184
+ if content is None:
185
+ print(f"Got null response from model. Full response: {response}")
186
+ return content
187
+
188
+ except Exception as e:
189
+ print(f"Error in model completion: {e}")
190
+ return str(e)
191
+
192
+ if __name__ == "__main__":
193
+ pass
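A minimal usage sketch (hedged: it assumes the relevant provider key, e.g. OPENAI_API_KEY, is present in the environment; the message format mirrors the 'type'/'content' dicts parsed in __call__, and the image path is hypothetical):

wrapper = LiteLLMWrapper(model_name="gpt-4o", print_cost=True, use_langfuse=False)
answer = wrapper([
    {"type": "text", "content": "What does this plot show?"},
    {"type": "image", "content": "plot.png"},  # hypothetical local file
])
print(answer)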
mllm_tools/utils.py ADDED
@@ -0,0 +1,174 @@
1
+ from typing import Union, List, Dict, Any, Optional
2
+ from PIL import Image
3
+ import google.generativeai as genai
4
+ import tempfile
5
+ import os
6
+ from .gemini import GeminiWrapper
7
+ from .vertex_ai import VertexAIWrapper
8
+
9
+
10
+ def _prepare_text_inputs(texts: List[str]) -> List[Dict[str, str]]:
11
+ """
12
+ Converts a list of text strings into the input format for the Agent model.
13
+
14
+ Args:
15
+ texts (List[str]): The list of text strings to be processed.
16
+
17
+ Returns:
18
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
19
+ """
20
+ inputs = []
21
+ # Add each text string to the inputs
22
+ if isinstance(texts, str):
23
+ texts = [texts]
24
+ for text in texts:
25
+ inputs.append({
26
+ "type": "text",
27
+ "content": text
28
+ })
29
+ return inputs
30
+
31
+ def _prepare_text_image_inputs(texts: Union[str, List[str]], images: Union[str, Image.Image, List[Union[str, Image.Image]]]) -> List[Dict[str, str]]:
32
+ """
33
+ Converts text strings and images into the input format for the Agent model.
34
+
35
+ Args:
36
+ texts (Union[str, List[str]]): Text string(s) to be processed.
37
+ images (Union[str, Image.Image, List[Union[str, Image.Image]]]): Image file path(s) or PIL Image object(s).
38
+ Returns:
39
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
40
+ """
41
+ inputs = []
42
+ # Add each text string to the inputs
43
+ if isinstance(texts, str):
44
+ texts = [texts]
45
+ for text in texts:
46
+ inputs.append({
47
+ "type": "text",
48
+ "content": text
49
+ })
50
+ if isinstance(images, (str, Image.Image)):
51
+ images = [images]
52
+ for image in images:
53
+ inputs.append({
54
+ "type": "image",
55
+ "content": image
56
+ })
57
+ return inputs
58
+
59
+ def _prepare_text_video_inputs(texts: Union[str, List[str]], videos: Union[str, List[str]]) -> List[Dict[str, str]]:
60
+ """
61
+ Converts text strings and video file paths into the input format for the Agent model.
62
+
63
+ Args:
64
+ texts (Union[str, List[str]]): Text string(s) to be processed.
65
+ videos (Union[str, List[str]]): Video file path(s).
66
+ Returns:
67
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
68
+ """
69
+ inputs = []
70
+ # Add each text string to the inputs
71
+ if isinstance(texts, str):
72
+ texts = [texts]
73
+ for text in texts:
74
+ inputs.append({
75
+ "type": "text",
76
+ "content": text
77
+ })
78
+ # Add each video file path to the inputs
79
+ if isinstance(videos, str):
80
+ videos = [videos]
81
+ for video in videos:
82
+ inputs.append({
83
+ "type": "video",
84
+ "content": video
85
+ })
86
+ return inputs
87
+
88
+ def _prepare_text_audio_inputs(texts: Union[str, List[str]], audios: Union[str, List[str]]) -> List[Dict[str, str]]:
89
+ """
90
+ Converts text strings and audio file paths into the input format for the Agent model.
91
+
92
+ Args:
93
+ texts (Union[str, List[str]]): Text string(s) to be processed.
94
+ audios (Union[str, List[str]]): Audio file path(s).
95
+ Returns:
96
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
97
+ """
98
+ inputs = []
99
+ # Add each text string to the inputs
100
+ if isinstance(texts, str):
101
+ texts = [texts]
102
+ for text in texts:
103
+ inputs.append({
104
+ "type": "text",
105
+ "content": text
106
+ })
107
+ # Add each audio file path to the inputs
108
+ if isinstance(audios, str):
109
+ audios = [audios]
110
+ for audio in audios:
111
+ inputs.append({
112
+ "type": "audio",
113
+ "content": audio
114
+ })
115
+ return inputs
116
+
117
+ def _extract_code(text: str) -> str:
118
+ """Helper to extract code block from model response, support Gemini style and OpenAI style"""
119
+ try:
120
+ # Find code between ```python and ``` tags
121
+ start = text.split("```python\n")[-1]
122
+ end = start.split("```")[0]
123
+ return end.strip()
124
+ except IndexError:
125
+ return text
126
+
127
+ def _upload_to_gemini(input, mime_type=None):
128
+ """Uploads the given file or PIL image to Gemini.
129
+
130
+ See https://ai.google.dev/gemini-api/docs/prompting_with_media
131
+ """
132
+ if isinstance(input, str):
133
+ # Input is a file path
134
+ file = genai.upload_file(input, mime_type=mime_type)
135
+ elif isinstance(input, Image.Image):
136
+ # Input is a PIL image
137
+ with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp_file:
138
+ input.save(tmp_file, format="JPEG")
139
+ tmp_file_path = tmp_file.name
140
+ file = genai.upload_file(tmp_file_path, mime_type=mime_type or "image/jpeg")
141
+ os.remove(tmp_file_path)
142
+ else:
143
+ raise ValueError("Unsupported input type. Must be a file path or PIL Image.")
144
+
145
+ #print(f"Uploaded file '{file.display_name}' as: {file.uri}")
146
+ return file
147
+
148
+ def get_media_wrapper(model_name: str) -> Optional[Union[GeminiWrapper, VertexAIWrapper]]:
149
+ """Get appropriate wrapper for media handling based on model name"""
150
+ if model_name.startswith('gemini/'):
151
+ return GeminiWrapper(model_name=model_name.split('/')[-1])
152
+ elif model_name.startswith('vertex_ai/'):
153
+ return VertexAIWrapper(model_name=model_name.split('/')[-1])
154
+ return None
155
+
156
+ def prepare_media_messages(prompt: str, media_path: Union[str, Image.Image], model_name: str) -> List[Dict[str, Any]]:
157
+ """Prepare messages for media input based on model type"""
158
+ is_video = isinstance(media_path, str) and media_path.endswith('.mp4')
159
+
160
+ if is_video and (model_name.startswith('gemini/') or model_name.startswith('vertex_ai/')):
161
+ return [
162
+ {"type": "text", "content": prompt},
163
+ {"type": "video", "content": media_path}
164
+ ]
165
+ else:
166
+ # For images or non-Gemini/Vertex models
167
+ if isinstance(media_path, str):
168
+ media = Image.open(media_path)
169
+ else:
170
+ media = media_path
171
+ return [
172
+ {"type": "text", "content": prompt},
173
+ {"type": "image", "content": media}
174
+ ]
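A short sketch of how these helpers compose (hedged; the file path is hypothetical):

msgs = _prepare_text_image_inputs("Summarize the key idea shown here.", "scene_snapshot.png")
# msgs == [{"type": "text", "content": "..."}, {"type": "image", "content": "scene_snapshot.png"}]
media_wrapper = get_media_wrapper("gemini/gemini-1.5-pro-002")  # GeminiWrapper instance, or None for other providers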
mllm_tools/vertex_ai.py ADDED
@@ -0,0 +1,86 @@
1
+ import os
2
+ from typing import List, Dict, Any, Optional
3
+ import vertexai
4
+ from vertexai.generative_models import GenerativeModel, Part
5
+ from google.auth import default
6
+ from google.auth.transport import requests
7
+
8
+
9
+ # TODO: check if this is the correct way to use Vertex AI
10
+ # TODO: add langfuse support
11
+ class VertexAIWrapper:
12
+ """Wrapper for Vertex AI to support Gemini models."""
13
+
14
+ def __init__(
15
+ self,
16
+ model_name: str = "gemini-1.5-pro",
17
+ temperature: float = 0.7,
18
+ print_cost: bool = False,
19
+ verbose: bool = False,
20
+ use_langfuse: bool = False
21
+ ):
22
+ """Initialize the Vertex AI wrapper.
23
+
24
+ Args:
25
+ model_name: Name of the model to use (e.g. "gemini-1.5-pro")
26
+ temperature: Temperature for generation between 0 and 1
27
+ print_cost: Whether to print the cost of the completion
28
+ verbose: Whether to print verbose output
29
+ use_langfuse: Whether to enable Langfuse logging
30
+ """
31
+ self.model_name = model_name
32
+ self.temperature = temperature
33
+ self.print_cost = print_cost
34
+ self.verbose = verbose
35
+
36
+ # Initialize Vertex AI
37
+ project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
38
+ location = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
39
+ if not project_id:
40
+ raise ValueError("No GOOGLE_CLOUD_PROJECT found in environment variables")
41
+
42
+ vertexai.init(project=project_id, location=location)
43
+ self.model = GenerativeModel(model_name)
44
+
45
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
46
+ """Process messages and return completion.
47
+
48
+ Args:
49
+ messages: List of message dictionaries containing type and content
50
+ metadata: Optional metadata dictionary to pass to the model
51
+
52
+ Returns:
53
+ Generated text response from the model
54
+
55
+ Raises:
56
+ ValueError: If message type is not supported
57
+ """
58
+ parts = []
59
+
60
+ for msg in messages:
61
+ if msg["type"] == "text":
62
+ parts.append(Part.from_text(msg["content"]))
63
+ elif msg["type"] in ["image", "video"]:
64
+ mime_type = "video/mp4" if msg["type"] == "video" else "image/jpeg"
65
+ if isinstance(msg["content"], str):
66
+ # Handle GCS URI
67
+ parts.append(Part.from_uri(
68
+ msg["content"],
69
+ mime_type=mime_type
70
+ ))
71
+ else:
72
+ # Handle raw bytes (Part.from_data expects binary content)
73
+ parts.append(Part.from_data(
74
+ msg["content"],
75
+ mime_type=mime_type
76
+ ))
77
+
78
+ response = self.model.generate_content(
79
+ parts,
80
+ generation_config={
81
+ "temperature": self.temperature,
82
+ "top_p": 0.95,
83
+ }
84
+ )
85
+
86
+ return response.text
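A minimal usage sketch (hedged: it assumes GOOGLE_CLOUD_PROJECT is set and uses a placeholder GCS URI):

wrapper = VertexAIWrapper(model_name="gemini-1.5-pro", temperature=0.7)
text = wrapper([
    {"type": "text", "content": "Describe what happens in this clip."},
    {"type": "video", "content": "gs://my-bucket/scene1.mp4"},  # hypothetical GCS URI
])
print(text)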
requirements.txt ADDED
@@ -0,0 +1,101 @@
1
+ annotated-types~=0.7.0
2
+ azure-cognitiveservices-speech~=1.41.1
3
+ cachetools~=5.5.0
4
+ certifi~=2024.8.30
5
+ charset-normalizer~=3.4.0
6
+ click~=8.1.7
7
+ cloup~=3.0.5
8
+ Cython~=3.0.11
9
+ decorator~=5.1.1
10
+ glcontext~=3.0.0
11
+ google-ai-generativelanguage~=0.6.10
12
+ google-api-core~=2.22.0
13
+ google-api-python-client~=2.151.0
14
+ google-auth~=2.35.0
15
+ google-auth-httplib2~=0.2.0
16
+ google-generativeai~=0.8.3
17
+ googleapis-common-protos~=1.65.0
18
+ grpcio~=1.67.1
19
+ grpcio-status~=1.67.1
20
+ gTTS~=2.5.3
21
+ httplib2~=0.22.0
22
+ idna~=3.10
23
+ isosurfaces~=0.1.2
24
+ manim~=0.18.1
25
+ manim-voiceover~=0.3.7
26
+ ManimPango~=0.6.0 # run 'sudo apt-get install libsdl-pango-dev' if you don't have pangocairo
27
+ mapbox_earcut~=1.0.2
28
+ markdown-it-py~=3.0.0
29
+ mdurl~=0.1.2
30
+ moderngl~=5.12.0
31
+ multipledispatch~=1.0.0
32
+ mutagen~=1.47.0
33
+ networkx~=3.4.2
34
+ numpy~=2.2.2
35
+ pillow
36
+ proto-plus~=1.25.0
37
+ protobuf~=5.28.3
38
+ pyasn1~=0.6.1
39
+ pyasn1_modules~=0.4.1
40
+ PyAudio~=0.2.14 # requires 'brew install portaudio' on macOS
41
+ pycairo~=1.27.0
42
+ pydantic~=2.9.2
43
+ pydantic_core~=2.23.4
44
+ pydub~=0.25.1
45
+ pyglet~=2.0.18
46
+ Pygments~=2.18.0
47
+ #pyobjc-core~=10.3.1 # only for mac
48
+ #pyobjc-framework-Cocoa~=10.3.1 # only for mac
49
+ pyparsing~=3.2.0
50
+ pyrr~=0.10.3
51
+ python-dotenv~=0.21.1
52
+ python-slugify~=8.0.4
53
+ requests~=2.32.3
54
+ rich~=13.9.3
55
+ rsa~=4.9
56
+ scipy~=1.14.1
57
+ screeninfo~=0.8.1
58
+ skia-pathops~=0.8.0.post2
59
+ sox~=1.5.0
60
+ srt~=3.5.3
61
+ svgelements~=1.9.6
62
+ text-unidecode~=1.3
63
+ tqdm~=4.66.5
64
+ typing_extensions~=4.12.2
65
+ uritemplate~=4.1.1
66
+ urllib3~=2.2.3
67
+ watchdog~=5.0.3
68
+ inquirer
69
+ openai~=1.61.0
70
+ tiktoken~=0.8.0
71
+ timm
72
+ sentencepiece
73
+ transformers
74
+ litellm~=1.60.5
75
+ pysrt
76
+ moviepy~=2.1.2
77
+ yt-dlp
78
+ imageio_ffmpeg~=0.5.1
79
+ langchain~=0.3.14
80
+ langchain_community~=0.3.14
81
+ SpeechRecognition~=3.14.1
82
+ boto3~=1.36.9
83
+ manim-physics~=0.4.0
84
+ manim-ml~=0.0.24
85
+ manim-chemistry~=0.4.4
86
+ manim-dsa~=0.2.0
87
+ manim-circuit~=0.0.3
88
+ langfuse~=2.58.1
89
+ chromadb~=0.6.3
90
+ google-cloud-aiplatform~=1.79.0
91
+ cairosvg
92
+ pylatexenc~=2.10
93
+ ffmpeg-python~=0.2.0
94
+ kokoro-onnx[gpu] # if you have a GPU; otherwise use kokoro-onnx
95
+ soundfile~=0.13.1
96
+ krippendorff~=0.8.1
97
+ statsmodels~=0.14.4
98
+ opencv-python~=4.11.0
99
+ fastapi
100
+ uvicorn
101
+ gradio
src/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # This is essential for the release to work
src/config/__init__.py ADDED
File without changes
src/config/config.py ADDED
@@ -0,0 +1,20 @@
1
+ import os
2
+ from dotenv import load_dotenv
3
+
4
+ # Load environment variables from .env file
5
+ load_dotenv()
6
+
7
+ class Config:
8
+ OUTPUT_DIR = "output"
9
+ THEOREMS_PATH = os.path.join("data", "easy_20.json")
10
+ CONTEXT_LEARNING_PATH = "data/context_learning"
11
+ CHROMA_DB_PATH = "data/rag/chroma_db"
12
+ MANIM_DOCS_PATH = "data/rag/manim_docs"
13
+ EMBEDDING_MODEL = "azure/text-embedding-3-large"
14
+
15
+ # Kokoro TTS configurations
16
+ KOKORO_MODEL_PATH = os.getenv('KOKORO_MODEL_PATH')
17
+ KOKORO_VOICES_PATH = os.getenv('KOKORO_VOICES_PATH')
18
+ KOKORO_DEFAULT_VOICE = os.getenv('KOKORO_DEFAULT_VOICE')
19
+ KOKORO_DEFAULT_SPEED = float(os.getenv('KOKORO_DEFAULT_SPEED', '1.0'))
20
+ KOKORO_DEFAULT_LANG = os.getenv('KOKORO_DEFAULT_LANG')
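A quick sketch of consuming these settings elsewhere in the codebase (hedged: it assumes the repository root is on PYTHONPATH and that a .env file provides the Kokoro paths):

from src.config.config import Config

print(Config.OUTPUT_DIR)            # "output"
print(Config.EMBEDDING_MODEL)       # "azure/text-embedding-3-large"
print(Config.KOKORO_DEFAULT_SPEED)  # falls back to 1.0 when the env var is unset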
src/core/__init__.py ADDED
File without changes
src/core/code_generator.py ADDED
@@ -0,0 +1,454 @@
1
+ import os
2
+ import re
3
+ import json
4
+ from typing import Union, List, Dict, Tuple
5
+ from PIL import Image
6
+ import glob
7
+
8
+ from src.utils.utils import extract_json
9
+ from mllm_tools.utils import _prepare_text_inputs, _extract_code, _prepare_text_image_inputs
10
+ from mllm_tools.gemini import GeminiWrapper
11
+ from mllm_tools.vertex_ai import VertexAIWrapper
12
+ from task_generator import (
13
+ get_prompt_code_generation,
14
+ get_prompt_fix_error,
15
+ get_prompt_visual_fix_error,
16
+ get_banned_reasonings,
17
+ get_prompt_rag_query_generation_fix_error,
18
+ get_prompt_context_learning_code,
19
+ get_prompt_rag_query_generation_code
20
+ )
21
+ from task_generator.prompts_raw import (
22
+ _code_font_size,
23
+ _code_disable,
24
+ _code_limit,
25
+ _prompt_manim_cheatsheet
26
+ )
27
+ from src.rag.vector_store import RAGVectorStore # Import RAGVectorStore
28
+
29
+ class CodeGenerator:
30
+ """A class for generating and managing Manim code."""
31
+
32
+ def __init__(self, scene_model, helper_model, output_dir="output", print_response=False, use_rag=False, use_context_learning=False, context_learning_path="data/context_learning", chroma_db_path="rag/chroma_db", manim_docs_path="rag/manim_docs", embedding_model="azure/text-embedding-3-large", use_visual_fix_code=False, use_langfuse=True, session_id=None):
33
+ """Initialize the CodeGenerator.
34
+
35
+ Args:
36
+ scene_model: The model used for scene generation
37
+ helper_model: The model used for helper tasks
38
+ output_dir (str, optional): Directory for output files. Defaults to "output".
39
+ print_response (bool, optional): Whether to print model responses. Defaults to False.
40
+ use_rag (bool, optional): Whether to use RAG. Defaults to False.
41
+ use_context_learning (bool, optional): Whether to use context learning. Defaults to False.
42
+ context_learning_path (str, optional): Path to context learning examples. Defaults to "data/context_learning".
43
+ chroma_db_path (str, optional): Path to ChromaDB. Defaults to "rag/chroma_db".
44
+ manim_docs_path (str, optional): Path to Manim docs. Defaults to "rag/manim_docs".
45
+ embedding_model (str, optional): Name of embedding model. Defaults to "azure/text-embedding-3-large".
46
+ use_visual_fix_code (bool, optional): Whether to use visual code fixing. Defaults to False.
47
+ use_langfuse (bool, optional): Whether to use Langfuse logging. Defaults to True.
48
+ session_id (str, optional): Session identifier. Defaults to None.
49
+ """
50
+ self.scene_model = scene_model
51
+ self.helper_model = helper_model
52
+ self.output_dir = output_dir
53
+ self.print_response = print_response
54
+ self.use_rag = use_rag
55
+ self.use_context_learning = use_context_learning
56
+ self.context_learning_path = context_learning_path
57
+ self.context_examples = self._load_context_examples() if use_context_learning else None
58
+ self.manim_docs_path = manim_docs_path
59
+
60
+ self.use_visual_fix_code = use_visual_fix_code
61
+ self.banned_reasonings = get_banned_reasonings()
62
+ self.session_id = session_id # Use session_id passed from VideoGenerator
63
+
64
+ if use_rag:
65
+ self.vector_store = RAGVectorStore(
66
+ chroma_db_path=chroma_db_path,
67
+ manim_docs_path=manim_docs_path,
68
+ embedding_model=embedding_model,
69
+ session_id=self.session_id,
70
+ use_langfuse=use_langfuse
71
+ )
72
+ else:
73
+ self.vector_store = None
74
+
75
+ def _load_context_examples(self) -> str:
76
+ """Load all context learning examples from the specified directory.
77
+
78
+ Returns:
79
+ str: Formatted context learning examples, or None if no examples found.
80
+ """
81
+ examples = []
82
+ for example_file in glob.glob(f"{self.context_learning_path}/**/*.py", recursive=True):
83
+ with open(example_file, 'r') as f:
84
+ examples.append(f"# Example from {os.path.basename(example_file)}\n{f.read()}\n")
85
+
86
+ # Format examples using get_prompt_context_learning_code instead of _prompt_context_learning
87
+ if examples:
88
+ formatted_examples = get_prompt_context_learning_code(
89
+ examples="\n".join(examples)
90
+ )
91
+ return formatted_examples
92
+ return None
93
+
94
+ def _generate_rag_queries_code(self, implementation: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
95
+ """Generate RAG queries from the implementation plan.
96
+
97
+ Args:
98
+ implementation (str): The implementation plan text
99
+ scene_trace_id (str, optional): Trace ID for the scene. Defaults to None.
100
+ topic (str, optional): Topic of the scene. Defaults to None.
101
+ scene_number (int, optional): Scene number. Defaults to None.
102
+ session_id (str, optional): Session identifier. Defaults to None.
103
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list.
104
+
105
+ Returns:
106
+ List[str]: List of generated RAG queries
107
+ """
108
+ # Create a cache key for this scene
109
+ cache_key = f"{topic}_scene{scene_number}"
110
+
111
+ # Check if we already have a cache file for this scene
112
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
113
+ os.makedirs(cache_dir, exist_ok=True)
114
+ cache_file = os.path.join(cache_dir, "rag_queries_code.json")
115
+
116
+ # If cache file exists, load and return cached queries
117
+ if os.path.exists(cache_file):
118
+ with open(cache_file, 'r') as f:
119
+ cached_queries = json.load(f)
120
+ print(f"Using cached RAG queries for {cache_key}")
121
+ return cached_queries
122
+
123
+ # Generate new queries if not cached
124
+ if relevant_plugins:
125
+ prompt = get_prompt_rag_query_generation_code(implementation, ", ".join(relevant_plugins))
126
+ else:
127
+ prompt = get_prompt_rag_query_generation_code(implementation, "No plugins are relevant.")
128
+
129
+ queries = self.helper_model(
130
+ _prepare_text_inputs(prompt),
131
+ metadata={"generation_name": "rag_query_generation", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
132
+ )
133
+
134
+ print(f"RAG queries: {queries}")
135
+ # Retrieve the JSON block wrapped in triple backticks
136
+
137
+ try: # add try-except block to handle potential json decode errors
138
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
139
+ queries = json.loads(queries)
140
+ except json.JSONDecodeError as e:
141
+ print(f"JSONDecodeError when parsing RAG queries for storyboard: {e}")
142
+ print(f"Response text was: {queries}")
143
+ return [] # Return empty list in case of parsing error
144
+
145
+ # Cache the queries
146
+ with open(cache_file, 'w') as f:
147
+ json.dump(queries, f)
148
+
149
+ return queries
150
+
151
+ def _generate_rag_queries_error_fix(self, error: str, code: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
152
+ """Generate RAG queries for fixing code errors.
153
+
154
+ Args:
155
+ error (str): The error message to fix
156
+ code (str): The code containing the error
157
+ scene_trace_id (str, optional): Trace ID for the scene. Defaults to None.
158
+ topic (str, optional): Topic of the scene. Defaults to None.
159
+ scene_number (int, optional): Scene number. Defaults to None.
160
+ session_id (str, optional): Session identifier. Defaults to None.
161
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list.
162
+
163
+ Returns:
164
+ List[str]: List of generated RAG queries for error fixing
165
+ """
166
+ # Create a cache key for this scene and error
167
+ cache_key = f"{topic}_scene{scene_number}_error_fix"
168
+
169
+ # Check if we already have a cache file for error fix queries
170
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
171
+ os.makedirs(cache_dir, exist_ok=True)
172
+ cache_file = os.path.join(cache_dir, "rag_queries_error_fix.json")
173
+
174
+ # If cache file exists, load and return cached queries
175
+ if os.path.exists(cache_file):
176
+ with open(cache_file, 'r') as f:
177
+ cached_queries = json.load(f)
178
+ print(f"Using cached RAG queries for error fix in {cache_key}")
179
+ return cached_queries
180
+
181
+ # Generate new queries for error fix if not cached
182
+ prompt = get_prompt_rag_query_generation_fix_error(
183
+ error=error,
184
+ code=code,
185
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
186
+ )
187
+
188
+ queries = self.helper_model(
189
+ _prepare_text_inputs(prompt),
190
+ metadata={"generation_name": "rag-query-generation-fix-error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
191
+ )
192
+
193
+ # remove json triple backticks
194
+ queries = queries.replace("```json", "").replace("```", "")
195
+ try: # add try-except block to handle potential json decode errors
196
+ queries = json.loads(queries)
197
+ except json.JSONDecodeError as e:
198
+ print(f"JSONDecodeError when parsing RAG queries for error fix: {e}")
199
+ print(f"Response text was: {queries}")
200
+ return [] # Return empty list in case of parsing error
201
+
202
+ # Cache the queries
203
+ with open(cache_file, 'w') as f:
204
+ json.dump(queries, f)
205
+
206
+ return queries
207
+
208
+ def _extract_code_with_retries(self, response_text: str, pattern: str, generation_name: str = None, trace_id: str = None, session_id: str = None, max_retries: int = 10) -> str:
209
+ """Extract code from response text with retry logic.
210
+
211
+ Args:
212
+ response_text (str): The text containing code to extract
213
+ pattern (str): Regex pattern for extracting code
214
+ generation_name (str, optional): Name of generation step. Defaults to None.
215
+ trace_id (str, optional): Trace identifier. Defaults to None.
216
+ session_id (str, optional): Session identifier. Defaults to None.
217
+ max_retries (int, optional): Maximum number of retries. Defaults to 10.
218
+
219
+ Returns:
220
+ str: The extracted code
221
+
222
+ Raises:
223
+ ValueError: If code extraction fails after max retries
224
+ """
225
+ retry_prompt = """
226
+ Please extract the Python code in the correct format using the pattern: {pattern}.
227
+ You MUST NOT include any other text or comments.
228
+ You MUST return the exact same code as in the previous response, NO CONTENT EDITING is allowed.
229
+ Previous response:
230
+ {response_text}
231
+ """
232
+
233
+ for attempt in range(max_retries):
234
+ code_match = re.search(pattern, response_text, re.DOTALL)
235
+ if code_match:
236
+ return code_match.group(1)
237
+
238
+ if attempt < max_retries - 1:
239
+ print(f"Attempt {attempt + 1}: Failed to extract code pattern. Retrying...")
240
+ # Regenerate response with a more explicit prompt
241
+ response_text = self.scene_model(
242
+ _prepare_text_inputs(retry_prompt.format(pattern=pattern, response_text=response_text)),
243
+ metadata={
244
+ "generation_name": f"{generation_name}_format_retry_{attempt + 1}",
245
+ "trace_id": trace_id,
246
+ "session_id": session_id
247
+ }
248
+ )
249
+
250
+ raise ValueError(f"Failed to extract code pattern after {max_retries} attempts. Pattern: {pattern}")
251
+
252
+ def generate_manim_code(self,
253
+ topic: str,
254
+ description: str,
255
+ scene_outline: str,
256
+ scene_implementation: str,
257
+ scene_number: int,
258
+ additional_context: Union[str, List[str]] = None,
259
+ scene_trace_id: str = None,
260
+ session_id: str = None,
261
+ rag_queries_cache: Dict = None) -> Tuple[str, str]:
262
+ """Generate Manim code from video plan.
263
+
264
+ Args:
265
+ topic (str): Topic of the scene
266
+ description (str): Description of the scene
267
+ scene_outline (str): Outline of the scene
268
+ scene_implementation (str): Implementation details
269
+ scene_number (int): Scene number
270
+ additional_context (Union[str, List[str]], optional): Additional context. Defaults to None.
271
+ scene_trace_id (str, optional): Trace identifier. Defaults to None.
272
+ session_id (str, optional): Session identifier. Defaults to None.
273
+ rag_queries_cache (Dict, optional): Cache for RAG queries. Defaults to None.
274
+
275
+ Returns:
276
+ Tuple[str, str]: Generated code and response text
277
+ """
278
+ if self.use_context_learning:
279
+ # Add context examples to additional_context
280
+ if additional_context is None:
281
+ additional_context = []
282
+ elif isinstance(additional_context, str):
283
+ additional_context = [additional_context]
284
+
285
+ # Now using the properly formatted code examples
286
+ if self.context_examples:
287
+ additional_context.append(self.context_examples)
288
+
289
+ if self.use_rag:
290
+ # Generate RAG queries (will use cache if available)
291
+ rag_queries = self._generate_rag_queries_code(
292
+ implementation=scene_implementation,
293
+ scene_trace_id=scene_trace_id,
294
+ topic=topic,
295
+ scene_number=scene_number,
296
+ session_id=session_id
297
+ )
298
+
299
+ retrieved_docs = self.vector_store.find_relevant_docs(
300
+ queries=rag_queries,
301
+ k=2, # number of documents to retrieve
302
+ trace_id=scene_trace_id,
303
+ topic=topic,
304
+ scene_number=scene_number
305
+ )
306
+ # Format the retrieved documents into a string
307
+ if additional_context is None:
308
+ additional_context = []
309
+ additional_context.append(retrieved_docs)
310
+
311
+ # Format code generation prompt with plan and retrieved context
312
+ prompt = get_prompt_code_generation(
313
+ scene_outline=scene_outline,
314
+ scene_implementation=scene_implementation,
315
+ topic=topic,
316
+ description=description,
317
+ scene_number=scene_number,
318
+ additional_context=additional_context
319
+ )
320
+
321
+ # Generate code using model
322
+ response_text = self.scene_model(
323
+ _prepare_text_inputs(prompt),
324
+ metadata={"generation_name": "code_generation", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
325
+ )
326
+
327
+ # Extract code with retries
328
+ code = self._extract_code_with_retries(
329
+ response_text,
330
+ r"```python(.*)```",
331
+ generation_name="code_generation",
332
+ trace_id=scene_trace_id,
333
+ session_id=session_id
334
+ )
335
+ return code, response_text
336
+
337
+ def fix_code_errors(self, implementation_plan: str, code: str, error: str, scene_trace_id: str, topic: str, scene_number: int, session_id: str, rag_queries_cache: Dict = None) -> Tuple[str, str]:
338
+ """Fix errors in generated Manim code.
339
+
340
+ Args:
341
+ implementation_plan (str): Original implementation plan
342
+ code (str): Code containing errors
343
+ error (str): Error message to fix
344
+ scene_trace_id (str): Trace identifier
345
+ topic (str): Topic of the scene
346
+ scene_number (int): Scene number
347
+ session_id (str): Session identifier
348
+ rag_queries_cache (Dict, optional): Cache for RAG queries. Defaults to None.
349
+
350
+ Returns:
351
+ Tuple[str, str]: Fixed code and response text
352
+ """
353
+ # Format error fix prompt
354
+ prompt = get_prompt_fix_error(implementation_plan=implementation_plan, manim_code=code, error=error)
355
+
356
+ if self.use_rag:
357
+ # Generate RAG queries for error fixing
358
+ rag_queries = self._generate_rag_queries_error_fix(
359
+ error=error,
360
+ code=code,
361
+ scene_trace_id=scene_trace_id,
362
+ topic=topic,
363
+ scene_number=scene_number,
364
+ session_id=session_id
365
+ )
366
+ retrieved_docs = self.vector_store.find_relevant_docs(
367
+ queries=rag_queries,
368
+ k=2, # number of documents to retrieve for error fixing
369
+ trace_id=scene_trace_id,
370
+ topic=topic,
371
+ scene_number=scene_number
372
+ )
373
+ # Format the retrieved documents into a string
374
+ prompt = get_prompt_fix_error(implementation_plan=implementation_plan, manim_code=code, error=error, additional_context=retrieved_docs)
375
+
376
+ # Get fixed code from model
377
+ response_text = self.scene_model(
378
+ _prepare_text_inputs(prompt),
379
+ metadata={"generation_name": "code_fix_error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
380
+ )
381
+
382
+ # Extract fixed code with retries
383
+ fixed_code = self._extract_code_with_retries(
384
+ response_text,
385
+ r"```python(.*)```",
386
+ generation_name="code_fix_error",
387
+ trace_id=scene_trace_id,
388
+ session_id=session_id
389
+ )
390
+ return fixed_code, response_text
391
+
392
+ def visual_self_reflection(self, code: str, media_path: Union[str, Image.Image], scene_trace_id: str, topic: str, scene_number: int, session_id: str) -> Tuple[str, str]:
393
+ """Use snapshot image or mp4 video to fix code.
394
+
395
+ Args:
396
+ code (str): Code to fix
397
+ media_path (Union[str, Image.Image]): Path to media file or PIL Image
398
+ scene_trace_id (str): Trace identifier
399
+ topic (str): Topic of the scene
400
+ scene_number (int): Scene number
401
+ session_id (str): Session identifier
402
+
403
+ Returns:
404
+ Tuple[str, str]: Fixed code and response text
405
+ """
406
+
407
+ # Determine if we're dealing with video or image
408
+ is_video = isinstance(media_path, str) and media_path.endswith('.mp4')
409
+
410
+ # Load prompt template
411
+ with open('task_generator/prompts_raw/prompt_visual_self_reflection.txt', 'r') as f:
412
+ prompt_template = f.read()
413
+
414
+ # Format prompt
415
+ prompt = prompt_template.format(code=code)
416
+
417
+ # Prepare input based on media type
418
+ if is_video and isinstance(self.scene_model, (GeminiWrapper, VertexAIWrapper)):
419
+ # For video with Gemini models
420
+ messages = [
421
+ {"type": "text", "content": prompt},
422
+ {"type": "video", "content": media_path}
423
+ ]
424
+ else:
425
+ # For images or non-Gemini models
426
+ if isinstance(media_path, str):
427
+ media = Image.open(media_path)
428
+ else:
429
+ media = media_path
430
+ messages = [
431
+ {"type": "text", "content": prompt},
432
+ {"type": "image", "content": media}
433
+ ]
434
+
435
+ # Get model response
436
+ response_text = self.scene_model(
437
+ messages,
438
+ metadata={
439
+ "generation_name": "visual_self_reflection",
440
+ "trace_id": scene_trace_id,
441
+ "tags": [topic, f"scene{scene_number}"],
442
+ "session_id": session_id
443
+ }
444
+ )
445
+
446
+ # Extract code with retries
447
+ fixed_code = self._extract_code_with_retries(
448
+ response_text,
449
+ r"```python(.*)```",
450
+ generation_name="visual_self_reflection",
451
+ trace_id=scene_trace_id,
452
+ session_id=session_id
453
+ )
454
+ return fixed_code, response_text
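A hedged usage sketch of the class above (scene_model and helper_model stand for any wrapper instances with the __call__ interface shown earlier; the planning texts and error message are placeholders):

generator = CodeGenerator(scene_model, helper_model, output_dir="output", use_rag=False)
code, raw_response = generator.generate_manim_code(
    topic="Pythagorean theorem",
    description="Visual proof of a^2 + b^2 = c^2",
    scene_outline=outline_text,          # hypothetical planner output
    scene_implementation=plan_text,      # hypothetical implementation plan
    scene_number=1,
)
# If rendering later fails, the captured stderr can be fed back in:
# code, raw_response = generator.fix_code_errors(plan_text, code, error_message,
#     scene_trace_id=None, topic="Pythagorean theorem", scene_number=1, session_id=None)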
src/core/parse_video.py ADDED
@@ -0,0 +1,227 @@
1
+ import os
2
+ import pysrt
3
+ from moviepy import VideoFileClip
4
+ import shutil
5
+ from PIL import Image, ImageOps
6
+ import numpy as np
7
+ import speech_recognition as sr
8
+
9
+ def get_images_from_video(video_path, fps=0.2):
10
+ """Extract frames from a video file at specified FPS.
11
+
12
+ Args:
13
+ video_path (str): Path to the video file.
14
+ fps (float, optional): Frames per second to extract. Defaults to 0.2.
15
+
16
+ Returns:
17
+ list: List of frames as numpy arrays.
18
+ """
19
+ clip = VideoFileClip(video_path)
20
+ images = clip.iter_frames(fps=fps)
21
+ return images
22
+
23
+ def image_with_most_non_black_space(images, output_path, return_type="path"):
24
+ """Find and save the image with the most non-black space from a list of images.
25
+
26
+ Args:
27
+ images (list): List of image file paths, PIL Image objects, or numpy arrays.
28
+ output_path (str): Path where the output image should be saved.
29
+ return_type (str, optional): Type of return value - "path" or "image". Defaults to "path".
30
+
31
+ Returns:
32
+ Union[str, PIL.Image, None]: Path to saved image, PIL Image object, or None if no valid image found.
33
+ """
34
+ max_non_black_area = 0
35
+ image_with_max_non_black_space = None
36
+
37
+ for img in images:
38
+ try:
39
+ # If img is a path, open the image
40
+ if isinstance(img, str):
41
+ image = Image.open(img)
42
+ elif isinstance(img, Image.Image):
43
+ image = img
44
+ elif isinstance(img, np.ndarray):
45
+ image = Image.fromarray(img)
46
+ else:
47
+ print(f"Unsupported type: {type(img)}. Skipping.")
48
+ continue
49
+
50
+ # Convert to grayscale
51
+ gray = ImageOps.grayscale(image)
52
+
53
+ # Convert to numpy array
54
+ gray_array = np.array(gray)
55
+
56
+ # Count non-black pixels (threshold to consider near-black as black)
57
+ non_black_pixels = np.sum(gray_array > 10) # Threshold 10 to account for slight variations in black
58
+
59
+ if non_black_pixels > max_non_black_area:
60
+ max_non_black_area = non_black_pixels
61
+ image_with_max_non_black_space = image
62
+
63
+ except Exception as e:
64
+ print(f"Warning: Unable to process image {img}: {e}")
65
+
66
+ if image_with_max_non_black_space is not None:
67
+ image_with_max_non_black_space.save(output_path)
68
+ print(f"Saved image with most non-black space to {output_path}")
69
+
70
+ if return_type == "path":
71
+ return output_path
72
+ else:
73
+ return image_with_max_non_black_space
74
+ return image_with_max_non_black_space
75
+
76
+ def parse_srt_to_text(output_dir, topic_name):
77
+ """Convert SRT subtitle file to plain text.
78
+
79
+ Args:
80
+ output_dir (str): Directory containing the topic folders.
81
+ topic_name (str): Name of the topic/video.
82
+ """
83
+ topic_name = topic_name.replace(" ", "_").lower()
84
+ srt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.srt")
85
+ txt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.txt")
86
+ subs = pysrt.open(srt_path)
87
+
88
+ with open(txt_path, 'w') as f:
89
+ full_text = ""
90
+ for sub in subs:
91
+ sub.text = sub.text.replace("...", ".")
92
+ full_text += sub.text + " "
93
+ f.write(full_text.strip())
94
+
95
+ def parse_srt_and_extract_frames(output_dir, topic_name):
96
+ """Extract frames from video at subtitle timestamps and save with corresponding text.
97
+
98
+ Args:
99
+ output_dir (str): Directory containing the topic folders.
100
+ topic_name (str): Name of the topic/video.
101
+ """
102
+ topic_name = topic_name.replace(" ", "_").lower()
103
+ video_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.mp4")
104
+ srt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.srt")
105
+ subs = pysrt.open(srt_path)
106
+
107
+ # Create extract_images folder if it doesn't exist
108
+ images_dir = os.path.join(output_dir, topic_name, "extract_images")
109
+ if os.path.exists(images_dir):
110
+ shutil.rmtree(images_dir)
111
+ os.makedirs(images_dir)
112
+
113
+ # Load the video file
114
+ video = VideoFileClip(video_path)
115
+
116
+ # Dictionary to store image-text pairs
117
+ pairs = {}
118
+
119
+ i = 0
120
+ while i < len(subs):
121
+ sub = subs[i]
122
+ text = sub.text
123
+ sub_indexes = [sub.index]
124
+
125
+ # Check if we need to concatenate with next subtitle
126
+ while i < len(subs) - 1 and not text.strip().endswith('.'):
127
+ i += 1
128
+ next_sub = subs[i]
129
+ text += " " + next_sub.text
130
+ sub_indexes.append(next_sub.index)
131
+
132
+ # Get the end time of the last concatenated subtitle
133
+ end_time = sub.end.to_time()
134
+ # Convert end time to seconds
135
+ end_time_seconds = end_time.hour * 3600 + end_time.minute * 60 + end_time.second + end_time.microsecond / 1e6
136
+
137
+ # Save the frame as an image in extract_images folder
138
+ frame_path = os.path.join(images_dir, f"{sub.index}.jpg")
139
+ video.save_frame(frame_path, t=end_time_seconds)
140
+
141
+ # Save the subtitle text to a txt file
142
+ text_path = os.path.join(images_dir, f"{sub.index}.txt")
143
+ with open(text_path, 'w') as f:
144
+ f.write(text)
145
+
146
+ # Add pair to dictionary
147
+ pairs[str(sub.index)] = {
148
+ "image_path": f"{sub.index}.jpg",
149
+ "text": text,
150
+ "text_path": f"{sub.index}.txt",
151
+ "srt_index": sub_indexes,
152
+ }
153
+
154
+ i += 1
155
+
156
+ # Save pairs to json file
157
+ import json
158
+ json_path = os.path.join(images_dir, "pairs.json")
159
+ with open(json_path, 'w') as f:
160
+ json.dump(pairs, f, indent=4)
161
+
162
+ # Close the video file
163
+ video.close()
164
+
165
+ def extract_trasnscript(video_path):
166
+ """Extract transcript from video audio using Google Speech Recognition.
167
+
168
+ Args:
169
+ video_path (str): Path to the video file.
170
+
171
+ Returns:
172
+ str: Transcribed text from the video audio.
173
+
174
+ Raises:
175
+ FileNotFoundError: If video file does not exist.
176
+ """
177
+ if not os.path.exists(video_path):
178
+ raise FileNotFoundError(f"Video file not found: {video_path}")
179
+
180
+ clip = VideoFileClip(video_path)
181
+
182
+ # write the audio track to a temporary WAV file
183
+ audio_path = os.path.join(os.path.dirname(video_path), "audio.wav")
184
+ clip.audio.write_audiofile(audio_path)
185
+
186
+ try:
187
+ # extract the subtitles from the audio file
188
+ recognizer = sr.Recognizer()
189
+ with sr.AudioFile(audio_path) as source:
190
+ audio = recognizer.record(source)
191
+ return recognizer.recognize_google(audio)
192
+ finally:
193
+ # clean up the temporary audio file
194
+ if os.path.exists(audio_path):
195
+ os.remove(audio_path)
196
+
197
+ if __name__ == "__main__":
198
+ import argparse
199
+
200
+ def process_all_topics(output_folder):
201
+ """Process all topic folders in the output directory.
202
+
203
+ Args:
204
+ output_folder (str): Directory containing the topic folders.
205
+ """
206
+ # Only get immediate subdirectories
207
+ topics = [d for d in os.listdir(output_folder)
208
+ if os.path.isdir(os.path.join(output_folder, d))]
209
+
210
+ for topic in topics:
211
+ print(f"\nProcessing topic: {topic}")
212
+ try:
213
+ parse_srt_to_text(output_folder, topic)
214
+ parse_srt_and_extract_frames(output_folder, topic)
215
+ except Exception as e:
216
+ print(f"Error processing {topic}: {str(e)}")
217
+ continue
218
+
219
+ # Set up argument parser
220
+ parser = argparse.ArgumentParser(description='Process video files and extract frames with subtitles')
221
+ parser.add_argument('--output_dir', type=str, default="output",
222
+ help='Directory containing the topic folders')
223
+
224
+ args = parser.parse_args()
225
+
226
+ # Process topics using provided output directory
227
+ process_all_topics(args.output_dir)
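A brief usage sketch for the frame helpers added in this file. The paths below are placeholders, not outputs that exist in the repository.

```python
# Hypothetical usage of the helpers from src/core/parse_video.py.
# get_images_from_video yields numpy frames; image_with_most_non_black_space
# picks the frame with the most visible (non-black) content and saves it.
from src.core.parse_video import get_images_from_video, image_with_most_non_black_space

frames = get_images_from_video("output/my_topic/my_topic_combined.mp4", fps=0.2)
snapshot_path = image_with_most_non_black_space(
    frames, "output/my_topic/snapshot.png", return_type="path"
)
print(f"Representative frame saved to {snapshot_path}")
```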
src/core/video_planner.py ADDED
@@ -0,0 +1,417 @@
1
+ import os
2
+ import re
3
+ import json
4
+ import glob
5
+ from typing import List, Optional
6
+ import uuid
7
+ import asyncio
8
+
9
+ from mllm_tools.utils import _prepare_text_inputs
10
+ from src.utils.utils import extract_xml
11
+ from task_generator import (
12
+ get_prompt_scene_plan,
13
+ get_prompt_scene_vision_storyboard,
14
+ get_prompt_scene_technical_implementation,
15
+ get_prompt_scene_animation_narration,
16
+ get_prompt_context_learning_scene_plan,
17
+ get_prompt_context_learning_vision_storyboard,
18
+ get_prompt_context_learning_technical_implementation,
19
+ get_prompt_context_learning_animation_narration,
20
+ get_prompt_context_learning_code
21
+ )
22
+ from src.rag.rag_integration import RAGIntegration
23
+
24
+ class VideoPlanner:
25
+ """A class for planning and generating video content.
26
+
27
+ This class handles the planning and generation of video content including scene outlines,
28
+ vision storyboards, technical implementations, and animation narrations.
29
+
30
+ Args:
31
+ planner_model: The model used for planning tasks
32
+ helper_model: Optional helper model, defaults to planner_model if None
33
+ output_dir (str): Directory for output files. Defaults to "output"
34
+ print_response (bool): Whether to print model responses. Defaults to False
35
+ use_context_learning (bool): Whether to use context learning. Defaults to False
36
+ context_learning_path (str): Path to context learning examples. Defaults to "data/context_learning"
37
+ use_rag (bool): Whether to use RAG. Defaults to False
38
+ session_id (str): Session identifier. Defaults to None
39
+ chroma_db_path (str): Path to ChromaDB. Defaults to "data/rag/chroma_db"
40
+ manim_docs_path (str): Path to Manim docs. Defaults to "data/rag/manim_docs"
41
+ embedding_model (str): Name of embedding model. Defaults to "text-embedding-ada-002"
42
+ use_langfuse (bool): Whether to use Langfuse logging. Defaults to True
43
+ """
44
+
45
+ def __init__(self, planner_model, helper_model=None, output_dir="output", print_response=False, use_context_learning=False, context_learning_path="data/context_learning", use_rag=False, session_id=None, chroma_db_path="data/rag/chroma_db", manim_docs_path="data/rag/manim_docs", embedding_model="text-embedding-ada-002", use_langfuse=True):
46
+ self.planner_model = planner_model
47
+ self.helper_model = helper_model if helper_model is not None else planner_model
48
+ self.output_dir = output_dir
49
+ self.print_response = print_response
50
+ self.use_context_learning = use_context_learning
51
+ self.context_learning_path = context_learning_path
52
+ # Initialize different types of context examples
53
+ self.scene_plan_examples = self._load_context_examples('scene_plan') if use_context_learning else None
54
+ self.vision_storyboard_examples = self._load_context_examples('scene_vision_storyboard') if use_context_learning else None
55
+ self.technical_implementation_examples = self._load_context_examples('technical_implementation') if use_context_learning else None
56
+ self.animation_narration_examples = self._load_context_examples('scene_animation_narration') if use_context_learning else None
57
+ self.code_examples = self._load_context_examples('code') if use_context_learning else None
58
+ self.use_rag = use_rag
59
+ self.rag_integration = None
60
+ if use_rag:
61
+ self.rag_integration = RAGIntegration(
62
+ helper_model=helper_model,
63
+ output_dir=output_dir,
64
+ chroma_db_path=chroma_db_path,
65
+ manim_docs_path=manim_docs_path,
66
+ embedding_model=embedding_model,
67
+ use_langfuse=use_langfuse,
68
+ session_id=session_id
69
+ )
70
+ self.relevant_plugins = [] # Initialize as an empty list
71
+
72
+ def _load_context_examples(self, example_type: str) -> str:
73
+ """Load context learning examples of a specific type from files.
74
+
75
+ Args:
76
+ example_type (str): Type of examples to load ('scene_plan', 'scene_vision_storyboard', etc.)
77
+
78
+ Returns:
79
+ str: Formatted string containing the loaded examples, or None if no examples found
80
+ """
81
+ examples = []
82
+
83
+ # Define file patterns for different types
84
+ file_patterns = {
85
+ 'scene_plan': '*_scene_plan.txt',
86
+ 'scene_vision_storyboard': '*_scene_vision_storyboard.txt',
87
+ 'technical_implementation': '*_technical_implementation.txt',
88
+ 'scene_animation_narration': '*_scene_animation_narration.txt',
89
+ 'code': '*.py'
90
+ }
91
+
92
+ pattern = file_patterns.get(example_type)
93
+ if not pattern:
94
+ return None
95
+
96
+ # Search in subdirectories of context_learning_path
97
+ for root, _, _ in os.walk(self.context_learning_path):
98
+ for example_file in glob.glob(os.path.join(root, pattern)):
99
+ with open(example_file, 'r') as f:
100
+ content = f.read()
101
+ if example_type == 'code':
102
+ examples.append(f"# Example from {os.path.basename(example_file)}\n{content}\n")
103
+ else:
104
+ examples.append(f"# Example from {os.path.basename(example_file)}\n{content}\n")
105
+
106
+ # Format examples using appropriate template
107
+ if examples:
108
+ formatted_examples = self._format_examples(example_type, examples)
109
+ return formatted_examples
110
+ return None
111
+
112
+ def _format_examples(self, example_type: str, examples: List[str]) -> str:
113
+ """Format examples using the appropriate template based on their type.
114
+
115
+ Args:
116
+ example_type (str): Type of examples to format
117
+ examples (List[str]): List of example strings to format
118
+
119
+ Returns:
120
+ str: Formatted examples string, or None if no template found
121
+ """
122
+ templates = {
123
+ 'scene_plan': get_prompt_context_learning_scene_plan,
124
+ 'scene_vision_storyboard': get_prompt_context_learning_vision_storyboard,
125
+ 'technical_implementation': get_prompt_context_learning_technical_implementation,
126
+ 'scene_animation_narration': get_prompt_context_learning_animation_narration,
127
+ 'code': get_prompt_context_learning_code
128
+ }
129
+
130
+ template = templates.get(example_type)
131
+ if template:
132
+ return template(examples="\n".join(examples))
133
+ return None
134
+
135
+ def generate_scene_outline(self,
136
+ topic: str,
137
+ description: str,
138
+ session_id: str) -> str:
139
+ """Generate a scene outline based on the topic and description.
140
+
141
+ Args:
142
+ topic (str): The topic of the video
143
+ description (str): Description of the video content
144
+ session_id (str): Session identifier
145
+
146
+ Returns:
147
+ str: Generated scene outline
148
+ """
149
+ # Detect relevant plugins upfront if RAG is enabled
150
+ if self.use_rag:
151
+ self.relevant_plugins = self.rag_integration.detect_relevant_plugins(topic, description) or []
152
+ self.rag_integration.set_relevant_plugins(self.relevant_plugins)
153
+ print(f"Detected relevant plugins: {self.relevant_plugins}")
154
+
155
+ prompt = get_prompt_scene_plan(topic, description)
156
+
157
+ if self.use_context_learning and self.scene_plan_examples:
158
+ prompt += f"\n\nHere are some example scene plans for reference:\n{self.scene_plan_examples}"
159
+
160
+ # Generate plan using planner model
161
+ response_text = self.planner_model(
162
+ _prepare_text_inputs(prompt),
163
+ metadata={"generation_name": "scene_outline", "tags": [topic, "scene-outline"], "session_id": session_id}
164
+ )
165
+ # extract scene outline <SCENE_OUTLINE> ... </SCENE_OUTLINE>
166
+ scene_outline_match = re.search(r'(<SCENE_OUTLINE>.*?</SCENE_OUTLINE>)', response_text, re.DOTALL)
167
+ scene_outline = scene_outline_match.group(1) if scene_outline_match else response_text
168
+
169
+ # replace all spaces and special characters with underscores for file path compatibility
170
+ file_prefix = topic.lower()
171
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
172
+ # save plan to file
173
+ os.makedirs(os.path.join(self.output_dir, file_prefix), exist_ok=True) # Ensure directory exists
174
+ with open(os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt"), "w") as f:
175
+ f.write(scene_outline)
176
+ print(f"Plan saved to {file_prefix}_scene_outline.txt")
177
+
178
+ return scene_outline
179
+
180
+ async def _generate_scene_implementation_single(self, topic: str, description: str, scene_outline_i: str, i: int, file_prefix: str, session_id: str, scene_trace_id: str) -> str:
181
+ """Generate implementation plan for a single scene.
182
+
183
+ Args:
184
+ topic (str): The topic of the video
185
+ description (str): Description of the video content
186
+ scene_outline_i (str): Outline for this specific scene
187
+ i (int): Scene number
188
+ file_prefix (str): Prefix for output files
189
+ session_id (str): Session identifier
190
+ scene_trace_id (str): Unique trace ID for this scene
191
+
192
+ Returns:
193
+ str: Generated implementation plan for the scene
194
+ """
195
+ # Initialize empty implementation plan
196
+ implementation_plan = ""
197
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i}")
198
+ subplan_dir = os.path.join(scene_dir, "subplans")
199
+ os.makedirs(scene_dir, exist_ok=True)
200
+ os.makedirs(subplan_dir, exist_ok=True)
201
+
202
+ # Save scene_trace_id to file
203
+ trace_id_file = os.path.join(subplan_dir, "scene_trace_id.txt")
204
+ with open(trace_id_file, 'w') as f:
205
+ f.write(scene_trace_id)
206
+ print(f"Scene trace ID saved to {trace_id_file}")
207
+
208
+ # ===== Step 1: Generate Scene Vision and Storyboard =====
209
+ # ===================================================
210
+ prompt_vision_storyboard = get_prompt_scene_vision_storyboard(i, topic, description, scene_outline_i, self.relevant_plugins)
211
+
212
+ # Add vision storyboard examples only for this stage if available
213
+ if self.use_context_learning and self.vision_storyboard_examples:
214
+ prompt_vision_storyboard += f"\n\nHere are some example storyboards:\n{self.vision_storyboard_examples}"
215
+
216
+ if self.rag_integration:
217
+ # Use the already detected plugins instead of detecting again
218
+ # relevant_plugins = self.relevant_plugins # Removed redundant variable
219
+ # print(f"Using detected plugins: {relevant_plugins}") # Removed redundant print
220
+
221
+ # Generate RAG queries
222
+ rag_queries = self.rag_integration._generate_rag_queries_storyboard(
223
+ scene_plan=scene_outline_i,
224
+ scene_trace_id=scene_trace_id,
225
+ topic=topic,
226
+ scene_number=i,
227
+ session_id=session_id,
228
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
229
+ )
230
+
231
+ retrieved_docs = self.rag_integration.get_relevant_docs(
232
+ rag_queries=rag_queries,
233
+ scene_trace_id=scene_trace_id,
234
+ topic=topic,
235
+ scene_number=i
236
+ )
237
+
238
+ # Add documentation to prompt
239
+ prompt_vision_storyboard += f"\n\n{retrieved_docs}"
240
+
241
+ vision_storyboard_plan = self.planner_model(
242
+ _prepare_text_inputs(prompt_vision_storyboard),
243
+ metadata={"generation_name": "scene_vision_storyboard", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
244
+ )
245
+ # extract vision storyboard plan <SCENE_VISION_STORYBOARD_PLAN> ... </SCENE_VISION_STORYBOARD_PLAN>
246
+ vision_match = re.search(r'(<SCENE_VISION_STORYBOARD_PLAN>.*?</SCENE_VISION_STORYBOARD_PLAN>)', vision_storyboard_plan, re.DOTALL)
247
+ vision_storyboard_plan = vision_match.group(1) if vision_match else vision_storyboard_plan
248
+ implementation_plan += vision_storyboard_plan + "\n\n"
249
+ file_path_vs = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_vision_storyboard_plan.txt")
250
+ with open(file_path_vs, "w") as f:
251
+ f.write(vision_storyboard_plan)
252
+ print(f"Scene {i} Vision and Storyboard Plan saved to {file_path_vs}")
253
+
254
+ # ===== Step 2: Generate Technical Implementation Plan =====
255
+ # =========================================================
256
+ prompt_technical_implementation = get_prompt_scene_technical_implementation(i, topic, description, scene_outline_i, vision_storyboard_plan, self.relevant_plugins)
257
+
258
+ # Add technical implementation examples only for this stage if available
259
+ if self.use_context_learning and self.technical_implementation_examples:
260
+ prompt_technical_implementation += f"\n\nHere are some example technical implementations:\n{self.technical_implementation_examples}"
261
+
262
+ if self.rag_integration:
263
+ # Use the already detected plugins instead of detecting again
264
+ # relevant_plugins = self.relevant_plugins # Removed redundant variable
265
+ # print(f"Using detected plugins: {relevant_plugins}") # Removed redundant print
266
+
267
+ # Generate RAG queries
268
+ rag_queries = self.rag_integration._generate_rag_queries_technical(
269
+ storyboard=vision_storyboard_plan,
270
+ scene_trace_id=scene_trace_id,
271
+ topic=topic,
272
+ scene_number=i,
273
+ session_id=session_id,
274
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
275
+ )
276
+
277
+ retrieved_docs = self.rag_integration.get_relevant_docs(
278
+ rag_queries=rag_queries,
279
+ scene_trace_id=scene_trace_id,
280
+ topic=topic,
281
+ scene_number=i
282
+ )
283
+
284
+ # Add documentation to prompt
285
+ prompt_technical_implementation += f"\n\n{retrieved_docs}"
286
+
287
+ technical_implementation_plan = self.planner_model(
288
+ _prepare_text_inputs(prompt_technical_implementation),
289
+ metadata={"generation_name": "scene_technical_implementation", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
290
+ )
291
+ # extract technical implementation plan <SCENE_TECHNICAL_IMPLEMENTATION_PLAN> ... </SCENE_TECHNICAL_IMPLEMENTATION_PLAN>
292
+ technical_match = re.search(r'(<SCENE_TECHNICAL_IMPLEMENTATION_PLAN>.*?</SCENE_TECHNICAL_IMPLEMENTATION_PLAN>)', technical_implementation_plan, re.DOTALL)
293
+ technical_implementation_plan = technical_match.group(1) if technical_match else technical_implementation_plan
294
+ implementation_plan += technical_implementation_plan + "\n\n"
295
+ file_path_ti = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_technical_implementation_plan.txt")
296
+ with open(file_path_ti, "w") as f:
297
+ f.write(technical_implementation_plan)
298
+ print(f"Scene {i} Technical Implementation Plan saved to {file_path_ti}")
299
+
300
+ # ===== Step 3: Generate Animation and Narration Plan =====
301
+ # =========================================================
302
+ prompt_animation_narration = get_prompt_scene_animation_narration(i, topic, description, scene_outline_i, vision_storyboard_plan, technical_implementation_plan, self.relevant_plugins)
303
+
304
+ # Add animation narration examples only for this stage if available
305
+ if self.use_context_learning and self.animation_narration_examples:
306
+ prompt_animation_narration += f"\n\nHere are some example animation and narration plans:\n{self.animation_narration_examples}"
307
+
308
+ if self.rag_integration:
309
+ rag_queries = self.rag_integration._generate_rag_queries_narration(
310
+ storyboard=vision_storyboard_plan,
311
+ scene_trace_id=scene_trace_id,
312
+ topic=topic,
313
+ scene_number=i,
314
+ session_id=session_id,
315
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
316
+ )
317
+ retrieved_docs = self.rag_integration.get_relevant_docs(
318
+ rag_queries=rag_queries,
319
+ scene_trace_id=scene_trace_id,
320
+ topic=topic,
321
+ scene_number=i
322
+ )
323
+ prompt_animation_narration += f"\n\n{retrieved_docs}"
324
+
325
+ animation_narration_plan = self.planner_model(
326
+ _prepare_text_inputs(prompt_animation_narration),
327
+ metadata={"generation_name": "scene_animation_narration", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
328
+ )
329
+ # extract animation narration plan <SCENE_ANIMATION_NARRATION_PLAN> ... </SCENE_ANIMATION_NARRATION_PLAN>
330
+ animation_match = re.search(r'(<SCENE_ANIMATION_NARRATION_PLAN>.*?</SCENE_ANIMATION_NARRATION_PLAN>)', animation_narration_plan, re.DOTALL)
331
+ animation_narration_plan = animation_match.group(1) if animation_match else animation_narration_plan
332
+ implementation_plan += animation_narration_plan + "\n\n"
333
+ file_path_an = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_animation_narration_plan.txt")
334
+ with open(file_path_an, "w") as f:
335
+ f.write(animation_narration_plan)
336
+ print(f"Scene {i} Animation and Narration Plan saved to {file_path_an}")
337
+
338
+ # ===== Step 4: Save Implementation Plan =====
339
+ # ==========================================
340
+ # save the overall implementation plan to file
341
+ with open(os.path.join(self.output_dir, file_prefix, f"scene{i}", f"{file_prefix}_scene{i}_implementation_plan.txt"), "w") as f:
342
+ f.write(f"# Scene {i} Implementation Plan\n\n")
343
+ f.write(implementation_plan)
344
+ print(f"Scene {i} Implementation Plan saved to {file_prefix}_scene{i}_implementation_plan.txt")
345
+
346
+ return implementation_plan
347
+
348
+ async def generate_scene_implementation(self,
349
+ topic: str,
350
+ description: str,
351
+ plan: str,
352
+ session_id: str) -> List[str]:
353
+ """Generate detailed implementation plans for all scenes.
354
+
355
+ Args:
356
+ topic (str): The topic of the video
357
+ description (str): Description of the video content
358
+ plan (str): Overall scene plan
359
+ session_id (str): Session identifier
360
+
361
+ Returns:
362
+ List[str]: List of implementation plans for each scene
363
+ """
364
+ # extract scene outline <SCENE_OUTLINE> ... </SCENE_OUTLINE>
365
+ scene_outline = re.search(r'(<SCENE_OUTLINE>.*?</SCENE_OUTLINE>)', plan, re.DOTALL).group(1)
366
+ # check the number of scenes in the outline
367
+ scene_number = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
368
+ # replace all spaces and special characters with underscores for file path compatibility
369
+ file_prefix = topic.lower()
370
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
371
+ # generate implementation plan for each scene
372
+ all_scene_implementation_plans = []
373
+
374
+ tasks = []
375
+ for i in range(1, scene_number + 1):
376
+ print(f"Generating implementation plan for scene {i} in topic {topic}")
377
+ scene_outline_i = re.search(r'(<SCENE_{i}>.*?</SCENE_{i}>)'.format(i=i), scene_outline, re.DOTALL).group(1)
378
+ scene_trace_id = str(uuid.uuid4())
379
+ task = asyncio.create_task(self._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id))
380
+ tasks.append(task)
381
+
382
+ all_scene_implementation_plans = await asyncio.gather(*tasks)
383
+ return all_scene_implementation_plans
384
+
385
+ async def generate_scene_implementation_concurrently(self,
386
+ topic: str,
387
+ description: str,
388
+ plan: str,
389
+ session_id: str,
390
+ scene_semaphore) -> List[str]:
391
+ """Generate detailed implementation plans for all scenes concurrently with controlled concurrency.
392
+
393
+ Args:
394
+ topic (str): The topic of the video
395
+ description (str): Description of the video content
396
+ plan (str): Overall scene plan
397
+ session_id (str): Session identifier
398
+ scene_semaphore: Semaphore to control concurrent scene generation
399
+
400
+ Returns:
401
+ List[str]: List of implementation plans for each scene
402
+ """
403
+ scene_outline = extract_xml(plan)
404
+ scene_number = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
405
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', topic.lower())
406
+ all_scene_implementation_plans = []
407
+
408
+ async def generate_single_scene_implementation(i):
409
+ async with scene_semaphore: # controls parallelism
410
+ print(f"Generating implementation plan for scene {i} in topic {topic}")
411
+ scene_outline_i = re.search(r'(<SCENE_{i}>.*?</SCENE_{i}>)'.format(i=i), scene_outline, re.DOTALL).group(1)
412
+ scene_trace_id = str(uuid.uuid4()) # Generate UUID here
413
+ return await self._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id)
414
+
415
+ tasks = [generate_single_scene_implementation(i + 1) for i in range(scene_number)]
416
+ all_scene_implementation_plans = await asyncio.gather(*tasks)
417
+ return all_scene_implementation_plans
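To show how the planner is intended to be driven, a hedged end-to-end sketch follows. The model wrapper, topic, and concurrency limit are illustrative assumptions; only the `VideoPlanner` API is taken from the file above.

```python
import asyncio
import uuid

from src.core.video_planner import VideoPlanner

async def plan_video(planner_model):
    # planner_model stands in for one of the mllm_tools wrappers (assumption).
    planner = VideoPlanner(planner_model=planner_model, output_dir="output")
    session_id = str(uuid.uuid4())

    outline = planner.generate_scene_outline(
        topic="Pythagorean theorem",
        description="A short explainer with a visual proof.",
        session_id=session_id,
    )

    # Cap how many scenes are planned concurrently.
    scene_semaphore = asyncio.Semaphore(3)
    return await planner.generate_scene_implementation_concurrently(
        topic="Pythagorean theorem",
        description="A short explainer with a visual proof.",
        plan=outline,
        session_id=session_id,
        scene_semaphore=scene_semaphore,
    )

# plans = asyncio.run(plan_video(my_planner_model))
```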
src/core/video_renderer.py ADDED
@@ -0,0 +1,448 @@
1
+ import os
2
+ import re
3
+ import subprocess
4
+ import asyncio
5
+ from PIL import Image
6
+ from typing import Optional, List
7
+ import traceback
8
+ import sys
9
+
10
+ from src.core.parse_video import (
11
+ get_images_from_video,
12
+ image_with_most_non_black_space
13
+ )
14
+ from mllm_tools.vertex_ai import VertexAIWrapper
15
+ from mllm_tools.gemini import GeminiWrapper
16
+
17
+ class VideoRenderer:
18
+ """Class for rendering and combining Manim animation videos."""
19
+
20
+ def __init__(self, output_dir="output", print_response=False, use_visual_fix_code=False):
21
+ """Initialize the VideoRenderer.
22
+
23
+ Args:
24
+ output_dir (str, optional): Directory for output files. Defaults to "output".
25
+ print_response (bool, optional): Whether to print responses. Defaults to False.
26
+ use_visual_fix_code (bool, optional): Whether to use visual fix code. Defaults to False.
27
+ """
28
+ self.output_dir = output_dir
29
+ self.print_response = print_response
30
+ self.use_visual_fix_code = use_visual_fix_code
31
+
32
+ async def render_scene(self, code: str, file_prefix: str, curr_scene: int, curr_version: int, code_dir: str, media_dir: str, max_retries: int = 3, use_visual_fix_code=False, visual_self_reflection_func=None, banned_reasonings=None, scene_trace_id=None, topic=None, session_id=None):
33
+ """Render a single scene and handle error retries and visual fixes.
34
+
35
+ Args:
36
+ code (str): The Manim code to render
37
+ file_prefix (str): Prefix for output files
38
+ curr_scene (int): Current scene number
39
+ curr_version (int): Current version number
40
+ code_dir (str): Directory for code files
41
+ media_dir (str): Directory for media output
42
+ max_retries (int, optional): Maximum retry attempts. Defaults to 3.
43
+ use_visual_fix_code (bool, optional): Whether to use visual fix code. Defaults to False.
44
+ visual_self_reflection_func (callable, optional): Function for visual self-reflection. Defaults to None.
45
+ banned_reasonings (list, optional): List of banned reasoning strings. Defaults to None.
46
+ scene_trace_id (str, optional): Scene trace identifier. Defaults to None.
47
+ topic (str, optional): Topic name. Defaults to None.
48
+ session_id (str, optional): Session identifier. Defaults to None.
49
+
50
+ Returns:
51
+ tuple: (code, error_message) where error_message is None on success
52
+ """
53
+ retries = 0
54
+ while retries < max_retries:
55
+ try:
56
+ # Execute manim in a thread to prevent blocking
57
+ file_path = os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py")
58
+ result = await asyncio.to_thread(
59
+ subprocess.run,
60
+ ["manim", "-qh", file_path, "--media_dir", media_dir, "--progress_bar", "none"],
61
+ capture_output=True,
62
+ text=True
63
+ )
64
+
65
+ # if result.returncode != 0, it means that the code is not rendered successfully
66
+ # so we need to fix the code by returning the code and the error message
67
+ if result.returncode != 0:
68
+ raise Exception(result.stderr)
69
+
70
+ if use_visual_fix_code and visual_self_reflection_func and banned_reasonings:
71
+ # Get the rendered video path
72
+ video_path = os.path.join(
73
+ media_dir,
74
+ "videos",
75
+ f"{file_prefix}_scene{curr_scene}_v{curr_version}.mp4"
76
+ )
77
+
78
+ # For Gemini/Vertex AI models, pass the video directly
79
+ if self.scene_model.model_name.startswith(('gemini/', 'vertex_ai/')):
80
+ media_input = video_path
81
+ else:
82
+ # For other models, use image snapshot
83
+ media_input = self.create_snapshot_scene(
84
+ topic, curr_scene, curr_version, return_type="path"
85
+ )
86
+
87
+ new_code, log = visual_self_reflection_func(
88
+ code,
89
+ media_input,
90
+ scene_trace_id=scene_trace_id,
91
+ topic=topic,
92
+ scene_number=curr_scene,
93
+ session_id=session_id
94
+ )
95
+
96
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_vfix_log.txt"), "w") as f:
97
+ f.write(log)
98
+
99
+ # Check for termination markers
100
+ if "<LGTM>" in new_code or any(word in new_code for word in banned_reasonings):
101
+ break
102
+
103
+ code = new_code
104
+ curr_version += 1
105
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
106
+ f.write(code)
107
+ print(f"Code saved to scene{curr_scene}/code/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
108
+ retries = 0
109
+ continue
110
+
111
+ break # Exit retry loop on success
112
+
113
+ except Exception as e:
114
+ print(f"Error: {e}")
115
+ print(f"Retrying {retries+1} of {max_retries}...")
116
+
117
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_error.log"), "a") as f:
118
+ f.write(f"\nError in attempt {retries}:\n{str(e)}\n")
119
+ retries += 1
120
+ return code, str(e) # Indicate failure and return error message
121
+
122
+ print(f"Successfully rendered {file_path}")
123
+ with open(os.path.join(self.output_dir, file_prefix, f"scene{curr_scene}", "succ_rendered.txt"), "w") as f:
124
+ f.write("")
125
+
126
+ return code, None # Indicate success
127
+
128
+ def run_manim_process(self,
129
+ topic: str):
130
+ """Run manim on all generated manim code for a specific topic.
131
+
132
+ Args:
133
+ topic (str): Topic name to process
134
+
135
+ Returns:
136
+ subprocess.CompletedProcess: Result of the final manim process
137
+ """
138
+ file_prefix = topic.lower()
139
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
140
+ search_path = os.path.join(self.output_dir, file_prefix)
141
+ # Iterate through scene folders
142
+ scene_folders = [f for f in os.listdir(search_path) if os.path.isdir(os.path.join(search_path, f))]
143
+ scene_folders.sort() # Sort to process scenes in order
144
+
145
+ for folder in scene_folders:
146
+ folder_path = os.path.join(search_path, folder)
147
+
148
+ # Get all Python files in version order
149
+ py_files = [f for f in os.listdir(folder_path) if f.endswith('.py')]
150
+ py_files.sort(key=lambda x: int(x.split('_v')[-1].split('.')[0])) # Sort by version number
151
+
152
+ for file in py_files:
153
+ file_path = os.path.join(folder_path, file)
154
+ try:
155
+ media_dir = os.path.join(self.output_dir, file_prefix, "media")
156
+ result = subprocess.run(
157
+ f"manim -qh {file_path} --media_dir {media_dir}",
158
+ shell=True,
159
+ capture_output=True,
160
+ text=True
161
+ )
162
+ if result.returncode != 0:
163
+ raise Exception(result.stderr)
164
+ print(f"Successfully rendered {file}")
165
+ break # Move to next scene folder if successful
166
+ except Exception as e:
167
+ print(f"Error rendering {file}: {e}")
168
+ error_log_path = os.path.join(folder_path, f"{file.split('.')[0]}_error.log")  # strip the .py extension
169
+ with open(error_log_path, "w") as f:
170
+ f.write(f"Error:\n{str(e)}\n")
171
+ print(f"Error log saved to {error_log_path}")
172
+ return result
173
+
174
+ def create_snapshot_scene(self, topic: str, scene_number: int, version_number: int, return_type: str = "image"):
175
+ """Create a snapshot of the video for a specific topic and scene.
176
+
177
+ Args:
178
+ topic (str): Topic name
179
+ scene_number (int): Scene number
180
+ version_number (int): Version number
181
+ return_type (str, optional): Type of return value - "path" or "image". Defaults to "image".
182
+
183
+ Returns:
184
+ Union[str, PIL.Image]: Path to saved image or PIL Image object
185
+
186
+ Raises:
187
+ FileNotFoundError: If no mp4 files found in video folder
188
+ """
189
+ file_prefix = topic.lower()
190
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
191
+ search_path = os.path.join(self.output_dir, file_prefix)
192
+ video_folder_path = os.path.join(search_path, "media", "videos", f"{file_prefix}_scene{scene_number}_v{version_number}", "1080p60")
193
+ os.makedirs(video_folder_path, exist_ok=True)
194
+ snapshot_path = os.path.join(video_folder_path, "snapshot.png")
195
+ # Get the mp4 video file from the video folder path
196
+ video_files = [f for f in os.listdir(video_folder_path) if f.endswith('.mp4')]
197
+ if not video_files:
198
+ raise FileNotFoundError(f"No mp4 files found in {video_folder_path}")
199
+ video_path = os.path.join(video_folder_path, video_files[0])
200
+ saved_image = image_with_most_non_black_space(get_images_from_video(video_path), snapshot_path, return_type=return_type)
201
+ return saved_image
202
+
203
+ def combine_videos(self, topic: str):
204
+ """Combine all videos and subtitle files for a specific topic using ffmpeg.
205
+
206
+ Args:
207
+ topic (str): Topic name to combine videos for
208
+
209
+ This function will:
210
+ - Find all scene videos and subtitles
211
+ - Combine videos with or without audio
212
+ - Merge subtitle files with correct timing
213
+ - Save combined video and subtitles to output directory
214
+ """
215
+ file_prefix = topic.lower()
216
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
217
+ search_path = os.path.join(self.output_dir, file_prefix, "media", "videos")
218
+
219
+ # Create output directory if it doesn't exist
220
+ video_output_dir = os.path.join(self.output_dir, file_prefix)
221
+ os.makedirs(video_output_dir, exist_ok=True)
222
+
223
+ output_video_path = os.path.join(video_output_dir, f"{file_prefix}_combined.mp4")
224
+ output_srt_path = os.path.join(video_output_dir, f"{file_prefix}_combined.srt")
225
+
226
+ if os.path.exists(output_video_path) and os.path.exists(output_srt_path):
227
+ print(f"Combined video and subtitles already exist at {output_video_path}, not combining again.")
228
+ return
229
+
230
+ # Get scene count from outline
231
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
232
+ if not os.path.exists(scene_outline_path):
233
+ print(f"Warning: Scene outline file not found at {scene_outline_path}. Cannot determine scene count.")
234
+ return
235
+ with open(scene_outline_path) as f:
236
+ plan = f.read()
237
+ scene_outline = re.search(r'(<SCENE_OUTLINE>.*?</SCENE_OUTLINE>)', plan, re.DOTALL).group(1)
238
+ scene_count = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
239
+
240
+ # Find all scene folders and videos
241
+ scene_folders = []
242
+ for root, dirs, files in os.walk(search_path):
243
+ for dir in dirs:
244
+ if dir.startswith(file_prefix + "_scene"):
245
+ scene_folders.append(os.path.join(root, dir))
246
+
247
+ scene_videos = []
248
+ scene_subtitles = []
249
+
250
+ for scene_num in range(1, scene_count + 1):
251
+ folders = [f for f in scene_folders if int(f.split("scene")[-1].split("_")[0]) == scene_num]
252
+ if not folders:
253
+ print(f"Warning: Missing scene {scene_num}")
254
+ continue
255
+
256
+ folders.sort(key=lambda f: int(f.split("_v")[-1]))
257
+ folder = folders[-1]
258
+
259
+ video_found = False
260
+ subtitles_found = False
261
+ for filename in os.listdir(os.path.join(folder, "1080p60")):
262
+ if filename.endswith('.mp4'):
263
+ scene_videos.append(os.path.join(folder, "1080p60", filename))
264
+ video_found = True
265
+ elif filename.endswith('.srt'):
266
+ scene_subtitles.append(os.path.join(folder, "1080p60", filename))
267
+ subtitles_found = True
268
+
269
+ if not video_found:
270
+ print(f"Warning: Missing video for scene {scene_num}")
271
+ if not subtitles_found:
272
+ scene_subtitles.append(None)
273
+
274
+ if len(scene_videos) != scene_count:
275
+ print("Not all videos/subtitles are found, aborting video combination.")
276
+ return
277
+
278
+ try:
279
+ import ffmpeg # You might need to install ffmpeg-python package: pip install ffmpeg-python
280
+ from tqdm import tqdm
281
+
282
+ print("Analyzing video streams...")
283
+ # Check if videos have audio streams
284
+ has_audio = []
285
+ for video in tqdm(scene_videos, desc="Checking audio streams"):
286
+ probe = ffmpeg.probe(video)
287
+ audio_streams = [stream for stream in probe['streams'] if stream['codec_type'] == 'audio']
288
+ has_audio.append(len(audio_streams) > 0)
289
+
290
+ print("Preparing video combination...")
291
+ # If any video has audio, we need to ensure all videos have audio streams
292
+ if any(has_audio):
293
+ # Create list to store video and audio streams
294
+ streams = []
295
+ for video, has_aud in tqdm(list(zip(scene_videos, has_audio)), desc="Processing videos"):
296
+ if has_aud:
297
+ # Video has audio, use as is
298
+ input_vid = ffmpeg.input(video)
299
+ streams.extend([input_vid['v'], input_vid['a']])
300
+ else:
301
+ # Video lacks audio, add silent audio
302
+ input_vid = ffmpeg.input(video)
303
+ # Generate silent audio for the duration of the video
304
+ probe = ffmpeg.probe(video)
305
+ duration = float(probe['streams'][0]['duration'])
306
+ silent_audio = ffmpeg.input(f'anullsrc=channel_layout=stereo:sample_rate=44100',
307
+ f='lavfi', t=duration)['a']
308
+ streams.extend([input_vid['v'], silent_audio])
309
+
310
+ print("Combining videos with audio...")
311
+ try:
312
+ # Concatenate all streams using optimized CPU encoding settings
313
+ concat = ffmpeg.concat(*streams, v=1, a=1, unsafe=True)
314
+ process = (
315
+ concat
316
+ .output(output_video_path,
317
+ **{'c:v': 'libx264',
318
+ 'c:a': 'aac',
319
+ 'preset': 'veryfast', # Changed from ultrafast for better speed/quality balance
320
+ 'crf': '28', # Same quality setting
321
+ 'threads': '0', # Use all CPU threads
322
+ 'tune': 'fastdecode', # Optimize for decoding speed
323
+ 'profile:v': 'baseline', # Simpler profile for faster encoding
324
+ 'level': '4.0',
325
+ 'x264-params': 'aq-mode=0:no-deblock:no-cabac:ref=1:subme=0:trellis=0:weightp=0', # Added aggressive speed optimizations
326
+ 'movflags': '+faststart',
327
+ 'stats': None,
328
+ 'progress': 'pipe:1'})
329
+ .overwrite_output()
330
+ .run_async(pipe_stdout=True, pipe_stderr=True)
331
+ )
332
+
333
+ # Process progress output
334
+ while True:
335
+ line = process.stdout.readline().decode('utf-8')
336
+ if not line:
337
+ break
338
+ if 'frame=' in line:
339
+ sys.stdout.write('\rProcessing: ' + line.strip())
340
+ sys.stdout.flush()
341
+
342
+ # Wait for the process to complete and capture output
343
+ stdout, stderr = process.communicate()
344
+ print("\nEncoding complete!")
345
+
346
+ except ffmpeg.Error as e:
347
+ print(f"FFmpeg stdout:\n{e.stdout.decode('utf8')}")
348
+ print(f"FFmpeg stderr:\n{e.stderr.decode('utf8')}")
349
+ raise
350
+ else:
351
+ # No videos have audio, concatenate video streams only
352
+ streams = []
353
+ for video in tqdm(scene_videos, desc="Processing videos"):
354
+ streams.append(ffmpeg.input(video)['v'])
355
+
356
+ print("Combining videos without audio...")
357
+ try:
358
+ concat = ffmpeg.concat(*streams, v=1, unsafe=True)
359
+ process = (
360
+ concat
361
+ .output(output_video_path,
362
+ **{'c:v': 'libx264',
363
+ 'preset': 'medium',
364
+ 'crf': '23',
365
+ 'stats': None, # Enable progress stats
366
+ 'progress': 'pipe:1'}) # Output progress to pipe
367
+ .overwrite_output()
368
+ .run_async(pipe_stdout=True, pipe_stderr=True)
369
+ )
370
+
371
+ # Process progress output
372
+ while True:
373
+ line = process.stdout.readline().decode('utf-8')
374
+ if not line:
375
+ break
376
+ if 'frame=' in line:
377
+ sys.stdout.write('\rProcessing: ' + line.strip())
378
+ sys.stdout.flush()
379
+
380
+ # Wait for the process to complete and capture output
381
+ stdout, stderr = process.communicate()
382
+ print("\nEncoding complete!")
383
+
384
+ except ffmpeg.Error as e:
385
+ print(f"FFmpeg stdout:\n{e.stdout.decode('utf8')}")
386
+ print(f"FFmpeg stderr:\n{e.stderr.decode('utf8')}")
387
+ raise
388
+
389
+ print(f"Successfully combined videos into {output_video_path}")
390
+
391
+ # Handle subtitle combination (existing subtitle code remains the same)
392
+ if scene_subtitles:
393
+ with open(output_srt_path, 'w', encoding='utf-8') as outfile:
394
+ current_time_offset = 0
395
+ subtitle_index = 1
396
+
397
+ for srt_file, video_file in zip(scene_subtitles, scene_videos):
398
+ if srt_file is None:
399
+ continue
400
+
401
+ with open(srt_file, 'r', encoding='utf-8') as infile:
402
+ lines = infile.readlines()
403
+ i = 0
404
+ while i < len(lines):
405
+ line = lines[i].strip()
406
+ if line.isdigit(): # Subtitle index
407
+ outfile.write(f"{subtitle_index}\n")
408
+ subtitle_index += 1
409
+ i += 1
410
+
411
+ # Time codes line
412
+ time_line = lines[i].strip()
413
+ start_time, end_time = time_line.split(' --> ')
414
+
415
+ # Convert time codes and add offset
416
+ def adjust_time(time_str, offset):
417
+ h, m, s = time_str.replace(',', '.').split(':')
418
+ total_seconds = float(h) * 3600 + float(m) * 60 + float(s) + offset
419
+ h = int(total_seconds // 3600)
420
+ m = int((total_seconds % 3600) // 60)
421
+ s = total_seconds % 60
422
+ return f"{h:02d}:{m:02d}:{s:06.3f}".replace('.', ',')
423
+
424
+ new_start = adjust_time(start_time, current_time_offset)
425
+ new_end = adjust_time(end_time, current_time_offset)
426
+ outfile.write(f"{new_start} --> {new_end}\n")
427
+ i += 1
428
+
429
+ # Subtitle text (could be multiple lines)
430
+ while i < len(lines) and lines[i].strip():
431
+ outfile.write(lines[i])
432
+ i += 1
433
+ outfile.write('\n')
434
+ else:
435
+ i += 1
436
+
437
+ # Update time offset using ffprobe
438
+ probe = ffmpeg.probe(video_file)
439
+ duration = float(probe['streams'][0]['duration'])
440
+ current_time_offset += duration
441
+
442
+ print(f"Successfully combined videos into {output_video_path}")
443
+ if scene_subtitles:
444
+ print(f"Successfully combined subtitles into {output_srt_path}")
445
+
446
+ except Exception as e:
447
+ print(f"Error combining videos and subtitles: {e}")
448
+ traceback.print_exc()
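The subtitle merge in `combine_videos` shifts every SRT timestamp by the accumulated duration of the preceding scenes. A standalone sketch of that shift, mirroring the nested `adjust_time` helper above:

```python
# Standalone version of the timestamp shift used when concatenating SRT files.
# Offsets are in seconds; times use the SRT "HH:MM:SS,mmm" convention.
def adjust_srt_time(time_str: str, offset: float) -> str:
    h, m, s = time_str.replace(',', '.').split(':')
    total = float(h) * 3600 + float(m) * 60 + float(s) + offset
    hh = int(total // 3600)
    mm = int((total % 3600) // 60)
    ss = total % 60
    return f"{hh:02d}:{mm:02d}:{ss:06.3f}".replace('.', ',')

# Example: a cue at 00:00:01,500 in scene 2, preceded by 42.25 s of scene 1
print(adjust_srt_time("00:00:01,500", 42.25))  # -> 00:00:43,750
```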
src/rag/__init__.py ADDED
File without changes
src/rag/rag_integration.py ADDED
@@ -0,0 +1,390 @@
1
+ import os
2
+ import re
3
+ import json
4
+ from typing import List, Dict
5
+
6
+ from mllm_tools.utils import _prepare_text_inputs
7
+ from task_generator import (
8
+ get_prompt_rag_query_generation_fix_error,
9
+ get_prompt_detect_plugins,
10
+ get_prompt_rag_query_generation_technical,
11
+ get_prompt_rag_query_generation_vision_storyboard,
12
+ get_prompt_rag_query_generation_narration,
13
+ get_prompt_rag_query_generation_code
14
+ )
15
+ from src.rag.vector_store import RAGVectorStore
16
+
17
+ class RAGIntegration:
18
+ """Class for integrating RAG (Retrieval Augmented Generation) functionality.
19
+
20
+ This class handles RAG integration including plugin detection, query generation,
21
+ and document retrieval.
22
+
23
+ Args:
24
+ helper_model: Model used for generating queries and processing text
25
+ output_dir (str): Directory for output files
26
+ chroma_db_path (str): Path to ChromaDB
27
+ manim_docs_path (str): Path to Manim documentation
28
+ embedding_model (str): Name of embedding model to use
29
+ use_langfuse (bool, optional): Whether to use Langfuse logging. Defaults to True
30
+ session_id (str, optional): Session identifier. Defaults to None
31
+ """
32
+
33
+ def __init__(self, helper_model, output_dir, chroma_db_path, manim_docs_path, embedding_model, use_langfuse=True, session_id=None):
34
+ self.helper_model = helper_model
35
+ self.output_dir = output_dir
36
+ self.manim_docs_path = manim_docs_path
37
+ self.session_id = session_id
38
+ self.relevant_plugins = None
39
+
40
+ self.vector_store = RAGVectorStore(
41
+ chroma_db_path=chroma_db_path,
42
+ manim_docs_path=manim_docs_path,
43
+ embedding_model=embedding_model,
44
+ session_id=self.session_id,
45
+ use_langfuse=use_langfuse,
46
+ helper_model=helper_model
47
+ )
48
+
49
+ def set_relevant_plugins(self, plugins: List[str]) -> None:
50
+ """Set the relevant plugins for the current video.
51
+
52
+ Args:
53
+ plugins (List[str]): List of plugin names to set as relevant
54
+ """
55
+ self.relevant_plugins = plugins
56
+
57
+ def detect_relevant_plugins(self, topic: str, description: str) -> List[str]:
58
+ """Detect which plugins might be relevant based on topic and description.
59
+
60
+ Args:
61
+ topic (str): Topic of the video
62
+ description (str): Description of the video content
63
+
64
+ Returns:
65
+ List[str]: List of detected relevant plugin names
66
+ """
67
+ # Load plugin descriptions
68
+ plugins = self._load_plugin_descriptions()
69
+ if not plugins:
70
+ return []
71
+
72
+ # Get formatted prompt using the task_generator function
73
+ prompt = get_prompt_detect_plugins(
74
+ topic=topic,
75
+ description=description,
76
+ plugin_descriptions=json.dumps([{'name': p['name'], 'description': p['description']} for p in plugins], indent=2)
77
+ )
78
+
79
+ try:
80
+ response = self.helper_model(
81
+ _prepare_text_inputs(prompt),
82
+ metadata={"generation_name": "detect-relevant-plugins", "tags": [topic, "plugin-detection"], "session_id": self.session_id}
83
+ )
84
+ # Clean the response to ensure it only contains the JSON array
85
+ response = re.search(r'```json(.*)```', response, re.DOTALL).group(1)
86
+ try:
87
+ relevant_plugins = json.loads(response)
88
+ except json.JSONDecodeError as e:
89
+ print(f"JSONDecodeError when parsing relevant plugins: {e}")
90
+ print(f"Response text was: {response}")
91
+ return []
92
+
93
+ print(f"LLM detected relevant plugins: {relevant_plugins}")
94
+ return relevant_plugins
95
+ except Exception as e:
96
+ print(f"Error detecting plugins with LLM: {e}")
97
+ return []
98
+
99
+ def _load_plugin_descriptions(self) -> list:
100
+ """Load plugin descriptions from JSON file.
101
+
102
+ Returns:
103
+ list: List of plugin descriptions, empty list if loading fails
104
+ """
105
+ try:
106
+ plugin_config_path = os.path.join(
107
+ self.manim_docs_path,
108
+ "plugin_docs",
109
+ "plugins.json"
110
+ )
111
+ if os.path.exists(plugin_config_path):
112
+ with open(plugin_config_path, "r") as f:
113
+ return json.load(f)
114
+ else:
115
+ print(f"Plugin descriptions file not found at {plugin_config_path}")
116
+ return []
117
+ except Exception as e:
118
+ print(f"Error loading plugin descriptions: {e}")
119
+ return []
120
+
121
+ def _generate_rag_queries_storyboard(self, scene_plan: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
122
+ """Generate RAG queries from the scene plan to help create storyboard.
123
+
124
+ Args:
125
+ scene_plan (str): Scene plan text to generate queries from
126
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
127
+ topic (str, optional): Topic name. Defaults to None
128
+ scene_number (int, optional): Scene number. Defaults to None
129
+ session_id (str, optional): Session identifier. Defaults to None
130
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
131
+
132
+ Returns:
133
+ List[str]: List of generated RAG queries
134
+ """
135
+ cache_key = f"{topic}_scene{scene_number}_storyboard_rag"
136
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
137
+ os.makedirs(cache_dir, exist_ok=True)
138
+ cache_file = os.path.join(cache_dir, "rag_queries_storyboard.json")
139
+
140
+ if os.path.exists(cache_file):
141
+ with open(cache_file, 'r') as f:
142
+ return json.load(f)
143
+
144
+ # Format relevant plugins as a string
145
+ plugins_str = ", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
146
+
147
+ # Generate the prompt with only the required arguments
148
+ prompt = get_prompt_rag_query_generation_vision_storyboard(
149
+ scene_plan=scene_plan,
150
+ relevant_plugins=plugins_str
151
+ )
152
+
153
+ queries = self.helper_model(
154
+ _prepare_text_inputs(prompt),
155
+ metadata={"generation_name": "rag_query_generation_storyboard", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
156
+ )
157
+
158
+ # retrieve the JSON between triple backticks
159
+
160
+ try: # add try-except block to handle potential json decode errors
161
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
162
+ queries = json.loads(queries)
163
+ except json.JSONDecodeError as e:
164
+ print(f"JSONDecodeError when parsing RAG queries for storyboard: {e}")
165
+ print(f"Response text was: {queries}")
166
+ return [] # Return empty list in case of parsing error
167
+
168
+ # Cache the queries
169
+ with open(cache_file, 'w') as f:
170
+ json.dump(queries, f)
171
+
172
+ return queries
173
+
174
+ def _generate_rag_queries_technical(self, storyboard: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
175
+ """Generate RAG queries from the storyboard to help create technical implementation.
176
+
177
+ Args:
178
+ storyboard (str): Storyboard text to generate queries from
179
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
180
+ topic (str, optional): Topic name. Defaults to None
181
+ scene_number (int, optional): Scene number. Defaults to None
182
+ session_id (str, optional): Session identifier. Defaults to None
183
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
184
+
185
+ Returns:
186
+ List[str]: List of generated RAG queries
187
+ """
188
+ cache_key = f"{topic}_scene{scene_number}_technical_rag"
189
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
190
+ os.makedirs(cache_dir, exist_ok=True)
191
+ cache_file = os.path.join(cache_dir, "rag_queries_technical.json")
192
+
193
+ if os.path.exists(cache_file):
194
+ with open(cache_file, 'r') as f:
195
+ return json.load(f)
196
+
197
+ prompt = get_prompt_rag_query_generation_technical(
198
+ storyboard=storyboard,
199
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
200
+ )
201
+
202
+ queries = self.helper_model(
203
+ _prepare_text_inputs(prompt),
204
+ metadata={"generation_name": "rag_query_generation_technical", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
205
+ )
206
+
207
+ try: # add try-except block to handle potential json decode errors
208
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
209
+ queries = json.loads(queries)
210
+ except json.JSONDecodeError as e:
211
+ print(f"JSONDecodeError when parsing RAG queries for technical implementation: {e}")
212
+ print(f"Response text was: {queries}")
213
+ return [] # Return empty list in case of parsing error
214
+
215
+ # Cache the queries
216
+ with open(cache_file, 'w') as f:
217
+ json.dump(queries, f)
218
+
219
+ return queries
220
+
221
+ def _generate_rag_queries_narration(self, storyboard: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
222
+ """Generate RAG queries from the storyboard to help create narration plan.
223
+
224
+ Args:
225
+ storyboard (str): Storyboard text to generate queries from
226
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
227
+ topic (str, optional): Topic name. Defaults to None
228
+ scene_number (int, optional): Scene number. Defaults to None
229
+ session_id (str, optional): Session identifier. Defaults to None
230
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
231
+
232
+ Returns:
233
+ List[str]: List of generated RAG queries
234
+ """
235
+ cache_key = f"{topic}_scene{scene_number}_narration_rag"
236
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
237
+ os.makedirs(cache_dir, exist_ok=True)
238
+ cache_file = os.path.join(cache_dir, "rag_queries_narration.json")
239
+
240
+ if os.path.exists(cache_file):
241
+ with open(cache_file, 'r') as f:
242
+ return json.load(f)
243
+
244
+ prompt = get_prompt_rag_query_generation_narration(
245
+ storyboard=storyboard,
246
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
247
+ )
248
+
249
+ queries = self.helper_model(
250
+ _prepare_text_inputs(prompt),
251
+ metadata={"generation_name": "rag_query_generation_narration", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
252
+ )
253
+
254
+         try:  # Guard against responses missing the ```json fence or containing invalid JSON
+             queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+             queries = json.loads(queries)
+         except (AttributeError, json.JSONDecodeError) as e:
+             print(f"Error when parsing narration RAG queries: {e}")
+             print(f"Response text was: {queries}")
+             return []  # Return an empty list if parsing fails
+
+         # Cache the queries
+         with open(cache_file, 'w') as f:
+             json.dump(queries, f)
+
+         return queries
+
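+     # Hypothetical call sequence (the storyboard variable, topic, and plugin list are
+     # made up for illustration): narration queries are generated once per scene and
+     # cached, so repeated runs reuse rag_queries_narration.json:
+     #   queries = self._generate_rag_queries_narration(
+     #       storyboard=scene_storyboard, topic="Pendulum Motion", scene_number=1,
+     #       session_id=self.session_id, relevant_plugins=["manim-physics"])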
+     def get_relevant_docs(self, rag_queries: List[Dict], scene_trace_id: str, topic: str, scene_number: int) -> List[str]:
+         """Get relevant documentation using the vector store.
+
+         Args:
+             rag_queries (List[Dict]): List of RAG queries to search for
+             scene_trace_id (str): Trace identifier for the scene
+             topic (str): Topic name
+             scene_number (int): Scene number
+
+         Returns:
+             List[str]: List of relevant documentation snippets
+         """
+         return self.vector_store.find_relevant_docs(
+             queries=rag_queries,
+             k=2,
+             trace_id=scene_trace_id,
+             topic=topic,
+             scene_number=scene_number
+         )
+
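+     # Usage sketch (hypothetical topic and variable names): the queries are matched
+     # against the vector store with k=2, presumably the top-2 snippets per lookup:
+     #   docs = self.get_relevant_docs(
+     #       rag_queries=queries, scene_trace_id=trace_id,
+     #       topic="Pendulum Motion", scene_number=1)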
+     def _generate_rag_queries_code(self, implementation_plan: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, relevant_plugins: List[str] = None) -> List[str]:
+         """Generate RAG queries from implementation plan.
+
+         Args:
+             implementation_plan (str): Implementation plan text to generate queries from
+             scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+             topic (str, optional): Topic name. Defaults to None
+             scene_number (int, optional): Scene number. Defaults to None
+             relevant_plugins (List[str], optional): List of relevant plugins. Defaults to None
+
+         Returns:
+             List[str]: List of generated RAG queries
+         """
+         cache_key = f"{topic}_scene{scene_number}"
+         cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+         os.makedirs(cache_dir, exist_ok=True)
+         cache_file = os.path.join(cache_dir, "rag_queries_code.json")
+
+         if os.path.exists(cache_file):
+             with open(cache_file, 'r') as f:
+                 return json.load(f)
+
+         prompt = get_prompt_rag_query_generation_code(
+             implementation_plan=implementation_plan,
+             relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+         )
+
+         try:
+             response = self.helper_model(
+                 _prepare_text_inputs(prompt),
+                 metadata={"generation_name": "rag_query_generation_code", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": self.session_id}
+             )
+
+             # Clean and parse response
+             response = re.search(r'```json(.*)```', response, re.DOTALL).group(1)
+             queries = json.loads(response)
+
+             # Cache the queries
+             with open(cache_file, 'w') as f:
+                 json.dump(queries, f)
+
+             return queries
+         except Exception as e:
+             print(f"Error generating RAG queries: {e}")
+             return []
+
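+     # Sketch of a possible call from the code-generation step (variable names are
+     # illustrative): any model or parsing error is swallowed above, so callers should
+     # expect an empty list and still be able to generate code without RAG context:
+     #   code_queries = self._generate_rag_queries_code(
+     #       implementation_plan=plan_text, topic="Pendulum Motion", scene_number=1,
+     #       relevant_plugins=self.relevant_plugins)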
+     def _generate_rag_queries_error_fix(self, error: str, code: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None) -> List[str]:
+         """Generate RAG queries for fixing code errors.
+
+         Args:
+             error (str): Error message to generate queries from
+             code (str): Code containing the error
+             scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+             topic (str, optional): Topic name. Defaults to None
+             scene_number (int, optional): Scene number. Defaults to None
+             session_id (str, optional): Session identifier. Defaults to None
+
+         Returns:
+             List[str]: List of generated RAG queries
+         """
+         if self.relevant_plugins is None:
+             print("Warning: No plugins have been detected yet")
+             plugins_str = "No plugins are relevant."
+         else:
+             plugins_str = ", ".join(self.relevant_plugins) if self.relevant_plugins else "No plugins are relevant."
+
+         cache_key = f"{topic}_scene{scene_number}_error_fix"
+         cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+         os.makedirs(cache_dir, exist_ok=True)
+         cache_file = os.path.join(cache_dir, "rag_queries_error_fix.json")
+
+         if os.path.exists(cache_file):
+             with open(cache_file, 'r') as f:
+                 cached_queries = json.load(f)
+             print(f"Using cached RAG queries for error fix in {cache_key}")
+             return cached_queries
+
+         prompt = get_prompt_rag_query_generation_fix_error(
+             error=error,
+             code=code,
+             relevant_plugins=plugins_str
+         )
+
+         queries = self.helper_model(
+             _prepare_text_inputs(prompt),
+             metadata={"generation_name": "rag-query-generation-fix-error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+         )
+
+         try:
+             # Extract the JSON payload from the ```json ... ``` fence in the response
+             queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+             queries = json.loads(queries)
+         except (AttributeError, json.JSONDecodeError) as e:
+             print(f"Error when parsing RAG queries for error fix: {e}")
+             print(f"Response text was: {queries}")
+             return []
+
+         # Cache the queries
+         with open(cache_file, 'w') as f:
+             json.dump(queries, f)
+
+         return queries
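+     # Hypothetical example: for a rendering error string such as
+     # "AttributeError: 'Scene' object has no attribute 'play_animation'", the prompt
+     # is expected to yield targeted queries along the lines of
+     # ["manim Scene.play usage", "manim AttributeError play"]; these are cached and
+     # can then be passed to get_relevant_docs for retrieval.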