diff --git a/.env.template b/.env.template
new file mode 100644
index 0000000000000000000000000000000000000000..8053266bc1811cd1fd1573b7b77c7dee7037b140
--- /dev/null
+++ b/.env.template
@@ -0,0 +1,33 @@
+# OpenAI
+OPENAI_API_KEY=""
+
+# Azure OpenAI
+AZURE_API_KEY=""
+AZURE_API_BASE=""
+AZURE_API_VERSION=""
+
+# Google Vertex AI
+VERTEXAI_PROJECT=""
+VERTEXAI_LOCATION=""
+GOOGLE_APPLICATION_CREDENTIALS=""
+
+# Google Gemini
+GEMINI_API_KEY=""
+
+# AWS Bedrock / S3
+AWS_ACCESS_KEY_ID=""
+AWS_SECRET_ACCESS_KEY=""
+AWS_REGION_NAME=""
+AWS_S3_BUCKET=""
+
+# Langfuse
+LANGFUSE_PUBLIC_KEY=""
+LANGFUSE_SECRET_KEY=""
+LANGFUSE_HOST=""
+
+# Kokoro TTS Settings
+KOKORO_MODEL_PATH="models/kokoro-v0_19.onnx"
+KOKORO_VOICES_PATH="models/voices.bin"
+KOKORO_DEFAULT_VOICE="af"
+KOKORO_DEFAULT_SPEED="1.0"
+KOKORO_DEFAULT_LANG="en-us"
\ No newline at end of file
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000000000000000000000000000000000000..2fcf1c6d005f3836a3ac8223741894a34180e3ab
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,32 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser [e.g. chrome, safari]
+ - Version [e.g. 22]
+
+**Additional context**
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000000000000000000000000000000000000..bbcbbe7d61558adde3cbfd0c7a63a67c27ed6d30
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..107138e57f15a2096340c674ef582480391c653e
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+**/__pycache__/
+
+.env
diff --git a/.specstory/history/.what-is-this.md b/.specstory/history/.what-is-this.md
new file mode 100644
index 0000000000000000000000000000000000000000..1b65ce595bbf44b1de91162ba950e5c5c6641f85
--- /dev/null
+++ b/.specstory/history/.what-is-this.md
@@ -0,0 +1,65 @@
+
+# SpecStory Artifacts Directory
+
+This directory is automatically created and maintained by the SpecStory extension to preserve your Cursor composer and chat history.
+
+## What's Here?
+
+- `.specstory/history`: Contains markdown files of your AI coding sessions
+ - Each file represents a separate chat or composer session
+ - Files are automatically updated as you work
+- `.specstory/cursor_rules_backups`: Contains backups of the `.cursor/rules/derived-cursor-rules.mdc` file
+ - Backups are automatically created each time the `.cursor/rules/derived-cursor-rules.mdc` file is updated
+ - You can enable/disable the Cursor Rules feature in the SpecStory settings
+
+## Valuable Uses
+
+- Capture: Keep your context window up-to-date when starting new Chat/Composer sessions via @ references
+- Search: For previous prompts and code snippets
+- Learn: Meta-analyze your patterns and learn from your past experiences
+- Derive: Keep Cursor on course with your past decisions by automatically deriving Cursor rules from your AI interactions
+
+## Version Control
+
+We recommend keeping this directory under version control to maintain a history of your AI interactions. However, if you prefer not to version these files, you can exclude them by adding this to your `.gitignore`:
+
+```
+.specstory
+```
+
+We recommend not keeping the `.specstory/cursor_rules_backups` directory under version control if you are already using git to version the `.cursor/rules` directory, and committing regularly. You can exclude it by adding this to your `.gitignore`:
+
+```
+.specstory/cursor_rules_backups
+```
+
+## Searching Your Codebase
+
+When searching your codebase in Cursor, search results may include your previous AI coding interactions. To focus solely on your actual code files, you can exclude the AI interaction history from search results.
+
+To exclude AI interaction history:
+
+1. Open the "Find in Files" search in Cursor (Cmd/Ctrl + Shift + F)
+2. Navigate to the "files to exclude" section
+3. Add the following pattern:
+
+```
+.specstory/*
+```
+
+This will ensure your searches only return results from your working codebase files.
+
+## Notes
+
+- Auto-save only works when Cursor/sqlite flushes data to disk. This results in a small delay after the AI response is complete before SpecStory can save the history.
+- Auto-save does not yet work on remote WSL workspaces.
+
+## Settings
+
+You can control auto-saving behavior in Cursor:
+
+1. Open Cursor → Settings → VS Code Settings (Cmd/Ctrl + ,)
+2. Search for "SpecStory"
+3. Find "Auto Save" setting to enable/disable
+
+Auto-save occurs when changes are detected in Cursor's sqlite database, or every 2 minutes as a safety net.
\ No newline at end of file
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000000000000000000000000000000000000..62340f0500edb1c846a72dad5b854e51acd9d786
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,39 @@
+# Start with a Python base image
+FROM python:3.11-slim
+
+# Set working directory
+WORKDIR /app
+
+# Install system dependencies for Manim
+# This is a large installation and will take time
+RUN apt-get update && apt-get install -y --no-install-recommends \
+ ffmpeg \
+ texlive-full \
+ pango1.0-tools \
+ libcairo2-dev \
+ libjpeg-dev \
+ libgif-dev \
+ libpango1.0-dev \
+ libsdl-pango-dev \
+ portaudio19-dev \
+    git \
+    wget \
+ && rm -rf /var/lib/apt/lists/*
+
+# Copy the entire project into the container
+COPY . .
+
+# Install Python requirements
+# Manim is included in requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Download Kokoro TTS models during the build process
+RUN mkdir -p models && \
+ wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx && \
+ wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
+
+# Expose the port the API will run on (e.g., 7860 for Gradio/FastAPI)
+EXPOSE 7860
+
+# Command to run the application
+# We will use Gradio to create the UI endpoint
+CMD ["python", "app.py"]
\ No newline at end of file
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..297addaef1a30c8491b973e9eaf221e1d8ea8a28
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 TIGER Lab
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index 290bc457dfefc6b81bc53c120ecd6825d7f7b750..604814ed60efab86a0b2533f2920a15cbbcf5786 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,359 @@
----
-title: TheoremExplainAgent
-emoji: 😻
-colorFrom: pink
-colorTo: red
-sdk: gradio
-sdk_version: 5.33.2
-app_file: app.py
-pinned: false
-license: mit
-short_description: TheoremExplainAgent
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# TheoremExplainAgent (TEA) 🍵
+[](https://arxiv.org/abs/2502.19400)
+
+
+[**🌐 Homepage**](https://tiger-ai-lab.github.io/TheoremExplainAgent/) | [**📖 arXiv**](https://arxiv.org/abs/2502.19400) | [**🤗 HuggingFace Dataset**](https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench) | [🎥Video Data](https://drive.google.com/file/d/18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ/view)
+
+[](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/graphs/contributors)
+[](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/blob/main/LICENSE)
+[](https://github.com/TIGER-AI-Lab/TheoremExplainAgent)
+[](https://hits.seeyoufarm.com)
+
+This repo contains the codebase for our paper [TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding](https://arxiv.org/abs/2502.19400)
+
+**ACL 2025 main**
+
+## Introduction
+TheoremExplainAgent is an AI system that generates long-form Manim videos to visually explain theorems, proving its deep understanding while uncovering reasoning flaws that text alone often hides.
+
+
+
+https://github.com/user-attachments/assets/17f2f4f2-8f2c-4abc-b377-ac92ebda69f3
+
+
+## 📰 News
+* 2025 Jun 8: We released our generated video data to serve as baselines for researchers.
+* 2025 May 15: Paper accepted to ACL 2025 main conference.
+* 2025 Mar 3: Generation code and Evaluation code released. Thanks for the wait!
+
+* 2025 Feb 27: Paper available on [Arxiv](https://arxiv.org/abs/2502.19400). Thanks AK for putting our paper on [HF Daily](https://huggingface.co/papers/2502.19400).
+
+## Downloading Generated Video Data
+Skip this section if you just want to try out the code.
+If you are a researcher who just needs the generated videos as a baseline for comparison, download them here:
+```shell
+wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ' -O /tmp/gdrive.html && wget --load-cookies /tmp/cookies.txt -O baseline_videos.zip "https://drive.usercontent.google.com/download?id=18kmzXvbxaFGyJw-g51jnq9m93v_ez4aJ&export=download&confirm=$(sed -rn 's/.*name="confirm" value="([^"]+)".*/\\1/p' /tmp/gdrive.html)&uuid=$(sed -rn 's/.*name="uuid" value="([^"]+)".*/\\1/p' /tmp/gdrive.html)" && rm /tmp/gdrive.html /tmp/cookies.txt
+```
+
+## Installation
+
+> **Look at the [FAQ section in this README doc](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that didn't help, create an issue.**
+
+1. Setting up conda environment
+```shell
+conda create --name tea python=3.12.8
+conda activate tea
+pip install -r requirements.txt
+```
+
+2. You may also need to install LaTeX and other dependencies for Manim Community. See the [Manim Installation Docs](https://docs.manim.community/en/stable/installation.html) for more details.
+```shell
+# You might need these dependencies if you are using Linux Ubuntu:
+sudo apt-get install portaudio19-dev
+sudo apt-get install libsdl-pango-dev
+```
+
+3. Then download the Kokoro model and voices with the following commands to enable the TTS service.
+
+```shell
+mkdir -p models && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
+```
+
+4. Create `.env` based on `.env.template`, filling in the environment variables for the models you choose to use.
+See [LiteLLM](https://docs.litellm.ai/docs/providers) for reference.
+
+```shell
+touch .env
+```
+Then open the `.env` file and edit it with whatever text editor you like.
+
+Your `.env` file should look like the following:
+```shell
+# OpenAI
+OPENAI_API_KEY=""
+
+# Azure OpenAI
+AZURE_API_KEY=""
+AZURE_API_BASE=""
+AZURE_API_VERSION=""
+
+# Google Vertex AI
+VERTEXAI_PROJECT=""
+VERTEXAI_LOCATION=""
+GOOGLE_APPLICATION_CREDENTIALS=""
+
+# Google Gemini
+GEMINI_API_KEY=""
+
+...
+
+# Kokoro TTS Settings
+KOKORO_MODEL_PATH="models/kokoro-v0_19.onnx"
+KOKORO_VOICES_PATH="models/voices.bin"
+KOKORO_DEFAULT_VOICE="af"
+KOKORO_DEFAULT_SPEED="1.0"
+KOKORO_DEFAULT_LANG="en-us"
+```
+Fill in the API keys according to the models you want to use.
+
+5. Configure the Python path. This step is required; otherwise you may encounter import errors (e.g. not being able to import `src`).
+```shell
+export PYTHONPATH=$(pwd):$PYTHONPATH
+```
+
+6. (Optional) To set up RAG, see [https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#generation-with-rag](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#generation-with-rag).
+
+> **Look at the [FAQ section in this README doc](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that didn't help, create an issue.**
+
+## Generation
+
+### Supported Models
+
+The model naming follows the LiteLLM convention. For details on how models should be named, please refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
+
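+If you want to sanity-check that a model name and the keys in your `.env` work together, you can call LiteLLM directly. This is a minimal sketch, assuming `litellm` and `python-dotenv` are available in your environment and that an `OPENAI_API_KEY` is set in `.env`:
+
+```python
+from dotenv import load_dotenv  # python-dotenv
+import litellm
+
+load_dotenv()  # pull API keys from .env into the environment
+
+# Model names follow the LiteLLM "<provider>/<model>" convention, e.g. "openai/o3-mini".
+response = litellm.completion(
+    model="openai/o3-mini",
+    messages=[{"role": "user", "content": "Reply with OK."}],
+)
+print(response.choices[0].message.content)
+```
+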
+### Generation (Single topic)
+```shell
+python generate_video.py \
+ --model "openai/o3-mini" \
+ --helper_model "openai/o3-mini" \
+ --output_dir "output/your_exp_name" \
+ --topic "your_topic" \
+ --context "description of your topic, e.g. 'This is a topic about the properties of a triangle'" \
+```
+
+Example:
+```shell
+python generate_video.py \
+ --model "openai/o3-mini" \
+ --helper_model "openai/o3-mini" \
+ --output_dir "output/my_exp_name" \
+ --topic "Big O notation" \
+ --context "most common type of asymptotic notation in computer science used to measure worst case complexity" \
+```
+
+### Generation (in batch)
+```shell
+python generate_video.py \
+ --model "openai/o3-mini" \
+ --helper_model "openai/o3-mini" \
+ --output_dir "output/my_exp_name" \
+ --theorems_path data/thb_easy/math.json \
+ --max_scene_concurrency 7 \
+ --max_topic_concurrency 20 \
+```
+
+### Generation with RAG
+Before using RAG, download the RAG documentation from this [Google Drive link](https://drive.google.com/file/d/1Tn6J_JKVefFZRgZbjns93KLBtI9ullRv/view?usp=sharing). After downloading, unzip the file. For example, if you unzip it to `data/rag/manim_docs`, then you should set `--manim_docs_path` to `data/rag/manim_docs`. The vector database will be created the first time you run with RAG.
+
+```shell
+python generate_video.py \
+ --model "openai/o3-mini" \
+ --helper_model "openai/o3-mini" \
+ --output_dir "output/with_rag/o3-mini/vtutorbench_easy/math" \
+ --topic "Big O notation" \
+ --context "most common type of asymptotic notation in computer science used to measure worst case complexity" \
+ --use_rag \
+ --chroma_db_path "data/rag/chroma_db" \
+ --manim_docs_path "data/rag/manim_docs" \
+ --embedding_model "vertex_ai/text-embedding-005"
+```
+
+We support more options for generation; see below for details:
+```shell
+usage: generate_video.py [-h]
+ [--model]
+ [--topic TOPIC] [--context CONTEXT]
+ [--helper_model]
+ [--only_gen_vid] [--only_combine] [--peek_existing_videos] [--output_dir OUTPUT_DIR] [--theorems_path THEOREMS_PATH]
+ [--sample_size SAMPLE_SIZE] [--verbose] [--max_retries MAX_RETRIES] [--use_rag] [--use_visual_fix_code]
+ [--chroma_db_path CHROMA_DB_PATH] [--manim_docs_path MANIM_DOCS_PATH]
+ [--embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}] [--use_context_learning]
+ [--context_learning_path CONTEXT_LEARNING_PATH] [--use_langfuse] [--max_scene_concurrency MAX_SCENE_CONCURRENCY]
+ [--max_topic_concurrency MAX_TOPIC_CONCURRENCY] [--debug_combine_topic DEBUG_COMBINE_TOPIC] [--only_plan] [--check_status]
+ [--only_render] [--scenes SCENES [SCENES ...]]
+
+Generate Manim videos using AI
+
+options:
+ -h, --help show this help message and exit
+ --model Select the AI model to use
+ --topic TOPIC Topic to generate videos for
+ --context CONTEXT Context of the topic
+ --helper_model Select the helper model to use
+ --only_gen_vid Only generate videos to existing plans
+ --only_combine Only combine videos
+ --peek_existing_videos, --peek
+ Peek at existing videos
+ --output_dir OUTPUT_DIR
+ Output directory
+ --theorems_path THEOREMS_PATH
+ Path to theorems json file
+ --sample_size SAMPLE_SIZE, --sample SAMPLE_SIZE
+ Number of theorems to sample
+ --verbose Print verbose output
+ --max_retries MAX_RETRIES
+ Maximum number of retries for code generation
+ --use_rag, --rag Use Retrieval Augmented Generation
+ --use_visual_fix_code, --visual_fix_code
+ Use VLM to fix code with rendered visuals
+ --chroma_db_path CHROMA_DB_PATH
+ Path to Chroma DB
+ --manim_docs_path MANIM_DOCS_PATH
+ Path to manim docs
+ --embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}
+ Select the embedding model to use
+ --use_context_learning
+ Use context learning with example Manim code
+ --context_learning_path CONTEXT_LEARNING_PATH
+ Path to context learning examples
+ --use_langfuse Enable Langfuse logging
+ --max_scene_concurrency MAX_SCENE_CONCURRENCY
+ Maximum number of scenes to process concurrently
+ --max_topic_concurrency MAX_TOPIC_CONCURRENCY
+ Maximum number of topics to process concurrently
+ --debug_combine_topic DEBUG_COMBINE_TOPIC
+ Debug combine videos
+ --only_plan Only generate scene outline and implementation plans
+ --check_status Check planning and code status for all theorems
+ --only_render Only render scenes without combining videos
+ --scenes SCENES [SCENES ...]
+ Specific scenes to process (if theorems_path is provided)
+```
+
+## Evaluation
+Note that Gemini and GPT-4o are required for evaluation.
+
+Currently, evaluation requires a video file and a subtitle file (SRT format).
+
+Video evaluation:
+```shell
+usage: evaluate.py [-h]
+ [--model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
+ [--model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}]
+ [--model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
+ [--eval_type {text,video,image,all}] --file_path FILE_PATH --output_folder OUTPUT_FOLDER [--retry_limit RETRY_LIMIT] [--combine] [--bulk_evaluate] [--target_fps TARGET_FPS]
+ [--use_parent_folder_as_topic] [--max_workers MAX_WORKERS]
+
+Automatic evaluation of theorem explanation videos with LLMs
+
+options:
+ -h, --help show this help message and exit
+ --model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
+ Select the AI model to use for text evaluation
+ --model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}
+ Select the AI model to use for video evaluation
+ --model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
+ Select the AI model to use for image evaluation
+ --eval_type {text,video,image,all}
+ Type of evaluation to perform
+ --file_path FILE_PATH
+ Path to a file or a theorem folder
+ --output_folder OUTPUT_FOLDER
+ Directory to store the evaluation files
+ --retry_limit RETRY_LIMIT
+ Number of retry attempts for each inference
+ --combine Combine all results into a single JSON file
+ --bulk_evaluate Evaluate a folder of theorems together
+ --target_fps TARGET_FPS
+ Target FPS for video processing. If not set, original video FPS will be used
+ --use_parent_folder_as_topic
+ Use parent folder name as topic name for single file evaluation
+ --max_workers MAX_WORKERS
+ Maximum number of concurrent workers for parallel processing
+```
+* For `file_path`, it is recommended to pass a folder containing both an MP4 file and an SRT file.
+
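+A quick way to confirm a theorem folder is ready for evaluation is to check that both files are present. This is a minimal sketch; the folder path below is hypothetical:
+
+```python
+from pathlib import Path
+
+def ready_for_eval(folder: str) -> bool:
+    """Return True if the folder contains at least one MP4 video and one SRT subtitle file."""
+    p = Path(folder)
+    return any(p.glob("*.mp4")) and any(p.glob("*.srt"))
+
+print(ready_for_eval("output/my_exp_name/big_o_notation"))  # hypothetical theorem folder
+```
+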
+## Misc: Modify the system prompt in TheoremExplainAgent
+
+If you want to modify the system prompt, you need to:
+
+1. Modify the files in the `task_generator/prompts_raw` folder.
+2. Run `task_generator/parse_prompt.py` to rebuild the `__init__.py` file.
+
+```shell
+cd task_generator
+python parse_prompt.py
+cd ..
+```
+
+## TheoremExplainBench (TEB)
+
+TheoremExplainBench is available at https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench.
+
+How to use:
+```python
+import datasets
+dataset = datasets.load_dataset("TIGER-Lab/TheoremExplainBench")
+```
+
+Dataset info:
+```shell
+DatasetDict({
+ train: Dataset({
+ features: ['uid', 'subject', 'difficulty', 'theorem', 'description', 'subfield'],
+ num_rows: 240
+ })
+})
+```
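+
+As a quick look at the benchmark, you can count entries per subject and inspect a single row using the features listed above (a minimal sketch):
+
+```python
+from collections import Counter
+
+import datasets
+
+ds = datasets.load_dataset("TIGER-Lab/TheoremExplainBench")["train"]
+
+# Count theorems per subject and peek at the first entry.
+print(Counter(ds["subject"]))
+print(ds[0]["theorem"], "-", ds[0]["description"][:80])
+```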
+
+## ❓ FAQ
+
+The FAQ covers the most common errors you may encounter. If you run into something new, please open an issue.
+
+Q: Error `ModuleNotFoundError: No module named 'src'` when importing `src.utils.kokoro_voiceover` (our custom voiceover service).
+A: Please run `export PYTHONPATH=$(pwd):$PYTHONPATH` when you start a new terminal.
+
+Q: Error `Files not found`
+A: Check your Manim installation.
+
+Q: Error `latex ...`
+A: Check your LaTeX installation.
+
+Q: The output log is not showing a response?
+A: It is likely an API issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to diagnose the problem.
+
+Q: Plans / scenes are missing?
+A: It is likely an API issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to diagnose the problem.
+
+
+## 🖊️ Citation
+
+Please kindly cite our paper if you use our code, data, models or results:
+```bibtex
+@misc{ku2025theoremexplainagentmultimodalexplanationsllm,
+ title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
+ author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
+ year={2025},
+ eprint={2502.19400},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2502.19400},
+}
+```
+
+## 🎫 License
+
+This project is released under the [the MIT License](LICENSE).
+
+## ⭐ Star History
+
+[](https://star-history.com/#TIGER-AI-Lab/TheoremExplainAgent&Date)
+
+## 💞 Acknowledgements
+
+We want to thank [Votee AI](https://votee.ai/) for sponsoring API keys to access the closed-source models.
+
+The code is built upon the repositories below; we thank all of their contributors for open-sourcing their work.
+* [Manim Community](https://www.manim.community/)
+* [kokoro-manim-voiceover](https://github.com/xposed73/kokoro-manim-voiceover)
+* [manim-physics](https://github.com/Matheart/manim-physics)
+* [manim-Chemistry](https://github.com/UnMolDeQuimica/manim-Chemistry)
+* [ManimML](https://github.com/helblazer811/ManimML)
+* [manim-dsa](https://github.com/F4bbi/manim-dsa)
+* [manim-circuit](https://github.com/Mr-FuzzyPenguin/manim-circuit)
+
+## 🚨 Disclaimer
+
+**This work is intended for research purposes only. The authors do not encourage or endorse the use of this codebase for commercial applications. The code is provided "as is" without any warranties, and users assume all responsibility for its use.**
+
+Tested environments: macOS, Linux
diff --git a/app.py b/app.py
index 04cc31aa8d0e06aeaac3b59bb361ed71d831e43f..c7846d86e398870d8e6cf25d0bb83e00b3d3d18b 100644
--- a/app.py
+++ b/app.py
@@ -1,7 +1,167 @@
import gradio as gr
+import uuid
+import subprocess
+import threading
+import os
+import time
+from fastapi import FastAPI
+from fastapi.responses import FileResponse
+import asyncio
+import uvicorn
-def greet(name):
- return "Hello " + name + "!!"
-demo = gr.Interface(fn=greet, inputs="text", outputs="text")
-demo.launch()
+# A simple in-memory dictionary to track task status.
+# For a production system, you'd use a database or Redis.
+tasks = {}
+
+def run_video_generation(task_id: str, topic: str, context: str):
+ """
+ This function runs the main generation script in a separate process.
+ """
+ tasks[task_id]['status'] = 'running'
+
+ # Sanitize topic to create a valid directory name
+ file_prefix = "".join(c if c.isalnum() else "_" for c in topic.lower())
+ output_dir = os.path.join("output", file_prefix)
+
+ command = [
+ "python", "generate_video.py",
+ "--model", "openai/o3-mini", # Or get from request
+ "--topic", topic,
+ "--context", context,
+ "--output_dir", "output",
+ "--use_langfuse" # Assuming you have secrets set
+ ]
+
+ try:
+ # Using subprocess to run the existing script
+ process = subprocess.run(command, check=True, capture_output=True, text=True)
+
+ # Assume the final video is named based on the topic
+ # Note: The actual video path might differ. This is an assumption.
+ # You may need to parse the stdout from generate_video.py to get the exact path.
+ video_path = None
+ for file in os.listdir(output_dir):
+ if file.endswith("_combined.mp4"):
+ video_path = os.path.join(output_dir, file)
+ break
+
+ if video_path and os.path.exists(video_path):
+ tasks[task_id]['status'] = 'completed'
+ tasks[task_id]['video_path'] = video_path
+ else:
+ tasks[task_id]['status'] = 'failed'
+ tasks[task_id]['error'] = "Video file not found after generation."
+ tasks[task_id]['stdout'] = process.stdout
+ tasks[task_id]['stderr'] = process.stderr
+
+ except subprocess.CalledProcessError as e:
+ tasks[task_id]['status'] = 'failed'
+ tasks[task_id]['error'] = str(e)
+ tasks[task_id]['stdout'] = e.stdout
+ tasks[task_id]['stderr'] = e.stderr
+ except Exception as e:
+ tasks[task_id]['status'] = 'failed'
+ tasks[task_id]['error'] = str(e)
+
+def start_generation_thread(topic: str, context: str):
+ if not topic or not context:
+ return "Topic and Context cannot be empty.", "", None
+
+ task_id = str(uuid.uuid4())
+ tasks[task_id] = {'status': 'queued'}
+
+ # Use a background thread to run the time-consuming task
+ thread = threading.Thread(
+ target=run_video_generation,
+ args=(task_id, topic, context)
+ )
+ thread.start()
+
+ return f"Task started. Your Task ID is: {task_id}", task_id, None
+
+
+def check_status(task_id: str):
+ if not task_id:
+ return "Please provide a Task ID.", None
+
+ task = tasks.get(task_id)
+ if not task:
+ return "Task not found.", None
+
+ status = task.get('status')
+ if status == 'completed':
+ video_path = task.get('video_path')
+ return f"Status: {status}", video_path
+ elif status == 'failed':
+ error = task.get('error', 'Unknown error')
+ stdout = task.get('stdout', '')
+ stderr = task.get('stderr', '')
+ return f"Status: {status}\nError: {error}\nOutput: {stdout}\nStderr: {stderr}", None
+
+ return f"Status: {status}", None
+
+# We need a lightweight FastAPI app in the background to serve the video files.
+# Gradio can't serve files directly from arbitrary paths in a secure way.
+fastapi_app = FastAPI()
+
+@fastapi_app.get("/videos/{task_id}")
+def get_video(task_id: str):
+ """
+ Serves the final generated video file.
+ """
+ task = tasks.get(task_id)
+ if not task or task.get('status') != 'completed':
+ return {"error": "Task not completed or not found"}
+
+ video_path = task.get('video_path')
+ if not os.path.exists(video_path):
+ return {"error": "Video file not found."}
+
+ return FileResponse(video_path, media_type="video/mp4", filename=os.path.basename(video_path))
+
+
+# Gradio Interface
+with gr.Blocks() as demo:
+ gr.Markdown("# Theorem-Explain-Agent Video Generation")
+ gr.Markdown("Start a video generation task and check its status.")
+
+ with gr.Tab("Start Generation"):
+ topic_input = gr.Textbox(label="Topic", placeholder="e.g., The Pythagorean Theorem")
+ context_input = gr.Textbox(label="Context", placeholder="A short explanation of the theorem.")
+ start_button = gr.Button("Generate Video")
+
+ with gr.Column():
+ task_id_output = gr.Textbox(label="Task ID", interactive=False)
+ status_output_start = gr.Textbox(label="Status", interactive=False)
+
+ with gr.Tab("Check Status"):
+ task_id_input = gr.Textbox(label="Task ID", placeholder="Enter the Task ID you received.")
+ check_button = gr.Button("Check Status")
+
+ with gr.Column():
+ status_output_check = gr.Textbox(label="Status", interactive=False)
+ video_output = gr.Video(label="Generated Video")
+
+ # Actions
+ start_button.click(
+ fn=start_generation_thread,
+ inputs=[topic_input, context_input],
+ outputs=[status_output_start, task_id_output, video_output] # Clear video on new task
+ )
+
+ check_button.click(
+ fn=check_status,
+ inputs=[task_id_input],
+ outputs=[status_output_check, video_output]
+ )
+
+ gr.Markdown("### How to Use")
+ gr.Markdown(
+ "1. Enter a `Topic` and `Context` in the 'Start Generation' tab and click 'Generate Video'.\n"
+ "2. Copy the `Task ID` that appears.\n"
+ "3. Go to the 'Check Status' tab, paste the `Task ID`, and click 'Check Status' periodically.\n"
+ "4. When the generation is complete, the video will appear."
+ )
+
+# To serve the video files alongside the UI, we mount the Gradio app onto the FastAPI app.
+app = gr.mount_gradio_app(fastapi_app, demo, path="/")
+
+if __name__ == "__main__":
+    # Launch the combined app directly (e.g. via `python app.py`, as in the Dockerfile).
+    # Port 7860 matches the port exposed in the Dockerfile.
+    uvicorn.run(app, host="0.0.0.0", port=7860)
diff --git a/data/thb_easy/chemistry.json b/data/thb_easy/chemistry.json
new file mode 100644
index 0000000000000000000000000000000000000000..867885d99faba1dd7839c150bd0c5e6f8c8c40ca
--- /dev/null
+++ b/data/thb_easy/chemistry.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Aufbau Principle",
+ "description": "Electrons fill atomic orbitals in order of increasing energy levels. This means the lowest energy orbitals are filled first, followed by higher energy orbitals. This helps in predicting electronic configuration and understanding the properties of elements.",
+ "difficulty": "Easy",
+ "remark": "Fundamental principle for building the electron configurations of atoms and understanding the periodic table.",
+ "subfield": "Atomic Structure"
+ },
+ {
+ "theorem": "The Law of Conservation of Mass",
+ "description": "In a closed system, the total mass of the reactants is equal to the total mass of the products. This implies that matter is neither created nor destroyed during a chemical reaction, only transformed. This principle is fundamental for understanding stoichiometry.",
+ "difficulty": "Easy",
+ "remark": "A cornerstone of chemistry, this principle allows us to balance chemical equations and make quantitative predictions.",
+ "subfield": "Chemical Reactions and Stoichiometry"
+ },
+ {
+ "theorem": "The Octet Rule",
+ "description": "Atoms tend to gain, lose, or share electrons in order to achieve a full outer shell of eight electrons (or two in the case of hydrogen and some other exceptions). This explains the bonding behaviour of most main group elements, guiding the formations of compounds.",
+ "difficulty": "Easy",
+ "remark": "Simple and powerful rule to understand the formations of chemical bonds and predict molecules' structures.",
+ "subfield": "Chemical Bonding"
+ },
+ {
+ "theorem": "Alkali metals",
+ "description": "The alkali metals consist of the chemical elements lithium (Li), sodium (Na), potassium (K), rubidium (Rb), caesium (Cs), and francium (Fr).",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Periodic Table and Elements"
+ },
+ {
+ "theorem": "Distillation",
+ "description": "In chemistry, Distillation is among the most useful methods available to chemists for separating the parts of a liquid. A process that relies on a cycle of heating, vaporization, condensing and cooling. A liquid of a lower boiling point will vaporize before a liquid of higher boiling point.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Separation Techniques"
+ },
+ {
+ "theorem": "Crystallization",
+ "description": "In chemistry, Crystallization, or crystallisation, is the process of atoms or molecules arranging into a well-defined, rigid crystal lattice in order to minimize their energetic state. The smallest entity of a crystal lattice is called a unit cell, which can accept atoms or molecules to grow a macroscopic crystal.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Solid State Chemistry"
+ },
+ {
+ "theorem": "Titration",
+ "description": "Titration is a common laboratory method of quantitative chemical analysis to determine the concentration of an identified analyte. A reagent, termed the titrant or titrator, is prepared as a standard solution of known concentration and volume.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Ionic Compound",
+ "description": "An ionic compound is a chemical compound composed of ions. Ionic compounds are formed by the electrostatic attraction between positively charged cations and negatively charged anions.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Chemical Bonding"
+ },
+ {
+ "theorem": "Noble gas",
+ "description": "The noble gases are so named because they rarely react with other elements. Helium, neon, argon, krypton, xenon and radon atoms all have a full outer valence shell of electrons, which makes them quite unreactive.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Periodic Table and Elements"
+ },
+ {
+ "theorem": "Transition Metal",
+ "description": "Transition metal, any of various chemical elements that have valence electrons—i.e., electrons that can participate in the formation of chemical bonds—in two shells instead of only one.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Periodic Table and Elements"
+ },
+ {
+ "theorem": "Balance Chemical Equation",
+ "description": "A balanced equation is an equation for a chemical reaction in which the number of atoms for each element in the reaction and the total charge are the same for both the reactants and the products.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Chemical Reactions and Stoichiometry"
+ },
+ {
+ "theorem": "Combustion analysis",
+ "description": "Combustion analysis is a method used in both organic chemistry and analytical chemistry to determine the elemental composition (more precisely empirical formula) of a pure organic compound by combusting the sample under conditions where the resulting combustion products can be quantitatively analyzed.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Oxidation",
+ "description": "In chemistry, the oxidation state, or oxidation number, is the hypothetical charge of an atom if all of its bonds to other atoms were fully ionic. It describes the degree of oxidation of an atom in a chemical compound. Conceptually, the oxidation state may be positive, negative or zero.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Redox Chemistry"
+ },
+ {
+ "theorem": "First law of thermodynamics",
+ "description": "The first law of thermodynamics is a formulation of the law of conservation of energy in the context of thermodynamic processes. The law distinguishes two principal forms of energy transfer, heat and thermodynamic work, that modify a thermodynamic system containing a constant amount of matter.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Hess's Law",
+ "description": "The enthalpy change of a reaction is independent of the path taken from reactants to products. This allows the calculation of enthalpy changes for reactions that cannot be easily measured directly by using a series of reactions with known enthalpy changes. The overall enthalpy change is the sum of enthalpy changes of individual steps.",
+ "difficulty": "Easy",
+ "remark": "Useful for calculating enthalpy changes of complex reactions. It's based on the state function of enthalpy.",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "The Ideal Gas Law",
+ "description": "The product of the pressure and volume of an ideal gas is proportional to the product of the amount of gas and its absolute temperature: PV = nRT. This law describes the behavior of ideal gases and helps predict their volume, pressure, temperature, or amount under given conditions.",
+ "difficulty": "Easy",
+ "remark": "Ideal for understanding the behaviour of gases, often used in stoichiometry related to gases. Assumes no intermolecular forces or particle volume.",
+ "subfield": "Gas Laws"
+ },
+ {
+ "theorem": "Charles's Law",
+ "description": "Charles's law (also known as the law of volumes) is an experimental gas law that describes how gases tend to expand when heated.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Gas Laws"
+ },
+ {
+ "theorem": "Gay-Lussac's Law",
+ "description": "Gay-Lussac's law usually refers to Joseph-Louis Gay-Lussac's law of combining volumes of gases, discovered in 1808 and published in 1809.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Gas Laws"
+ },
+ {
+ "theorem": "pH Scale Definition",
+ "description": "pH is a measure of the hydrogen ion concentration in a solution.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Acid-Base Chemistry"
+ },
+ {
+ "theorem": "Van't Hoff Equation",
+ "description": "The Van 't Hoff equation has been widely utilized to explore the changes in state functions in a thermodynamic system. ",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Chemical Kinetics"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_easy/comp_sci.json b/data/thb_easy/comp_sci.json
new file mode 100644
index 0000000000000000000000000000000000000000..198523a5e4dbc437c834ab2025544eed6a7b4cae
--- /dev/null
+++ b/data/thb_easy/comp_sci.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Pigeonhole Principle",
+ "description": "If you have more pigeons than pigeonholes, then at least one pigeonhole must contain more than one pigeon. More formally, if *n* items are put into *m* containers, with *n > m*, then at least one container must contain more than one item.",
+ "difficulty": "Easy",
+ "remark": "A fundamental principle in combinatorics with surprising applications in various areas of computer science, like proving existence in hashing or data compression. Simple to understand, powerful in use.",
+ "subfield": "Discrete Mathematics"
+ },
+ {
+ "theorem": "De Morgan's Laws",
+ "description": "De Morgan's Laws provide a way to simplify or transform logical statements involving AND, OR, and NOT. Specifically: 1) NOT (A AND B) is equivalent to (NOT A) OR (NOT B). 2) NOT (A OR B) is equivalent to (NOT A) AND (NOT B).",
+ "difficulty": "Easy",
+ "remark": "Crucial for boolean algebra and digital logic design. Helps with simplifying complex logic expressions and is widely used in programming.",
+ "subfield": "Boolean Algebra"
+ },
+ {
+ "theorem": "The Time Complexity of Linear Search",
+ "description": "In the worst-case scenario, searching for an element in an unsorted array using linear search requires O(n) time, where 'n' is the number of elements in the array. This is because the algorithm may need to examine every element in the array to find or conclude the non-existence of the target.",
+ "difficulty": "Easy",
+ "remark": "A foundational concept in algorithm analysis. Illustrates how the running time of an algorithm scales with the input size.",
+ "subfield": "Algorithm Analysis"
+ },
+ {
+ "theorem": "The Properties of a Binary Tree",
+ "description": "For a complete or full binary tree: 1) The maximum number of nodes at level *l* is 2^l (where the root is at level 0). 2) The total number of nodes in a complete binary tree of *h* depth is 2^(h+1) - 1.",
+ "difficulty": "Easy",
+ "remark": "Fundamental for understanding and analyzing tree data structures. Used in many algorithmic designs.",
+ "subfield": "Data Structures"
+ },
+ {
+ "theorem": "The Triangle Inequality Theorem",
+ "description": "The triangle inequality states that for any three points A, B, and C in a metric space (e.g., the Euclidean plane), the sum of the lengths of any two sides of a triangle must be greater than or equal to the length of the third side. |AB| + |BC| >= |AC|",
+ "difficulty": "Easy",
+ "remark": "Often used in graph algorithms (e.g. proving properties of shortest path) . The principle is used as basis of many distance metrics.",
+ "subfield": "Computational Geometry"
+ },
+ {
+ "theorem": "Hamming distance",
+ "description": "In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Information Theory"
+ },
+ {
+ "theorem": "Big O notation",
+ "description": "most common type of asymptotic notation in computer science used to measure worst case complexity",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algorithm Analysis"
+ },
+ {
+ "theorem": "Deadlock",
+ "description": "A deadlock is a situation where two or more processes are blocked waiting for each other to release resources, resulting in a circular wait condition.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Operating Systems"
+ },
+ {
+ "theorem": "Bubble Sort",
+ "description": "Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algorithms"
+ },
+ {
+ "theorem": "Karnaugh Map",
+ "description": "A Karnaugh map (K-map) is a graphical method for simplifying Boolean algebra expressions.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Digital Logic Design"
+ },
+ {
+ "theorem": "Hash table",
+ "description": "A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Data Structures"
+ },
+ {
+ "theorem": "Linked list",
+ "description": "data structure that does not necessarily store elements next to each other and instead works by maintaining, for each element, a link to the next element in the list",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Data Structures"
+ },
+ {
+ "theorem": "Chain Code",
+ "description": "A chain code is a lossless compression based image segmentation method for binary images based upon tracing image contours. The basic principle of chain coding, like other contour codings, is to separately encode each connected component, or blob in the image.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Image Processing"
+ },
+ {
+ "theorem": "Signal-to-noise ratio",
+ "description": "The signal-to-noise ratio (SNR) is a measure of the ratio between the power of a signal and the power of background noise.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Signal Processing"
+ },
+ {
+ "theorem": "Run-length encoding",
+ "description": "Run-length encoding (RLE) is a form of data compression that encodes consecutive data elements by a single data value and count, rather than by the original data values.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Data Compression"
+ },
+ {
+ "theorem": "Elbow method",
+ "description": "The elbow method is a graphical method for finding the optimal K value in a k-means clustering algorithm.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Huffman coding",
+ "description": "In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Data Compression"
+ },
+ {
+ "theorem": "Paging",
+ "description": "Paging is a memory management technique used in operating systems to manage virtual memory. It involves dividing the virtual address space into fixed-size blocks called pages, and storing these pages in a secondary storage device called a paging file.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Operating Systems"
+ },
+ {
+ "theorem": "OSI model",
+ "description": "The Open Systems Interconnection (OSI) model is a conceptual framework that describes how data is sent over a network.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Computer Networks"
+ },
+ {
+    "theorem": "IEEE Conversion",
+ "description": "The IEEE-754 standard describes floating-point formats, a way to represent real numbers in hardware.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Computer Architecture"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_easy/math.json b/data/thb_easy/math.json
new file mode 100644
index 0000000000000000000000000000000000000000..cacafe7e2b3048cf3f14eff2d61ff039286878a4
--- /dev/null
+++ b/data/thb_easy/math.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Pythagorean Theorem",
+ "description": "In a right-angled triangle, the square of the length of the hypotenuse (the side opposite the right angle) is equal to the sum of the squares of the lengths of the other two sides. If a and b are the lengths of the legs and c is the length of the hypotenuse, then a\u00b2 + b\u00b2 = c\u00b2.",
+ "difficulty": "Easy",
+ "remark": "Fundamental theorem in geometry; widely used in various fields.",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Properties of Kites",
+ "description": "A kite is a quadrilateral with two pairs of adjacent, congruent sides. In geometry, kites have several unique properties that distinguish them from other quadrilaterals. Here are some of the key properties of kites:\n\n1. Two pairs of adjacent sides are congruent: In a kite, there are two distinct pairs of adjacent sides that have equal length. This means that if one pair of sides has a length of 'a', the other pair will also have a length of 'a', and if the other pair has a length of 'b', the first pair will also have a length of 'b'.\n\n2. Diagonals are perpendicular: The diagonals of a kite intersect at a 90-degree angle, meaning they are perpendicular to each other.\n\n3. One diagonal is bisected: In a kite, one of the diagonals is bisected by the other diagonal, meaning it is divided into two equal parts. This property is true for the diagonal connecting the vertices between the congruent sides.\n\n4. One pair of opposite angles is congruent: In a kite, the angles between the congruent sides (the angles formed by the two pairs of equal sides) are congruent, meaning they have the same degree measure.\n\n5. Area: The area of a kite can be calculated using the lengths of its diagonals. If 'd1' and 'd2' are the lengths of the diagonals, the area of the kite is given by the formula: Area = (1/2) * d1 * d2.\n\n6. Circumscribed circle: A kite can have a circumscribed circle only if it is a rhombus (all sides are congruent) or a square (all sides and angles are congruent).\n\n7. Inscribed circle: A kite can have an inscribed circle only if it is a square (all sides and angles are congruent).\n\nThese properties make kites an interesting and unique type of quadrilateral in geometry.",
+ "difficulty": "Easy",
+ "remark": "Properties of kites are useful for solving geometry problems involving kites.",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Euler's formula",
+ "description": "Euler's formula is a fundamental equation in complex analysis that establishes a deep connection between trigonometry and complex exponentials. It is named after the Swiss mathematician Leonhard Euler. The formula is given by:\n\ne^(ix) = cos(x) + i*sin(x)\n\nwhere e is the base of the natural logarithm (approximately 2.71828), i is the imaginary unit (i^2 = -1), x is a real number, and cos(x) and sin(x) are the trigonometric functions cosine and sine, respectively.\n\nEuler's formula demonstrates that complex exponentials can be expressed in terms of trigonometric functions, and vice versa. This relationship is particularly useful in various fields of mathematics, physics, and engineering, as it simplifies calculations involving complex numbers and trigonometric functions.\n\nOne of the most famous consequences of Euler's formula is Euler's identity, which is obtained by setting x = \u03c0 in the formula:\n\ne^(i\u03c0) + 1 = 0\n\nEuler's identity is considered one of the most beautiful equations in mathematics, as it combines five fundamental constants (e, i, \u03c0, 1, and 0) in a simple and elegant relationship.",
+ "difficulty": "Easy",
+ "remark": "Euler's formula is widely used in various fields, including engineering, physics, and computer science.",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Laws of Exponents",
+ "description": "The laws of exponents simplify the multiplication and division operations.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "One-to-one function",
+ "description": "a function for which each value of the output is associated with a unique input value",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Functions"
+ },
+ {
+ "theorem": "Inverse function",
+ "description": "For any one-to-one function f(x), the inverse is a function f^(-1)(x) such that f^(-1)(f(x))=x for all x in the domain of f; this also implies that f(f^(-1)(x))=x for all x in the domain of f^(-1)",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Functions"
+ },
+ {
+ "theorem": "Remainder theorem",
+ "description": "The remainder theorem states that when a polynomial p(x) is divided by a linear polynomial (x - a), then the remainder is equal to p(a).",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "Rational Zero Theorem",
+ "description": "The rational root theorem is also known as the rational zero theorem (or) the rational zero test (or) rational test theorem and is used to determine the rational roots of a polynomial function. ",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "Product-to-sum formula",
+ "description": "The product-to-sum formulas are a set of formulas from trigonometric formulas.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Heron's formula",
+ "description": "Heron's formula is a formula that is used to find the area of a triangle when the lengths of all three sides are known.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "De Moivre's Theorem",
+ "description": "Formula used to find the nth power or nth roots of a complex number; states that, for a positive integer n, z^n is found by raising the modulus to the nth power and multiplying the angles by n",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Cramer's Rule",
+ "description": "a method for solving systems of equations that have the same number of equations as variables using determinants",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "Angle of rotation",
+ "description": "An angle of rotation is the measure of the amount that a figure is rotated about a fixed point called a point of rotation.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Similar Triangles Theorem",
+ "description": "Two triangles are similar if their corresponding angles are equal and their corresponding sides are proportional.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Congruent Triangles Theorem",
+ "description": "Two triangles are congruent if they satisfy any of these criteria: SSS (Side-Side-Side), SAS (Side-Angle-Side), ASA (Angle-Side-Angle), AAS (Angle-Angle-Side), or HL (Hypotenuse-Leg) for right triangles.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Geometric Sequence",
+ "description": "For a geometric sequence with the first term a, common ratio r, and n terms, the sum is: S_n = a * (1 - r^n) / (1 - r) for r != 1",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Sequences and Series"
+ },
+ {
+ "theorem": "Arithmetic Sequence",
+ "description": "For an arithmetic sequence with the first term a, common difference d, and n terms, the sum is: S_n = (n/2) * (2a + (n-1)d)",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Sequences and Series"
+ },
+ {
+ "theorem": "Permutation",
+ "description": "The term permutation refers to a mathematical calculation of the number of ways a particular set can be arranged.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Directrix",
+ "description": "a line perpendicular to the axis of symmetry of a parabola; a line such that the ratio of the distance between the points on the conic and the focus to the distance to the directrix is constant.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Conic Sections"
+ },
+ {
+ "theorem": "Eccentricity",
+ "description": "the eccentricity of a conic section is a non-negative real number that uniquely characterizes its shape.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Conic Sections"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_easy/physics.json b/data/thb_easy/physics.json
new file mode 100644
index 0000000000000000000000000000000000000000..e7a743d770bd441815b1fb07d426dfc7ccf25466
--- /dev/null
+++ b/data/thb_easy/physics.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "Ohm's Law",
+ "description": "The voltage (V) across a conductor is directly proportional to the current (I) flowing through it, given the resistance (R) remains constant. The formula is V = IR. This law holds for many materials, particularly metals, and components like resistors.",
+ "difficulty": "Easy",
+ "remark": "A cornerstone of circuit analysis. While it is an approximation, it's incredibly useful in solving basic circuit problems. The 'resistance' is a macroscopic property representing the ease of electron movement.",
+ "subfield": "Electricity and Circuits"
+ },
+ {
+ "theorem": "Newton's First Law of Motion",
+ "description": "a body at rest remains at rest, or, if in motion, remains in motion at a constant velocity unless acted on by a net external force; also known as the law of inertia",
+ "difficulty": "Easy",
+    "remark": "Also known as the law of inertia; it establishes that a net external force is required to change an object's state of motion.",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Newton's Second Law of Motion",
+ "description": "The net force (F_net) acting on an object is equal to the mass (m) of the object multiplied by its acceleration (a). F_net = ma. This law is fundamental to understanding the relationship between force and motion.",
+ "difficulty": "Easy",
+ "remark": "This is one of the most important laws in classical mechanics. It establishes that forces cause acceleration which changes velocity. Applicable for solving motion problems where force and mass are known.",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Hooke's law",
+ "description": "In physics, Hooke's law is an empirical law which states that the force needed to extend or compress a spring by some distance scales linearly with respect to that distance.",
+ "difficulty": "Easy",
+        "remark": "Fundamental to understanding elastic deformation and simple harmonic motion. The restoring force is F = -kx, where k is the spring constant. Applicable for spring and oscillation problems within the elastic limit.",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Gravitational Force",
+ "description": "In physics, gravity is a fundamental interaction primarily observed as mutual attraction between all things that have mass.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Centrifugal force",
+ "description": "Centrifugal force is a fictitious force in Newtonian mechanics that appears to act on all objects when viewed in a rotating frame of reference. It appears to be directed radially away from the axis of rotation of the frame.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Kinetic energy",
+        "description": "In physics, the kinetic energy of an object is the form of energy that it possesses due to its motion. In classical mechanics, the kinetic energy of a non-rotating object of mass m traveling at a speed v is KE = (1/2)mv^2.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Torque",
+ "description": "Torque is a measure of the force that can cause an object to rotate about an axis. Just as force is what causes an object to accelerate in linear kinematics, torque is what causes an object to acquire angular acceleration. Torque is a vector quantity.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Right-hand rule",
+ "description": "The right hand rule is a hand mnemonic used in physics to identify the direction of axes or parameters that point in three dimensions.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Snell's Law",
+ "description": "Relates the angles of incidence and refraction of light when passing between two different media. It states that n₁sin(θ₁) = n₂sin(θ₂), where n₁ and n₂ are the refractive indices of the two media, and θ₁ and θ₂ are the angles of incidence and refraction, respectively.",
+ "difficulty": "Easy",
+ "remark": "This theorem is fundamental to understanding how light bends when it travels through different materials, essential for studying optics (lenses, prisms). Its application involves using trigonometry.",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "The Ideal Gas Law",
+ "description": "Relates the pressure (P), volume (V), temperature (T), and the number of moles (n) of an ideal gas: PV = nRT, where R is the ideal gas constant. It serves as a good approximation for the behavior of real gases under certain conditions.",
+ "difficulty": "Easy",
+ "remark": "Connects macroscopic gas properties and allows calculations involving gas behavior under varied conditions. Applicable for thermodynamics problems and understanding gas pressure, volume and temperature relationship.",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Pascal's Principle",
+ "description": "Pascal's law is a principle in fluid mechanics given by Blaise Pascal that states that a pressure change at any point in a confined incompressible fluid is transmitted throughout the fluid such that the same change occurs everywhere.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Fluid Mechanics"
+ },
+ {
+ "theorem": "Avogadro's number",
+        "description": "Avogadro's number (approximately 6.022 x 10^23) is the number of particles in one mole of a substance; together with the concept of the mole it is used to convert between mass and number of particles.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Dalton's law of partial pressures",
+ "description": "Dalton's law of partial pressures states that the total pressure of a mixture of gases is the sum of the partial pressures of its components.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "PV diagram",
+        "description": "A PV diagram is a graph of pressure versus volume, used to visualize thermodynamic processes and cycles; the area under a process curve represents the work done by the gas.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Color wavelengths",
+        "description": "Each color corresponds to a range of wavelengths, measured in nanometers (nm), within the visible light spectrum (roughly 380-750 nm).",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Ultrasound",
+        "description": "Ultrasound refers to sound waves with frequencies higher than the audible range for humans (above about 20 kHz).",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Waves and Sound"
+ },
+ {
+ "theorem": "Coulomb's law",
+ "description": "Coulomb's inverse-square law, or simply Coulomb's law, is an experimental law of physics that calculates the amount of force between two electrically charged particles at rest. This electric force is conventionally called the electrostatic force or Coulomb force.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Kirchhoff's voltage law",
+ "description": "The sum of all the voltages around a loop is equal to zero.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Electricity and Circuits"
+ },
+ {
+ "theorem": "Thévenin's theorem",
+ "description": "Thévenin's theorem states that any linear circuit containing several voltage sources and resistors can be simplified to a Thévenin-equivalent circuit with a single voltage source and resistance connected in series with a load.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Electricity and Circuits"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_hard/chemistry.json b/data/thb_hard/chemistry.json
new file mode 100644
index 0000000000000000000000000000000000000000..fbca7ba1fc9abf812f1e3f5ac634f8d86822e8c7
--- /dev/null
+++ b/data/thb_hard/chemistry.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Henderson-Hasselbalch Equation",
+        "description": "The pH of a buffer solution is equal to the pKa of the weak acid plus the logarithm of the ratio of the concentration of the conjugate base to the concentration of the weak acid: pH = pKa + log([A-]/[HA]). It allows the pH of buffer solutions to be calculated and predicts how the pH changes when acid or base is added.",
+ "difficulty": "Hard",
+ "remark": "Crucial in understanding buffer solutions and titrations. Used in biochemistry extensively.",
+ "subfield": "Acid-Base Chemistry"
+ },
+ {
+ "theorem": "Bragg's law",
+        "description": "Bragg's law gives the condition for constructive interference when X-rays are scattered by the planes of a crystal lattice: nλ = 2d sin(θ), where d is the spacing between planes and θ is the angle of incidence.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Crystallography"
+ },
+ {
+ "theorem": "Debye-Scherrer Equation",
+ "description": "The Debye-Scherrer equation is used in chemistry to calculate the size of crystalline nanoparticles. It is based on X-ray diffraction (XRD) measurements.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Crystallography"
+ },
+ {
+ "theorem": "Hückel's Rule",
+ "description": "In organic chemistry, Hückel's rule predicts that a planar ring molecule will have aromatic properties if it has 4n + 2 π-electrons, where n is a non-negative integer.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Organic Chemistry"
+ },
+ {
+ "theorem": "Hard Acid Soft Base Theory",
+        "description": "Hard and Soft Acid-Base (HSAB) theory works on the principle that soft acids react preferentially with soft bases, while hard acids react preferentially with hard bases.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Acid-Base Chemistry"
+ },
+ {
+ "theorem": "Pauli Exclusion Principle",
+ "description": "Pauli's Exclusion Principle states that no two electrons in the same atom can have identical values for all four of their quantum numbers.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Crystal Field Theory",
+ "description": "Crystal field theory (CFT) describes the breaking of orbital degeneracy in transition metal complexes due to the presence of ligands.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Inorganic Chemistry"
+ },
+ {
+ "theorem": "Hohenberg-Kohn theorem",
+        "description": "The first Hohenberg–Kohn theorem states that 'the ground state of any interacting many particle system with a given fixed inter-particle interaction is a unique functional of the electron density n(r)'.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Frost–Ebsworth diagram",
+ "description": "A Frost diagram or Frost–Ebsworth diagram is a type of graph used by inorganic chemists in electrochemistry to illustrate the relative stability of a number of different oxidation states of a particular substance. The graph illustrates the free energy vs oxidation state of a chemical species.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Electrochemistry"
+ },
+ {
+ "theorem": "Coulson-Fischer Theorem",
+ "description": "In theoretical chemistry and molecular physics, Coulson–Fischer theory provides a quantum mechanical description of the electronic structure of molecules.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+        "theorem": "Franck-Condon Principle",
+        "description": "The Franck-Condon Principle describes the intensities of vibronic transitions, i.e., the simultaneous changes in electronic and vibrational energy levels that accompany the absorption or emission of a photon.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Spectroscopy"
+ },
+ {
+ "theorem": "Nernst Equation",
+ "description": "The Nernst Equation enables the determination of cell potential under non-standard conditions.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Electrochemistry"
+ },
+ {
+ "theorem": "Slater's Rules",
+        "description": "Slater's rules provide a way to estimate the effective nuclear charge experienced by an electron: the full nuclear charge is reduced by a screening contribution from the other electrons in the atom.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Langmuir Adsorption Isotherm",
+ "description": "A continuous monolayer of adsorbate molecules surrounding a homogeneous solid surface is the conceptual basis for this adsorption model.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Marcus Theory",
+ "description": "Marcus theory is a theory originally developed by Rudolph A. Marcus, starting in 1956, to explain the rates of electron transfer reactions.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Eyring Equation",
+        "description": "The Eyring equation is an equation used in chemical kinetics to describe how the rate of a chemical reaction varies with temperature.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Chemical Kinetics"
+ },
+ {
+ "theorem": "Woodward-Hoffmann Rules",
+        "description": "Robert Burns Woodward and Roald Hoffmann devised this set of rules to explain the stereochemistry of pericyclic reactions based on orbital symmetry.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Organic Chemistry"
+ },
+ {
+ "theorem": "Born-Haber Cycle",
+ "description": "A Born–Haber cycle applies Hess's law to calculate the lattice enthalpy by comparing the standard enthalpy change of formation of the ionic compound (from the elements) to the enthalpy required to make gaseous ions from the elements. This lattice calculation is complex.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Molecular Orbital Theory",
+ "description": "In chemistry, molecular orbital theory is a method for describing the electronic structure of molecules using quantum mechanics.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Hammond Postulate",
+ "description": "The postulate, which George Hammond first proposed in 1955, states that if two states, such as a transition state and an unstable intermediate, occur consecutively during a reaction process and have nearly the same energy content, their interconversion will result in only a minor reorganisation of molecular structures.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_hard/comp_sci.json b/data/thb_hard/comp_sci.json
new file mode 100644
index 0000000000000000000000000000000000000000..8481ec527ae9d1b5bba76e9dc7d654d1d04781a1
--- /dev/null
+++ b/data/thb_hard/comp_sci.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "Evidence lower bound",
+ "description": "The evidence lower bound (ELBO) is a lower bound on the log-evidence of a model, which is a measure of how well the model fits the data.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Viterbi Algorithm",
+ "description": "The Viterbi Algorithm is a dynamic programming algorithm used for finding the most likely sequence of hidden states, known as the Viterbi path, in a Hidden Markov Model (HMM). It is named after its inventor, Andrew Viterbi, and is widely used in various applications such as speech recognition, natural language processing, and bioinformatics.\n\nA Hidden Markov Model (HMM) is a statistical model that represents a stochastic process involving a sequence of observable events and hidden states. In an HMM, the observable events are generated by the hidden states, which follow a Markov chain. The Markov chain is characterized by the transition probabilities between hidden states, and the emission probabilities of observable events given the hidden states.\n\nThe Viterbi Algorithm works by finding the most probable path of hidden states that generates the observed sequence of events. It does this by iteratively computing the maximum probability of reaching each state at each time step, considering all possible paths that lead to that state. The algorithm uses dynamic programming to efficiently compute these probabilities and store them in a trellis structure.\n\nHere's a high-level description of the Viterbi Algorithm:\n\n1. Initialization: Set the initial probabilities for each hidden state, considering the initial state probabilities and the emission probabilities for the first observed event.\n\n2. Recursion: For each subsequent observed event, compute the maximum probability of reaching each hidden state, considering all possible previous states and their transition probabilities. Update the emission probabilities for the current observed event.\n\n3. Termination: Identify the hidden state with the highest probability at the last time step.\n\n4. Traceback: Starting from the identified state in the termination step, backtrack through the trellis to find the most probable path of hidden states that generated the observed sequence.\n\nThe Viterbi Algorithm is an efficient and widely used method for decoding the hidden states in a Hidden Markov Model, providing valuable insights into the underlying structure of the stochastic process.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Dynamic Programming"
+ },
+ {
+ "theorem": "Fano's inequality",
+ "description": "In information theory, Fano's inequality relates the average information lost in a noisy channel to the probability of the categorization error.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Information Theory"
+ },
+ {
+ "theorem": "Message Passing algorithm",
+        "description": "The message passing algorithm is an iterative decoding algorithm that factorizes the global function of many variables into a product of simpler local functions, whose arguments are subsets of the variables.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Maximal Planar Graph",
+ "description": "A maximal planar graph is a graph which can be embedded in the plane such that every face of the graph is a triangle.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Graph Theory"
+ },
+ {
+ "theorem": "Cayley's formula",
+        "description": "Cayley's formula states that the number of distinct labeled trees on n vertices is n^(n-2).",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Graph Theory"
+ },
+ {
+ "theorem": "Floyd's Cycle Finding Algorithm",
+ "description": "Also known as the tortoise and the hare algorithm, it is a pointer algorithm that uses two pointers which move at different speeds to find a cycle in a sequence.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Algorithms"
+ },
+ {
+ "theorem": "Sigma-Delta Modulation",
+        "description": "A sigma-delta modulator converts an analog input voltage into a high-frequency one-bit digital bitstream using oversampling and noise shaping.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Digital Signal Processing"
+ },
+ {
+ "theorem": "Kruskal's algorithm",
+        "description": "A greedy algorithm for finding a minimum spanning tree: it sorts the list of edges in the graph by weight and repeatedly adds the lightest edge that does not create a cycle.",
+ "difficulty": "Hard",
+ "remark": "A fundamental algorithm in graph theory. It's used in network design, spanning tree construction, and various optimization problems. Requires understanding of graph theory and greedy algorithms.",
+ "subfield": "Graph Theory"
+ },
+ {
+ "theorem": "Prim's algorithm",
+        "description": "A greedy algorithm for finding a minimum spanning tree: it grows the tree from a starting vertex, maintaining a priority queue of the remaining vertices ordered by the weight of the cheapest edge connecting them to the tree.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Graph Theory"
+ },
+ {
+ "theorem": "Region growing by pixel aggregation",
+ "description": "Region growing by pixel aggregation is a technique used in image processing to segment an image into regions based on the similarity of pixel values.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Image Processing"
+ },
+ {
+ "theorem": "Arithmetic coding",
+        "description": "Arithmetic coding is a lossless data compression technique that encodes an entire message as a single number in the interval [0, 1), assigning each symbol a subinterval whose width is proportional to its probability of occurrence.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Data Compression"
+ },
+ {
+ "theorem": "Expectation–maximization (EM) algorithm",
+ "description": "an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Differential entropy",
+ "description": "Differential entropy, also known as continuous entropy, is a concept in information theory that extends the idea of entropy from discrete random variables to continuous random variables. Entropy, in general, is a measure of the uncertainty or randomness associated with a random variable. In the context of information theory, it quantifies the average amount of information required to describe the outcome of a random variable.\n\nFor discrete random variables, entropy is well-defined using the Shannon entropy formula, which sums the product of the probability of each outcome and the logarithm of its reciprocal probability. However, for continuous random variables, the probability of any specific outcome is zero, making the Shannon entropy formula inapplicable.\n\nDifferential entropy addresses this issue by considering the probability density function (pdf) of a continuous random variable instead of the probabilities of individual outcomes. The differential entropy H(X) of a continuous random variable X with a probability density function f(x) is defined as:\n\nH(X) = - \u222b f(x) * log(f(x)) dx\n\nwhere the integral is taken over the entire range of the random variable X, and log is the logarithm base 2 (or any other base, depending on the desired unit of measurement for entropy).\n\nDifferential entropy can be interpreted as the average amount of information required to describe the outcome of a continuous random variable with a given probability density function. However, unlike the entropy of discrete random variables, differential entropy can be negative, which occurs when the probability density function is highly concentrated around certain values.\n\nIt is important to note that differential entropy is not a direct extension of discrete entropy, and some properties of discrete entropy do not hold for differential entropy. For example, differential entropy is not invariant under changes of variables or coordinate transformations, whereas discrete entropy is invariant under permutations of the outcomes.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Information Theory"
+ },
+ {
+ "theorem": "Kullback–Leibler divergence",
+ "description": "a type of statistical distance: a measure of how much a model probability distribution Q is different from a true probability distribution P.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Information Theory"
+ },
+ {
+ "theorem": "Principal component analysis",
+ "description": "Principal component analysis (PCA) is a statistical method that reduces the dimensions of a dataset to a smaller set of components.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Self-attention",
+ "description": "Self-attention is a mechanism in neural networks that allows the model to focus on different parts of the input sequence when making predictions.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Adversarial training",
+ "description": "Adversarial Training is a machine learning technique that is primarily used for improving the robustness of models. It's a process where models are trained with malicious inputs (adversarial examples) alongside the genuine data.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Forward-Backward Algorithm",
+ "description": "The Forward-Backward Algorithm is a dynamic programming algorithm used in Hidden Markov Models (HMMs) to compute the posterior probabilities of hidden states given a sequence of observations. It is a stochastic process that combines both the forward and backward algorithms to efficiently compute these probabilities.\n\nThe algorithm consists of two main steps:\n\n1. Forward Algorithm:\nThe forward algorithm computes the probability of observing a particular sequence of observations up to a certain time step, given the hidden state at that time step. It calculates the forward probabilities, which are the joint probabilities of the observed sequence and the hidden state at each time step. The forward algorithm uses a recursive approach, where the forward probability at each time step is calculated based on the forward probabilities of the previous time step.\n\n2. Backward Algorithm:\nThe backward algorithm computes the probability of observing the remaining sequence of observations from a certain time step onwards, given the hidden state at that time step. It calculates the backward probabilities, which are the conditional probabilities of the future observations given the hidden state at each time step. Similar to the forward algorithm, the backward algorithm also uses a recursive approach, where the backward probability at each time step is calculated based on the backward probabilities of the next time step.\n\nAfter computing the forward and backward probabilities, the Forward-Backward Algorithm combines these probabilities to calculate the posterior probabilities of the hidden states at each time step. The posterior probability of a hidden state at a particular time step is the probability of that state given the entire sequence of observations. This is computed by multiplying the forward probability and the backward probability for that state at that time step and then normalizing the result.\n\nThe Forward-Backward Algorithm is widely used in various applications, such as speech recognition, natural language processing, and bioinformatics, where the goal is to infer the most likely sequence of hidden states given a sequence of observations.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Dynamic Programming"
+ },
+ {
+ "theorem": "Cook-Levin Theorem",
+ "description": "In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Computational Complexity"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_hard/math.json b/data/thb_hard/math.json
new file mode 100644
index 0000000000000000000000000000000000000000..28fa3bf985fa1bcdebcf71dc69c81d8b899b5831
--- /dev/null
+++ b/data/thb_hard/math.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "Taylor's theorem",
+ "description": "Taylor's theorem gives an approximation of a k-times differentiable function around a given point by a polynomial of degree k, called the k-th order Taylor polynomial.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Simpson's rule",
+ "description": "In numerical integration, Simpson's rules are several approximations for definite integrals, named after Thomas Simpson.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Numerical Analysis"
+ },
+ {
+ "theorem": "Velocity vector",
+ "description": "Velocity is the speed in combination with the direction of motion of an object.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Vector Calculus"
+ },
+ {
+ "theorem": "Double Riemann sum",
+ "description": "A double Riemann sum is a mathematical method used to approximate the value of a double integral over a two-dimensional region.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Multivariable Calculus"
+ },
+ {
+ "theorem": "Fubini's theorem",
+ "description": "Fubini's Theorem is a fundamental result in calculus that allows the evaluation of a double integral as an iterated integral, provided certain conditions are met. It simplifies the computation of double integrals over a rectangular or general region by breaking them into two single integrals.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Multivariable Calculus"
+ },
+ {
+ "theorem": "Jacobian matrix and determinant",
+ "description": "In vector calculus, the Jacobian matrix of a vector-valued function of several variables is the matrix of all its first-order partial derivatives.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Vector Calculus"
+ },
+ {
+ "theorem": "Green's theorem",
+        "description": "Green's theorem relates a line integral around a simple closed curve in the plane to a double integral over the region enclosed by the curve.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Vector Calculus"
+ },
+ {
+ "theorem": "Stokes' theorem",
+        "description": "relates the surface integral of the curl of a vector field over a surface S to a line integral of the field around the boundary C of the surface S",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Vector Calculus"
+ },
+ {
+ "theorem": "Burnside's Lemma",
+ "description": "Burnside's Lemma, also known as the Cauchy-Frobenius Lemma or the Orbit-Counting Theorem, is a fundamental result in combinatorics that deals with counting the number of distinct elements in a set under the action of a group. It is particularly useful in counting problems involving symmetries and permutations.\n\nThe lemma is named after the British mathematician William Burnside, who contributed significantly to the development of group theory.\n\nStatement of Burnside's Lemma:\n\nLet G be a finite group that acts on a finite set X. Then the number of distinct orbits of X under the action of G is given by:\n\n(1/|G|) * \u03a3 |Fix(g)|\n\nwhere |G| is the order of the group (i.e., the number of elements in G), the sum is taken over all elements g in G, and |Fix(g)| is the number of elements in X that are fixed by the action of g (i.e., the number of elements x in X such that g(x) = x).\n\nIn simpler terms, Burnside's Lemma states that the number of distinct orbits (or equivalence classes) in a set under the action of a group can be found by averaging the number of fixed points of each group element.\n\nBurnside's Lemma is often used in combinatorial problems where we need to count the number of distinct configurations of an object, taking into account its symmetries. By applying the lemma, we can avoid overcounting configurations that are equivalent under a given symmetry operation.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Group Theory"
+ },
+ {
+ "theorem": "Lah Number",
+ "description": "In mathematics, the (signed and unsigned) Lah numbers are coefficients expressing rising factorials in terms of falling factorials and vice versa.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Ramsey's theorem",
+ "description": "Ramsey's theorem essentially states that if a structure (such as a graph or a set of numbers) is large enough, then some kind of order or regularity will always emerge, no matter how it is arranged or colored.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Schwarz Lemma theorem",
+ "description": "Schwarz Lemma is a fundamental result in complex analysis that provides a bound on the behavior of holomorphic functions (i.e., complex-differentiable functions) in the unit disk. It is named after the German mathematician Hermann Schwarz.\n\nStatement of Schwarz Lemma:\n\nLet f be a holomorphic function on the open unit disk D = {z \u2208 \u2102 : |z| < 1} such that f(0) = 0 and |f(z)| \u2264 1 for all z \u2208 D. Then, for all z \u2208 D, the following inequalities hold:\n\n1. |f(z)| \u2264 |z|\n2. |f'(0)| \u2264 1\n\nMoreover, if equality holds for some z \u2260 0 (i.e., |f(z)| = |z|) or |f'(0)| = 1, then f is a rotation, i.e., f(z) = e^(i\u03b8)z for some real \u03b8.\n\nThe Schwarz Lemma has several important consequences and generalizations in complex analysis, such as the Riemann Mapping Theorem and the Pick's Lemma. It is a powerful tool for understanding the behavior of holomorphic functions in the unit disk and provides a way to compare the size of their derivatives at the origin.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Cauchy Riemann Theorem",
+ "description": "The Cauchy-Riemann Theorem is a fundamental result in complex analysis, a branch of mathematics that studies functions of complex variables. It provides necessary and sufficient conditions for a complex function to be holomorphic (complex differentiable) in a given domain.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Morera's Theorem",
+ "description": "Morera's theorem, named after Giacinto Morera, gives an important criterion for proving that a function is holomorphic.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Catalan-Mingantu Number",
+ "description": "The Catalan numbers are a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. ",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Liouville's theorem",
+        "description": "In complex analysis, Liouville's theorem states that every bounded entire function (a function that is holomorphic on the whole complex plane and bounded) must be constant.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Complex Analysis"
+ },
+ {
+ "theorem": "Derangement Formula",
+        "description": "In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position. The number of derangements of n elements is !n = n! * Σ(k=0 to n) (-1)^k / k!.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Delian problem",
+ "description": "Doubling the cube, also known as the Delian problem, is an ancient geometric problem. Given the edge of a cube, the problem requires the construction of the edge of a second cube whose volume is double that of the first.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Geometry"
+ },
+ {
+ "theorem": "Polya's Enumeration Theorem",
+ "description": "Pólya's Enumeration Theorem, also known as Pólya's Counting Theorem, is a powerful result in combinatorics used to count distinct arrangements or configurations of objects that are invariant under a group of symmetries.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Combinatorics"
+ },
+ {
+ "theorem": "Cauchy's theorem",
+        "description": "Cauchy's Theorem is a fundamental result in group theory, a branch of abstract algebra. It states that if a prime p divides the order of a finite group G, then G contains an element of order p. It is named after the French mathematician Augustin-Louis Cauchy.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Group Theory"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_hard/physics.json b/data/thb_hard/physics.json
new file mode 100644
index 0000000000000000000000000000000000000000..d5c84f45b3672a4b745c66cb8f0ca440f75ca23d
--- /dev/null
+++ b/data/thb_hard/physics.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "Boltzmann machine",
+        "description": "A Boltzmann machine is a stochastic, energy-based model originating in statistical physics and applied in the context of cognitive science and machine learning. It is also classified as a Markov random field.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Statistical Physics"
+ },
+ {
+ "theorem": "Geometric Brownian Motion",
+ "description": "A geometric Brownian motion (GBM) (also known as exponential Brownian motion) is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion (also called a Wiener process) with drift.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Statistical Physics"
+ },
+ {
+ "theorem": "Fermat's Principle",
+ "description": "Fermat's principle states that light travels between two points along the path that requires the least time, as compared to other nearby paths.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Huygens's Principle",
+ "description": "The Huygens–Fresnel principle states that every point on a wavefront is itself the source of spherical wavelets, and the secondary wavelets emanating from different points mutually interfere. The sum of these spherical wavelets forms a new wavefront.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Virial Theorem",
+ "description": "In mechanics, the virial theorem provides a general equation that relates the average over time of the total kinetic energy of a stable system of discrete particles, bound by a conservative force, with that of the total potential energy of the system.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Poynting Theorem",
+        "description": "It expresses conservation of energy for electromagnetic fields: the rate of decrease of the energy stored in a given volume equals the work done on the charges within the volume plus the rate at which energy leaves the volume.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Fresnel transmission equations",
+ "description": "Fresnel's equations describe the reflection and transmission of electromagnetic waves at an interface.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Fourier Heat Conduction Law",
+        "description": "Fourier's law states that the rate of heat transfer through a material is proportional to the negative gradient of the temperature and to the area, at right angles to that gradient, through which the heat flows.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Ampère's circuital law",
+        "description": "Ampère's circuital law states that the line integral of the magnetic field around a closed loop is equal to μ₀ times the algebraic sum of the currents passing through the loop.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Malus's Law",
+ "description": "Malus law states that the intensity of a plane-polarised light that passes through an analyser is directly proportional to the square of the cosine of the angle between the plane of the polariser and the transmission axis of the analyser.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Van der Waals Equation",
+ "description": "The van der Waals equation is a mathematical formula that describes the behavior of real gases. It is an equation of state that relates the pressure, temperature, and molar volume in a fluid.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Rayleigh Criterion",
+ "description": "The Rayleigh criterion is the generally accepted criterion for the minimum resolvable detail - the imaging process is said to be diffraction-limited when the first diffraction minimum of the image of one source point coincides with the maximum of another.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Paschen Curve",
+ "description": "Paschen's law is an equation that gives the breakdown voltage, that is, the voltage necessary to start a discharge or electric arc, between two electrodes in a gas as a function of pressure and gap length.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Chandrasekhar Limit",
+        "description": "The Chandrasekhar limit is the maximum mass (about 1.4 solar masses) that a star can have and still be a stable white dwarf.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Astrophysics"
+ },
+ {
+ "theorem": "Landau Damping",
+        "description": "Landau damping is a phenomenon observed in plasmas wherein there is an exponential decay in the oscillations of the electron number density (also referred to as Langmuir waves), so that stability is achieved in some region of phase space.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Plasma Physics"
+ },
+ {
+ "theorem": "Schwarzschild radius",
+ "description": "The Schwarzschild radius is the critical distance from the center of a massive body where the gravitational pull becomes so strong that not even light can escape, defining the boundary of a black hole.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Astrophysics"
+ },
+ {
+ "theorem": "Babinet's Principle",
+ "description": "In physics, Babinet's principle states that the diffraction pattern from an opaque body is identical to that from a hole of the same size and shape except for the overall forward beam intensity.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Schrödinger's Cat",
+ "description": "Schrödinger's cat is a thought experiment in quantum mechanics that illustrates the paradoxical nature of quantum superposition and wave function collapse.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Quantum Mechanics"
+ },
+ {
+ "theorem": "Rayleigh Criterion for Resolution",
+ "description": "For a circular aperture, lens, or mirror, the Rayleigh criterion states that two images are just resolvable when the center of the diffraction pattern of one is directly over the first minimum of the diffraction pattern of the other.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Navier-Stokes Equations",
+        "description": "In fluid mechanics, the Navier-Stokes equations are partial differential equations that describe the motion of viscous fluids.",
+ "difficulty": "Hard",
+ "remark": "",
+ "subfield": "Fluid Mechanics"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_medium/chemistry.json b/data/thb_medium/chemistry.json
new file mode 100644
index 0000000000000000000000000000000000000000..afd03ced28b66494d8bf68e224679c72a391ad71
--- /dev/null
+++ b/data/thb_medium/chemistry.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "Le Chatelier's Principle",
+ "description": "When a system at equilibrium is subjected to a change in condition (such as temperature, pressure, or concentration), the system will shift in a direction that relieves the stress and a new equilibrium will be established. This principle helps predict how equilibrium will shift in response to external changes.",
+ "difficulty": "Medium",
+ "remark": "Essential for understanding chemical equilibrium and its practical applications in industrial processes.",
+ "subfield": "Chemical Equilibrium"
+ },
+ {
+ "theorem": "The Pauli Exclusion Principle",
+ "description": "No two electrons in the same atom can have the same set of four quantum numbers (n, l, ml, ms). This limits the number of electrons that can occupy an orbital, which is max two electrons, with opposite spins (+1/2 and -1/2). This explains electronic configuration in atoms.",
+ "difficulty": "Medium",
+ "remark": "Essential for understanding electronic structure and the basis for chemical bonding.",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Raoult's Law",
+ "description": "The partial vapor pressure of a component in an ideal solution is equal to the vapor pressure of the pure component multiplied by its mole fraction in the solution: P_A = P_A* X_A. This helps to predict vapor pressure of ideal solutions and is a basis for colligative properties",
+ "difficulty": "Medium",
+ "remark": "Describes vapor pressure of solutions, useful in understanding boiling point elevation and freezing point depression.",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Beer-Lambert Law",
+ "description": "The absorbance of a solution is directly proportional to the concentration of the analyte and the path length of the light beam through the solution: A = \u03b5bc, where \u03b5 is molar absorptivity, b is path length, and c is the concentration. Useful in analytical chemistry for determining the concentration of a substance by measuring the light it absorbs.",
+ "difficulty": "Medium",
+ "remark": "Important in spectrophotometry for quantitative analysis of solutions.",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Phase diagram",
+ "description": "Phase diagram is a graphical representation of the physical states of a substance under different conditions of temperature and pressure.",
+ "difficulty": "Medium",
+ "remark": "Useful in understanding the phase transitions of substances.",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Boyle's Law",
+        "description": "Boyle's law states that, at constant temperature, the pressure of a fixed amount of gas is inversely proportional to its volume: PV = constant, so P1V1 = P2V2.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Graham's Law of Effusion",
+ "description": "Graham's law of effusion was formulated by Scottish physical chemist Thomas Graham in 1848. Graham found experimentally that the rate of effusion of a gas is inversely proportional to the square root of the molar mass of its particles.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Arrhenius Equation",
+ "description": "In physical chemistry, the Arrhenius equation is a formula for the temperature dependence of reaction rates.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Chemical Kinetics"
+ },
+ {
+ "theorem": "Henry's law",
+        "description": "Henry's law states that the concentration of a dissolved gas in a solution is directly proportional to the partial pressure of that gas in contact with the solution.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Physical Chemistry"
+ },
+ {
+ "theorem": "Lewis Acid-Base Theory",
+ "description": "In the Lewis theory of acid-base reactions, bases donate pairs of electrons and acids accept pairs of electrons.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Acid-Base Chemistry"
+ },
+ {
+ "theorem": "Clausius-Clapeyron Equation",
+        "description": "The Clausius-Clapeyron equation relates the vapor pressure of a substance at two temperatures to its enthalpy of vaporization, allowing the vapor pressure at one temperature to be estimated from its value at another.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Michaelis-Menten Kinetics",
+ "description": "In biochemistry, Michaelis–Menten kinetics, named after Leonor Michaelis and Maud Menten, is the simplest case of enzyme kinetics, applied to enzyme-catalysed reactions of one substrate and one product.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Chemical Kinetics"
+ },
+ {
+ "theorem": "Gibbs Free Energy Equation",
+        "description": "The change in Gibbs free energy, ΔG, is equal to the change in enthalpy of the system minus the product of the temperature and the change in entropy: ΔG = ΔH - TΔS.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Transition State Theory",
+ "description": "In chemistry, transition state theory (TST) explains the reaction rates of elementary chemical reactions.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Chemical Kinetics"
+ },
+ {
+        "theorem": "Koopmans' Theorem",
+ "description": "Koopmans' theorem states that the first ionization energy of a molecule is equal to the negative of the energy of the highest occupied molecular orbital (HOMO).",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Quantum Chemistry"
+ },
+ {
+ "theorem": "Recrystallization",
+ "description": "Recrystallization, also known as fractional crystallization, is a procedure for purifying an impure compound in a solvent.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Electrogravimetry",
+ "description": "Electrogravimetry is a method used to separate and quantify ions of a substance, usually a metal. In this process, the analyte solution is electrolyzed.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Kjeldahl Method",
+ "description": "The Kjeldahl method is a laboratory technique used to measure the amount of nitrogen in a sample. ",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Liquid-Liquid Extraction",
+ "description": "Liquid–liquid extraction, also known as solvent extraction and partitioning, is a method to separate compounds or metal complexes, based on their relative solubilities in two different immiscible liquids, usually water (polar) and an organic solvent (non-polar).",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Analytical Chemistry"
+ },
+ {
+ "theorem": "Reflux",
+ "description": "Reflux is a laboratory technique where a reaction mixture is heated to boil and the vapors are condensed back into the reaction flask, allowing continuous heating without loss of volatile components.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Laboratory Techniques"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_medium/comp_sci.json b/data/thb_medium/comp_sci.json
new file mode 100644
index 0000000000000000000000000000000000000000..4590a94eab91d08d9889c5bdeb9be1c6c3d3abcb
--- /dev/null
+++ b/data/thb_medium/comp_sci.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Halting Problem (Undecidability)",
+ "description": "There is no general algorithm (or program) that can determine, for any arbitrary computer program and its input, whether the program will eventually halt (stop) or run forever.",
+ "difficulty": "Medium",
+ "remark": "A core concept in theoretical computer science. Introduces the idea of limits of computation. Understanding the proof (usually using diagonalization) is key to grasp the concept. Usually taught in discrete math or Theory of Computation.",
+ "subfield": "Theory of Computation"
+ },
+ {
+ "theorem": "The Time Complexity of Binary Search",
+ "description": "In the worst case, searching for an element in a sorted array using binary search requires O(log n) time, where n is the number of elements in the array. This efficiency arises from repeatedly dividing the search interval in half.",
+ "difficulty": "Medium",
+ "remark": "Highlights the power of divide-and-conquer algorithms. Illustrates why sorted data structures are often essential. Requires understanding of logarithms",
+ "subfield": "Algorithms"
+ },
+ {
+ "theorem": "The Correctness of Simple Sorting Algorithm (e.g. Bubble Sort)",
+ "description": "Bubble sort repeatedly compares adjacent elements and swaps them if they are in the wrong order. We can formally prove that after n-1 passes, the array will be sorted. Proving it involves demonstrating that the largest element is 'bubbled' to the end of the array in each pass, by using loop invariants.",
+ "difficulty": "Medium",
+ "remark": "Demonstrates how to formally analyze simple algorithms for their correctness and requires some understanding of loop invariants. Useful for introduction to proofs in algorithm.",
+ "subfield": "Algorithms"
+ },
+ {
+ "theorem": "The Church-Turing Thesis",
+ "description": "All models of computation that we know can compute what is Turing computable. In other words, if an effective method (algorithm) for solving a problem exists at all, then a Turing machine can also compute a solution, and vice versa.",
+ "difficulty": "Medium",
+ "remark": "A fundamental principle in theoretical computer science. It defines the limit of computability. It links different computational models to a single class. Requires an understanding of the Turing Machine.",
+ "subfield": "Theory of Computation"
+ },
+ {
+ "theorem": "The Relationship between Recursion and Induction",
+ "description": "Recursive functions can be proven correct and analyzed with mathematical induction. The base case of induction matches the base case in the recursive function. The induction step corresponds to the recursive step.",
+ "difficulty": "Medium",
+ "remark": "Connects two key concepts in Computer Science. Illustrates how induction can be used to prove correctness of recursive algorithms and mathematical induction can be used to define recursive functions. Important for formal analysis.",
+ "subfield": "Programming Fundamentals"
+ },
+ {
+ "theorem": "Chroma Subsampling",
+        "description": "Chroma subsampling is a technique used in digital image and video processing to reduce the amount of data required to represent an image. It encodes the chroma (color) components at a lower spatial resolution than the luma (brightness) component, exploiting the fact that human vision is less sensitive to color detail than to brightness detail.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Image Processing"
+ },
+ {
+ "theorem": "Median filtering",
+ "description": "Median filtering is a non-linear digital filtering technique that is used to remove noise from an image or signal. It works by replacing each pixel with the median value of the pixels in its neighborhood.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Image Processing"
+ },
+ {
+ "theorem": "Shannon Lower bound",
+        "description": "The Shannon Lower Bound refers to a theoretical limit in information theory that represents the minimum rate (average number of bits) required to encode a random source. It is tied to the Shannon entropy, which quantifies the average information content of a random variable.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Information Theory"
+ },
+ {
+ "theorem": "Dijkstra's algorithm",
+        "description": "Dijkstra's algorithm computes shortest paths from a start vertex in a graph with non-negative edge weights: it maintains a priority queue of vertices ordered by distance from the start and repeatedly selects the closest unvisited vertex, relaxing the edges leading out of it.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Graph Theory"
+ },
+ {
+ "theorem": "K-means clustering",
+ "description": "K-means clustering is a method of clustering that partitions the dataset into K clusters, where each cluster is represented by its centroid or center point.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "K-nearest neighbors",
+ "description": "K-nearest neighbors (KNN) is a simple and effective classification algorithm that works by finding the K closest data points in the training set to a new data point and then assigning the class label based on the majority class of these neighbors.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Gradient descent",
+        "description": "Gradient descent is a common optimization algorithm used in machine learning to minimize a loss function by iteratively updating the parameters in the direction opposite to the gradient of the loss.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "Markov Decision Processes",
+ "description": "A Markov decision process (MDP) refers to a stochastic decision-making process that uses a mathematical framework to model the decision-making of a dynamic system.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Machine Learning"
+ },
+ {
+ "theorem": "ALOHA network",
+        "description": "ALOHA is a multiple access protocol that describes how many terminals can share a common medium: stations transmit whenever they have data, and collisions are resolved by retransmitting after a random delay. It operates at the data-link layer.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Computer Networks"
+ },
+ {
+ "theorem": "Discrete Cosine Transform",
+ "description": "A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Digital Signal Processing"
+ },
+ {
+ "theorem": "Master Theorem",
+ "description": "The master theorem is used in calculating the time complexity of recurrence relations (divide and conquer algorithms) in a simple and quick way.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Algorithms"
+ },
+ {
+ "theorem": "Fast Fourier Transform",
+ "description": "A fast Fourier transform (FFT) is an algorithm that computes the Discrete Fourier Transform (DFT) of a sequence, or its inverse (IDFT).",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Digital Signal Processing"
+ },
+ {
+ "theorem": "SR latch",
+ "description": "S-R latches i.e., Set-Reset latches are the simplest form of latches and are implemented using two inputs: S (Set) and R (Reset).",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Digital Logic"
+ },
+ {
+ "theorem": "TCP Reno",
+ "description": "TCP Reno is a classic congestion control algorithm that was introduced in the early 1990s. It uses a mechanism called additive increase multiplicative decrease (AIMD) to adjust the TCP window size, which is the amount of data that can be sent without waiting for an acknowledgment.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Computer Networks"
+ },
+ {
+ "theorem": "Chord P2P Network and finger table",
+        "description": "Chord addresses peer addressability, peer findability, and message routability by organizing all peers in the P2P network into a single virtual ring; each peer keeps a finger table of pointers to other peers so that lookups take O(log N) hops.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Computer Networks"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_medium/math.json b/data/thb_medium/math.json
new file mode 100644
index 0000000000000000000000000000000000000000..3bc754c11e68ad0111c16f1869ecb6405a99cec2
--- /dev/null
+++ b/data/thb_medium/math.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Factor Theorem",
+ "description": "A polynomial f(x) has a factor (x - a) if and only if f(a) = 0. This theorem helps in finding roots and factors of polynomials.",
+ "difficulty": "Medium",
+ "remark": "Crucial for solving polynomial equations and understanding polynomial behavior.",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "The Law of Sines",
+ "description": "In any triangle, the ratio of the length of a side to the sine of its opposite angle is constant. If a, b, and c are the side lengths, and A, B, and C are the opposite angles, then a/sin(A) = b/sin(B) = c/sin(C).",
+ "difficulty": "Medium",
+ "remark": "Useful for solving triangles when you have angle-side relationships.",
+ "subfield": "Trigonometry"
+ },
+ {
+ "theorem": "The Binomial Theorem",
+ "description": "For any non-negative integer n and real numbers a and b, (a + b)^n = Σ(k=0 to n) [n choose k] a^(n-k) b^k, where [n choose k] is the binomial coefficient, also written as nCk. It gives a formula for expanding powers of binomials.",
+ "difficulty": "Medium",
+ "remark": "Important in algebra, combinatorics, and probability.",
+ "subfield": "Algebra"
+ },
+ {
+ "theorem": "The Intermediate Value Theorem",
+ "description": "If f(x) is a continuous function on a closed interval [a, b] and k is any number between f(a) and f(b), then there exists at least one number c in the interval [a, b] such that f(c) = k. This theorem helps to find roots and demonstrate the behavior of continuous functions.",
+ "difficulty": "Medium",
+ "remark": "Fundamental for understand continuous functions in calculus",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "The Cosine Rule",
+ "description": "In any triangle, the square of the length of one side is equal to the sum of the squares of the lengths of the other two sides minus twice the product of the lengths of those two sides multiplied by the cosine of the angle between them. For a triangle with side lengths a, b, c, and opposite angles A, B, C: a² = b² + c² - 2bc*cos(A). Similar formulas are valid for b² and c².",
+ "difficulty": "Medium",
+ "remark": "Used in any triangle to solve for sides and/or angles",
+ "subfield": "Trigonometry"
+ },
+ {
+ "theorem": "The Divergence Test",
+ "description": "If lim (n→∞) aₙ ≠ 0 or doesn't exist, then the series ∑aₙ diverges. It is a simple test to identify divergent series but will not be able to determine if the series is convergent.",
+ "difficulty": "Medium",
+ "remark": "An important initial check when examining series convergence.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "The Squeeze Theorem (or Sandwich Theorem)",
+ "description": "If g(x) ≤ f(x) ≤ h(x) for all x near a (except possibly at a), and if lim(x→a) g(x) = L and lim(x→a) h(x) = L, then lim(x→a) f(x) = L. Useful for evaluating limits when direct calculation is difficult, by bounding a function between two simpler functions.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding challenging limits.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "The Chain Rule",
+ "description": "The chain rule is a formula for finding the derivative of a composite function. It states that the derivative of a function composed of two functions is the product of the derivative of the outer function and the derivative of the inner function.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding the derivative of composite functions.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Product Rule",
+ "description": "The product rule is a formula for finding the derivative of a product of two functions. It states that the derivative of a product of two functions is the sum of the product of the first function and the derivative of the second function, and the product of the second function and the derivative of the first function.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding the derivative of products of functions.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Quotient Rule",
+ "description": "The quotient rule is a formula for finding the derivative of a quotient of two functions. It states that the derivative of a quotient of two functions is the quotient of the derivative of the numerator and the denominator, minus the product of the numerator and the derivative of the denominator, all divided by the square of the denominator.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding the derivative of quotients of functions.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Power Rule",
+ "description": "The power rule is a formula for finding the derivative of a power of a function. It states that the derivative of a power of a function is the product of the power and the derivative of the function.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding the derivative of powers of functions.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Integration by Substitution",
+ "description": "Integration by substitution is a technique used to simplify the integration of a function by substituting a new variable for the original variable.",
+ "difficulty": "Medium",
+ "remark": "Commonly used in calculus for finding the integral of functions.",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Disk & Washer Method",
+ "description": "The washer method formula is used to find the volume of two functions that are rotated around the x-axis.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Extreme value theorem",
+ "description": "if 𝑓 is a continuous function over a finite, closed interval, then 𝑓 has an absolute maximum and an absolute minimum",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Fermat's theorem",
+ "description": "if 𝑓 has a local extremum at 𝑐, then 𝑐 is a critical point of 𝑓",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Mean Value Theorem",
+ "description": "Mean Value Theorem states that if a function f is continuous on the closed interval [a,b] and differentiable on the open interval (a,b), then there exists a point c in the interval (a,b) such that f'(c) is equal to the function's average rate of change over [a,b].",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Newton-Raphson method",
+ "description": "The Newton-Raphson method, also known as the Newton's method, is a widely used iterative numerical technique for finding the approximate roots of a real-valued function. It is named after Sir Isaac Newton and Joseph Raphson, who independently developed the method in the 17th century.\n\nThe method is based on the idea of linear approximation, where a function is approximated by its tangent line at a given point. The intersection of this tangent line with the x-axis provides a better approximation of the root than the initial point. This process is then repeated iteratively until the desired level of accuracy is achieved.\n\nGiven a function f(x) and an initial guess x0 for the root, the Newton-Raphson method can be described by the following iterative formula:\n\nx1 = x0 - f(x0) / f'(x0)\n\nHere, f'(x0) is the derivative of the function f(x) evaluated at the point x0. The new approximation x1 is then used as the starting point for the next iteration, and the process is repeated until the difference between successive approximations is smaller than a predefined tolerance level or a maximum number of iterations is reached.\n\nThe Newton-Raphson method converges rapidly when the initial guess is close to the actual root and the function is well-behaved. However, the method may fail to converge or converge to a wrong root if the initial guess is not close enough to the actual root, or if the function has multiple roots, or if the derivative of the function is zero or nearly zero at the root.\n\nDespite these limitations, the Newton-Raphson method is widely used in various fields of science and engineering due to its simplicity and fast convergence properties when applied to well-behaved functions.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Numerical Analysis"
+ },
+ {
+ "theorem": "Rolle's theorem",
+ "description": "Rolle's theorem or Rolle's lemma essentially states that any real-valued differentiable function that attains equal values at two distinct points must have at least one point, somewhere between them, at which the slope of the tangent line is zero.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Second derivative test",
+ "description": "The second partial derivatives test classifies the point as a local maximum or local minimum.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Calculus"
+ },
+ {
+ "theorem": "Pappus's Theorem",
+ "description": "Pappus's centroid theorem is either of two related theorems dealing with the surface areas and volumes of surfaces and solids of revolution.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Geometry"
+ }
+]
\ No newline at end of file
diff --git a/data/thb_medium/physics.json b/data/thb_medium/physics.json
new file mode 100644
index 0000000000000000000000000000000000000000..1893322b454124c442976bdb7135755f67ba728a
--- /dev/null
+++ b/data/thb_medium/physics.json
@@ -0,0 +1,142 @@
+[
+ {
+ "theorem": "The Work-Energy Theorem",
+ "description": "The net work done on an object is equal to the change in its kinetic energy. Mathematically, this is expressed as W_net = \u0394KE, where W_net is the net work and \u0394KE is the change in kinetic energy.",
+ "difficulty": "Medium",
+ "remark": "This theorem connects force, displacement, and energy. It's crucial for analyzing motion when forces are not constant or when the detailed time evolution is not needed. It's often used to solve problems involving motion and energy transfer.",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "The Law of Conservation of Energy",
+ "description": "In a closed system, the total energy remains constant; it can transform from one form to another (e.g., potential to kinetic) but cannot be created or destroyed. Mathematically, E_total_initial = E_total_final.",
+ "difficulty": "Medium",
+ "remark": "This is a fundamental principle in physics, applicable to a wide range of scenarios from mechanics to thermodynamics. It simplifies problem-solving by focusing on energy balance rather than detailed force interactions.",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "The Law of Universal Gravitation",
+ "description": "Any two objects with mass attract each other with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between their centers. F = G(m\u2081m\u2082)/r\u00b2, where G is the gravitational constant.",
+ "difficulty": "Medium",
+ "remark": "This law describes the gravitational force that governs the motions of celestial bodies and explains why things fall towards the earth. Its mathematical form shows the distance dependence of the gravitational force.",
+ "subfield": "Gravitation"
+ },
+ {
+ "theorem": "Archimedes' Principle",
+ "description": "An object immersed in a fluid experiences an upward buoyant force equal to the weight of the fluid displaced by the object. This principle explains buoyancy and is crucial for understanding why objects float or sink.",
+ "difficulty": "Medium",
+ "remark": "Connects the density of a fluid, the volume of displaced fluid, and the buoyant force. It's used to design boats and determine densities through buoyancy measurements.",
+ "subfield": "Fluid Mechanics"
+ },
+ {
+ "theorem": "The Doppler Effect",
+ "description": "Describes the change in frequency of a wave (sound or light) when the source and the observer are moving relative to each other. The perceived frequency shifts higher when the source and observer move closer and lower when they move apart. The mathematical formulation differs for sound and light.",
+ "difficulty": "Medium",
+ "remark": "Has applications in areas like radar speed guns, medical imaging, astronomy for finding the recession velocity of galaxies. It's crucial in understanding wave phenomena in a dynamic context.",
+ "subfield": "Wave Physics"
+ },
+ {
+ "theorem": "The Principle of Superposition of Waves",
+ "description": "When two or more waves overlap in a medium, the resultant displacement at any point is the vector sum of the displacements of the individual waves at that point. This principle governs wave interference and diffraction phenomena.",
+ "difficulty": "Medium",
+ "remark": "Explains how waves combine with each other. Its application can create both constructive and destructive interference effects. Essential in understanding the behavior of light and sound, diffraction gratings.",
+ "subfield": "Wave Physics"
+ },
+ {
+ "theorem": "Kepler's laws of planetary motion",
+ "description": "These laws describe the motion of planets around the sun. Kepler's First Law states that planets orbit in elliptical paths with the sun at one of the two foci. Kepler's Second Law states that a line drawn from the sun to a planet sweeps out equal areas in equal times. Kepler's Third Law relates the orbital period of a planet to its average distance from the sun.",
+ "difficulty": "Medium",
+ "remark": "These laws are crucial for understanding the motion of planets and are used in astronomy and space science.",
+ "subfield": "Astrophysics"
+ },
+ {
+ "theorem": "Gauss's law",
+ "description": "Gauss's law states that the electric flux through any closed surface is equal to the charge enclosed by the surface divided by the permittivity of free space.",
+ "difficulty": "Medium",
+ "remark": "This law is fundamental to understanding the relationship between electric fields and charges. It's used in electrostatics and electromagnetism to calculate electric fields around charged objects.",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Stokes' law",
+ "description": "Stokes' Law describes the force of viscous drag on a small spherical object moving through a viscous fluid.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Fluid Mechanics"
+ },
+ {
+ "theorem": "Bernoulli's principle",
+ "description": "Bernoulli's principle is a key concept in fluid dynamics that relates pressure, density, speed and height. Bernoulli's principle states that an increase in the speed of a parcel of fluid occurs simultaneously with a decrease in either the pressure or the height above a datum.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Fluid Mechanics"
+ },
+ {
+ "theorem": "Poiseuille's law",
+ "description": "the rate of laminar flow of an incompressible fluid in a tube.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Fluid Mechanics"
+ },
+ {
+ "theorem": "Stefan-Boltzmann Law of Radiation",
+ "description": "The Stefan–Boltzmann law, also known as Stefan's law, describes the intensity of the thermal radiation emitted by matter in terms of that matter's temperature. It is named for Josef Stefan, who empirically derived the relationship, and Ludwig Boltzmann who derived the law theoretically.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Carnot cycle",
+ "description": "A Carnot cycle is an ideal thermodynamic cycle proposed by French physicist Sadi Carnot in 1824.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Thermodynamics"
+ },
+ {
+ "theorem": "Electromagnetic spectrum",
+ "description": "The electromagnetic spectrum is the full range of electromagnetic radiation, organized by frequency or wavelength. The spectrum is divided into separate bands, with different names for the electromagnetic waves within each band.",
+ "difficulty": "Easy",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Ampere's law",
+ "description": "In classical electromagnetism, Ampère's circuital law relates the circulation of a magnetic field around a closed loop to the electric current passing through the loop.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ },
+ {
+ "theorem": "Brewster's law",
+ "description": "Brewster's law is a relationship of light waves at the maximum polarization angle of light.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Optics"
+ },
+ {
+ "theorem": "Brownian motion",
+ "description": "Brownian motion is the seemingly random motion of particles within a liquid or gas that emerges from constant collisions and redirection from impacting the atoms or molecules within the fluid. All matter is in constant motion which results in Brownian motion.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Statistical Physics"
+ },
+ {
+ "theorem": "Hubble's law",
+ "description": "Hubble's law, also known as the Hubble–Lemaître law, is the observation in physical cosmology that galaxies are moving away from Earth at speeds proportional to their distance. In other words, the farther a galaxy is from the Earth, the faster it moves away.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Astrophysics"
+ },
+ {
+ "theorem": "Tsiolkovsky rocket equation",
+ "description": "It is a mathematical equation that describes the motion of a rocket in a vacuum and is used to calculate the velocity, acceleration, and thrust of the rocket.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Classical Mechanics"
+ },
+ {
+ "theorem": "Hall Effect",
+ "description": "Hall effect is a process in which a transverse electric field is developed in a solid material when the material carrying an electric current is placed in a magnetic field that is perpendicular to the current.",
+ "difficulty": "Medium",
+ "remark": "",
+ "subfield": "Electromagnetism"
+ }
+]
\ No newline at end of file
diff --git a/eval_suite/__init__.py b/eval_suite/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/eval_suite/image_utils.py b/eval_suite/image_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..8920af533f45b5509adbf7fb9c3e689c9b0dc98e
--- /dev/null
+++ b/eval_suite/image_utils.py
@@ -0,0 +1,104 @@
+import os
+import tempfile
+
+import numpy as np
+from PIL import Image, ImageOps
+from moviepy import VideoFileClip
+
+from eval_suite.prompts_raw import _image_eval
+from eval_suite.utils import extract_json, convert_score_fields, calculate_geometric_mean
+from mllm_tools.utils import _prepare_text_image_inputs
+from src.core.parse_video import image_with_most_non_black_space
+
+def extract_key_frames(video_path, output_dir, num_chunks):
+ """Extract key frames from a video by dividing it into chunks and selecting representative frames.
+
+ Args:
+ video_path (str): Path to the input video file
+ output_dir (str): Directory where extracted frames will be saved
+ num_chunks (int): Number of chunks to divide the video into
+
+ Returns:
+ list: List of paths to the extracted key frames
+ """
+ # Create output directory if it doesn't exist
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Extract all frames from the video
+ clip = VideoFileClip(video_path)
+ frames = list(clip.iter_frames(fps=1)) # one frame every second
+
+ total_frames = len(frames)
+ if total_frames == 0:
+ print("No frames extracted from the video.")
+ return []
+
+    # Determine the number of frames per chunk (at least one, to avoid a division by zero on very short videos)
+    frames_per_chunk = max(1, total_frames // num_chunks)
+    num_chunks = min(num_chunks, (total_frames + frames_per_chunk - 1) // frames_per_chunk)
+
+ key_frames = []
+
+ # Process each chunk of frames
+ for i in range(num_chunks):
+ start_idx = i * frames_per_chunk
+ end_idx = min((i + 1) * frames_per_chunk, total_frames)
+ chunk_frames = frames[start_idx:end_idx]
+
+ if chunk_frames:
+ # Save the frame with most non-black space
+ output_path = os.path.join(output_dir, f"key_frame_{i+1}.jpg")
+ result = image_with_most_non_black_space(chunk_frames, output_path)
+ else:
+ print(f"No frames in chunk {i+1}. Skipping.")
+ result = None
+
+ if result is not None:
+ key_frames.append(output_path)
+ clip.close()
+
+ return key_frames
+
+
+def evaluate_sampled_images(model, video_path, description="No description provided", num_chunks=10, output_folder=None):
+ """Evaluate sampled frames from a video using an image evaluation model.
+
+ Args:
+ model: The image evaluation model to use
+ video_path (str): Path to the input video file
+ description (str, optional): Description of the video content. Defaults to "No description provided"
+ num_chunks (int, optional): Number of chunks to divide the video into. Defaults to 10
+ output_folder (str, optional): Directory for temporary files. Defaults to None
+
+ Returns:
+ dict: Dictionary containing evaluation scores and individual frame assessments with keys:
+ - evaluation: Dictionary of averaged scores for each criterion
+ - image_chunks: List of individual frame evaluation results
+ """
+ with tempfile.TemporaryDirectory(dir=output_folder) as temp_dir:
+ key_frames = extract_key_frames(video_path, temp_dir, num_chunks)
+
+ prompt = _image_eval.format(description=description)
+
+ responses = []
+ for key_frame in key_frames:
+ inputs = _prepare_text_image_inputs(prompt, key_frame)
+ response = model(inputs)
+ response_json = extract_json(response)
+ response_json = convert_score_fields(response_json)
+ responses.append(response_json)
+
+ criteria = list(responses[0]["evaluation"].keys())
+ scores_dict = {c: [] for c in criteria}
+ for response in responses:
+ for key, val in response["evaluation"].items():
+ scores_dict[key].append(val["score"])
+
+ res_score = {}
+ for key, scores in scores_dict.items():
+ res_score[key] = {"score": calculate_geometric_mean(scores)}
+
+ return {
+ "evaluation": res_score,
+ "image_chunks": responses
+ }
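A minimal usage sketch for the frame-sampling evaluation above; the model construction and paths are assumptions for illustration, not values defined in this repository:

```python
# Sketch only: the LiteLLMWrapper constructor arguments and the paths are assumed.
from mllm_tools.litellm import LiteLLMWrapper
from eval_suite.image_utils import evaluate_sampled_images

model = LiteLLMWrapper("azure/gpt-4o")  # constructor signature assumed
result = evaluate_sampled_images(
    model,
    video_path="output/pythagorean_theorem/video.mp4",   # placeholder path
    description="Pythagorean Theorem",
    num_chunks=10,
    output_folder="output/pythagorean_theorem",
)
print(result["evaluation"])
```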
diff --git a/eval_suite/parse_prompt.py b/eval_suite/parse_prompt.py
new file mode 100644
index 0000000000000000000000000000000000000000..94f8da056377cae088dc405bb1ef771479d7a109
--- /dev/null
+++ b/eval_suite/parse_prompt.py
@@ -0,0 +1,54 @@
+import os
+from tqdm import tqdm
+
+
+def call_parse_prompt():
+ """
+ Locates the prompts_raw directory and generates an __init__.py file containing prompt texts.
+
+ Searches for prompts_raw directory in current and parent directories. Once found, calls
+ create_python_file_with_texts() to generate the __init__.py file.
+ """
+ current_file_path = os.path.abspath(__file__)
+ current_folder_path = os.path.dirname(current_file_path)
+ folder_path = os.path.join(current_folder_path, "prompts_raw")
+
+ # If prompts_raw not found in current directory, search parent directories
+ if not os.path.exists(folder_path):
+ parent_dir = current_folder_path
+ while parent_dir != os.path.dirname(parent_dir): # Stop at root directory
+ parent_dir = os.path.dirname(parent_dir)
+ test_path = os.path.join(parent_dir, "prompts_raw")
+ if os.path.exists(test_path):
+ folder_path = test_path
+ break
+
+ output_file = os.path.join(folder_path, "__init__.py")
+ create_python_file_with_texts(folder_path, output_file)
+
+
+def create_python_file_with_texts(folder_path, output_file):
+ """
+ Creates a Python file containing prompt texts from .txt files.
+
+ Args:
+ folder_path (str): Path to directory containing prompt .txt files
+ output_file (str): Path where the output __init__.py file will be created
+
+ The function reads all .txt files in the given folder, converts their contents into
+ Python variables, and writes them to the output file. Variable names are derived from
+ file paths with special characters replaced.
+ """
+ with open(output_file, 'w', encoding='utf-8') as out_file:
+ out_file.write("# This file is generated automatically through parse_prompt.py\n\n")
+ txt_files = [file for root, dirs, files in os.walk(folder_path) for file in files if file.endswith(".txt")]
+ for file in tqdm(txt_files, desc="Processing files"):
+ file_path = os.path.join(folder_path, file)
+ var_name = "_" + file_path.replace(folder_path, "").replace(os.sep, "_").replace(".txt", "").strip("_")
+ with open(file_path, 'r', encoding='utf-8') as f:
+            content = f.read().replace('"""', '\\"\\"\\"')  # escape triple quotes so they don't terminate the generated string literal
+ out_file.write(f'{var_name} = """{content}"""\n\n')
+
+
+if __name__ == "__main__":
+ call_parse_prompt()
\ No newline at end of file
diff --git a/eval_suite/prompts_raw/__init__.py b/eval_suite/prompts_raw/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..87732bc1e6529df4877d81ddec9982a865534f30
--- /dev/null
+++ b/eval_suite/prompts_raw/__init__.py
@@ -0,0 +1,145 @@
+# This file is generated automatically through parse_prompt.py
+
+_video_eval_new = """# Task: Video Frame Quality Evaluation
+
+You are tasked with analyzing and scoring a chunk of a theorem explanation video. Note that you may not have the full context of the video. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
+
+## Evaluation Criteria
+
+1. **Visual Consistency**
+ - Style Consistency: Does the visual style remain consistent across frames?
+ - Smoothness: Are the motions and transitions smooth?
+
+## Scoring Instructions
+1. Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+2. Provide a comprehensive evaluation for each dimension.
+3. Format your output in **JSON**
+
+### JSON Output Format
+```json
+{{
+ "overall_analysis": "[Provide a general assessment of the video's quality]",
+ "evaluation": {{
+ "visual_consistency": {{
+ "comprehensive_evaluation": "[Analysis of visual consistency]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+Description of the theorem:
+{description}
+
+Video chunk:"""
+
+_text_eval_new = """You are a specialist in evaluating theorem explanation videos, known for giving clear and objective feedback. You will be given the transcript of a video. Your task is to evaluate and score the content of the video in several dimensions.
+
+### Task Objective
+1. Perform an overall analysis of the video.
+ * Identify the topic of the video.
+ * Note your general thoughts and impression of the video, and any findings and observations.
+2. Conduct a comprehensive evaluation and score each criterion in the given dimensions.
+ * Analyze how well or poorly the video meets each criterion.
+ * Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+3. Output the results in the specified JSON format.
+
+### Evaluation Criteria
+1. **Accuracy and Depth**
+ - Does the narration explain the theorem accurately?
+ - Does the video provide intuitive and/or rigorous explanations for why the theorem holds?
+2. **Logical Flow**
+ - Does the video follow a clear and logical structure?
+ - Does the video present a coherent buildup of ideas?
+
+### Notes
+* You do not have access to the visual portion of the video as you are given only the textual portion. Do not reference or commentate on the visuals as they will be evaluated separately - just assume that there are reasonable visuals (e.g., geometric objects, graphs of functions, and calculations) to accompany the narration.
+* The evaluation criteria are intended to be independent of each other. Do not restate the same violation in multiple criteria; only consider it in the most relevant criterion.
+
+### Output Format
+```json
+{{
+ "overall_analysis": "[Overall analysis]",
+ "evaluation": {{
+ "accuracy_and_depth": {{
+ "comprehensive_evaluation": "[Analysis of accuracy and depth]",
+ "score": [1-5]
+ }},
+ "logical_flow": {{
+ "comprehensive_evaluation": "[Analysis of logical flow]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+The transcript of the video is as follows:
+{transcript}
+"""
+
+_fix_transcript = """You are an expert in YouTube video transcripts. There is a transcript that was automatically generated through YouTube, so it lacks proper capitalization and punctuation. Your task is to fix the transcript so that there is proper punctuation, capitalization, and spacing. Do not make other modifications (e.g., keep the original word choice).
+
+You should enclose the fixed transcript with a block, i.e.:
+
+
+Original transcript: {transcript}
+"""
+
+_image_eval = """# Task: Video Frame Quality Evaluation
+
+You are tasked with analyzing and scoring a frame taken from a theorem explanation video. Note that you may not have the context of the video, so the captured frame may be a frame where some motion of visual elements is taking place. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
+
+## Evaluation Criteria
+
+1. **Visual Relevance**
+ - Does the video frame align with the theorem's concepts and derivations?
+
+2. **Element Layout**
+   - Placement and Size: Are the visual elements well-placed and appropriately sized within the frame?
+ - Overlap: Are the visual elements free of unintentional overlap?
+ - Clarity: Is the visual information conveyed in the frame clear and easy to understand?
+
+## Scoring Instructions
+1. Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+2. Provide a comprehensive evaluation for each dimension.
+3. Format your output in **JSON**
+
+### JSON Output Format
+```json
+{{
+ "overall_analysis": "[Provide a general assessment of the image's quality]",
+ "evaluation": {{
+ "visual_relevance": {{
+ "comprehensive_evaluation": "[Analysis of visual relevance]",
+ "score": [1-5]
+ }},
+ "element_layout": {{
+ "comprehensive_evaluation": "[Analysis of element layout]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+Description of the theorem:
+{description}
+
+Image:"""
+
diff --git a/eval_suite/prompts_raw/fix_transcript.txt b/eval_suite/prompts_raw/fix_transcript.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c28523e3b8432da3f68ce93b576ae583d6f27023
--- /dev/null
+++ b/eval_suite/prompts_raw/fix_transcript.txt
@@ -0,0 +1,8 @@
+You are an expert in YouTube video transcripts. There is a transcript that was automatically generated through YouTube, so it lacks proper capitalization and punctuation. Your task is to fix the transcript so that there is proper punctuation, capitalization, and spacing. Do not make other modifications (e.g., keep the original word choice).
+
+You should enclose the fixed transcript with a block, i.e.:
+
+
+Original transcript: {transcript}
diff --git a/eval_suite/prompts_raw/image_eval.txt b/eval_suite/prompts_raw/image_eval.txt
new file mode 100644
index 0000000000000000000000000000000000000000..86f18a8d1d1544f430138777310819b0e4d54686
--- /dev/null
+++ b/eval_suite/prompts_raw/image_eval.txt
@@ -0,0 +1,45 @@
+# Task: Video Frame Quality Evaluation
+
+You are tasked with analyzing and scoring a frame taken from a theorem explanation video. Note that you may not have the context of the video, so the captured frame may be a frame where some motion of visual elements is taking place. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
+
+## Evaluation Criteria
+
+1. **Visual Relevance**
+ - Does the video frame align with the theorem's concepts and derivations?
+
+2. **Element Layout**
+   - Placement and Size: Are the visual elements well-placed and appropriately sized within the frame?
+ - Overlap: Are the visual elements free of unintentional overlap?
+ - Clarity: Is the visual information conveyed in the frame clear and easy to understand?
+
+## Scoring Instructions
+1. Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+2. Provide a comprehensive evaluation for each dimension.
+3. Format your output in **JSON**
+
+### JSON Output Format
+```json
+{{
+ "overall_analysis": "[Provide a general assessment of the image's quality]",
+ "evaluation": {{
+ "visual_relevance": {{
+ "comprehensive_evaluation": "[Analysis of visual relevance]",
+ "score": [1-5]
+ }},
+ "element_layout": {{
+ "comprehensive_evaluation": "[Analysis of element layout]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+Description of the theorem:
+{description}
+
+Image:
\ No newline at end of file
diff --git a/eval_suite/prompts_raw/text_eval_new.txt b/eval_suite/prompts_raw/text_eval_new.txt
new file mode 100644
index 0000000000000000000000000000000000000000..bc224bfc50062a526d7a8575eb2aced570ad88a1
--- /dev/null
+++ b/eval_suite/prompts_raw/text_eval_new.txt
@@ -0,0 +1,47 @@
+You are a specialist in evaluating theorem explanation videos, known for giving clear and objective feedback. You will be given the transcript of a video. Your task is to evaluate and score the content of the video in several dimensions.
+
+### Task Objective
+1. Perform an overall analysis of the video.
+ * Identify the topic of the video.
+ * Note your general thoughts and impression of the video, and any findings and observations.
+2. Conduct a comprehensive evaluation and score each criterion in the given dimensions.
+ * Analyze how well or poorly the video meets each criterion.
+ * Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+3. Output the results in the specified JSON format.
+
+### Evaluation Criteria
+1. **Accuracy and Depth**
+ - Does the narration explain the theorem accurately?
+ - Does the video provide intuitive and/or rigorous explanations for why the theorem holds?
+2. **Logical Flow**
+ - Does the video follow a clear and logical structure?
+ - Does the video present a coherent buildup of ideas?
+
+### Notes
+* You do not have access to the visual portion of the video as you are given only the textual portion. Do not reference or commentate on the visuals as they will be evaluated separately - just assume that there are reasonable visuals (e.g., geometric objects, graphs of functions, and calculations) to accompany the narration.
+* The evaluation criteria are intended to be independent of each other. Do not restate the same violation in multiple criteria; only consider it in the most relevant criterion.
+
+### Output Format
+```json
+{{
+ "overall_analysis": "[Overall analysis]",
+ "evaluation": {{
+ "accuracy_and_depth": {{
+ "comprehensive_evaluation": "[Analysis of accuracy and depth]",
+ "score": [1-5]
+ }},
+ "logical_flow": {{
+ "comprehensive_evaluation": "[Analysis of logical flow]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+The transcript of the video is as follows:
+{transcript}
diff --git a/eval_suite/prompts_raw/video_eval_new.txt b/eval_suite/prompts_raw/video_eval_new.txt
new file mode 100644
index 0000000000000000000000000000000000000000..36ea4bbbbb0fa09eb373928e79e1a4e8a1a8e74c
--- /dev/null
+++ b/eval_suite/prompts_raw/video_eval_new.txt
@@ -0,0 +1,37 @@
+# Task: Video Frame Quality Evaluation
+
+You are tasked with analyzing and scoring a chunk of a theorem explanation video. Note that you may not have the full context of the video. Your job is to assign a score from 1 to 5 for each criterion. Please provide a brief justification for your scores.
+
+## Evaluation Criteria
+
+1. **Visual Consistency**
+ - Style Consistency: Does the visual style remain consistent across frames?
+ - Smoothness: Are the motions and transitions smooth?
+
+## Scoring Instructions
+1. Assign a score from **1 to 5** for each dimension:
+ - **1**: Very poor quality, completely fails to meet the criteria.
+ - **2**: Below average, significant issues present.
+ - **3**: Acceptable, meets the basic criteria with minor issues.
+ - **4**: Good, performs well with no major issues.
+ - **5**: Excellent, fully meets or exceeds expectations.
+2. Provide a comprehensive evaluation for each dimension.
+3. Format your output in **JSON**
+
+### JSON Output Format
+```json
+{{
+ "overall_analysis": "[Provide a general assessment of the video's quality]",
+ "evaluation": {{
+ "visual_consistency": {{
+ "comprehensive_evaluation": "[Analysis of visual consistency]",
+ "score": [1-5]
+ }}
+ }}
+}}
+```
+
+Description of the theorem:
+{description}
+
+Video chunk:
\ No newline at end of file
diff --git a/eval_suite/text_utils.py b/eval_suite/text_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e6ff64cab90171fde41cf47ea20bca15d5323a6
--- /dev/null
+++ b/eval_suite/text_utils.py
@@ -0,0 +1,80 @@
+from typing import Union
+
+import pysrt
+
+from mllm_tools.litellm import LiteLLMWrapper
+from mllm_tools.gemini import GeminiWrapper
+from mllm_tools.utils import _prepare_text_inputs
+from eval_suite.prompts_raw import _fix_transcript, _text_eval_new
+from eval_suite.utils import extract_json, convert_score_fields
+
+
+def parse_srt_to_text(srt_path) -> str:
+ """
+ Parse an SRT subtitle file into plain text.
+
+ Args:
+ srt_path: Path to the SRT subtitle file.
+
+ Returns:
+ str: The subtitle text with duplicates removed and ellipses replaced.
+ """
+ subs = pysrt.open(srt_path)
+ full_text = []
+ for sub in subs:
+ sub.text = sub.text.replace("...", ".")
+ for line in sub.text.splitlines():
+ # .srt can contain repeated lines
+ if full_text and full_text[-1] == line:
+ continue
+ full_text.append(line)
+ return "\n".join(full_text)
+
+
+def fix_transcript(text_eval_model: Union[LiteLLMWrapper, GeminiWrapper], transcript: str) -> str:
+ """
+ Fix and clean up a transcript using an LLM model.
+
+ Args:
+ text_eval_model: The LLM model wrapper to use for fixing the transcript.
+ transcript: The input transcript text to fix.
+
+ Returns:
+ str: The fixed and cleaned transcript text.
+ """
+ print("Fixing transcript...")
+
+ prompt = _fix_transcript.format(transcript=transcript)
+ response = text_eval_model(_prepare_text_inputs(prompt))
+ fixed_script = response.split("")[0]
+
+ return fixed_script
+
+
+def evaluate_text(text_eval_model: LiteLLMWrapper, transcript: str, retry_limit: int) -> dict:
+ """
+ Evaluate transcript text using an LLM model with retry logic.
+
+ Args:
+ text_eval_model: The LLM model wrapper to use for evaluation.
+ transcript: The transcript text to evaluate.
+ retry_limit: Maximum number of retry attempts on failure.
+
+ Returns:
+ dict: The evaluation results as a JSON object.
+
+ Raises:
+ ValueError: If all retry attempts fail.
+ """
+ # prompt = _text_eval.format(transcript=transcript)
+ prompt = _text_eval_new.format(transcript=transcript)
+ for attempt in range(retry_limit):
+ try:
+ evaluation = text_eval_model(_prepare_text_inputs(prompt))
+ evaluation_json = extract_json(evaluation)
+ evaluation_json = convert_score_fields(evaluation_json)
+ return evaluation_json
+ except Exception as e:
+ print(f"Attempt {attempt + 1} failed: {e.__class__.__name__}: {e}")
+ if attempt + 1 == retry_limit:
+ raise ValueError("Reached maximum retry limit. Evaluation failed.") from None
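A minimal usage sketch of the text pipeline above (parse the subtitles, then score the transcript); the wrapper construction and the file path are assumptions for illustration:

```python
# Sketch only: the LiteLLMWrapper constructor arguments and the path are assumed.
from mllm_tools.litellm import LiteLLMWrapper
from eval_suite.text_utils import parse_srt_to_text, evaluate_text

model = LiteLLMWrapper("azure/gpt-4o")  # constructor signature assumed
transcript = parse_srt_to_text("output/pythagorean_theorem/subtitles.srt")  # placeholder path
scores = evaluate_text(model, transcript, retry_limit=3)
print(scores["evaluation"]["accuracy_and_depth"]["score"])
```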
diff --git a/eval_suite/utils.py b/eval_suite/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..51124375b54364eae77bf12cc8428657fb47dadc
--- /dev/null
+++ b/eval_suite/utils.py
@@ -0,0 +1,81 @@
+import json
+import re
+from math import prod
+from typing import List
+
+def extract_json(response: str) -> dict:
+ """
+ Extract JSON content from a string response.
+
+ Args:
+ response (str): String containing JSON content, possibly within code blocks.
+
+ Returns:
+ dict: Extracted and parsed JSON content.
+
+ Raises:
+ ValueError: If no valid JSON content could be extracted.
+ """
+ try:
+ evaluation_json = json.loads(response)
+ except json.JSONDecodeError:
+ # If JSON parsing fails, try to extract the content between ```json and ```
+ match = re.search(r'```json\n(.*?)\n```', response, re.DOTALL)
+ if not match:
+ # If no match for ```json, try to extract content between ``` and ```
+ match = re.search(r'```\n(.*?)\n```', response, re.DOTALL)
+
+ if match:
+ evaluation_content = match.group(1)
+ evaluation_json = json.loads(evaluation_content)
+ else:
+ raise ValueError("Failed to extract valid JSON content")
+ return evaluation_json
+
+
+def convert_score_fields(data: dict) -> dict:
+ """
+ Convert score fields in a dictionary to integers recursively.
+
+ Args:
+ data (dict): Dictionary containing score fields to convert.
+
+ Returns:
+ dict: Dictionary with score fields converted to integers.
+
+ Raises:
+ ValueError: If a score value cannot be converted to integer.
+ """
+ # Create a new dictionary with the converted values
+ converted_data = {}
+ for key, value in data.items():
+ if key == "score":
+ if isinstance(value, int):
+ converted_data[key] = value
+ elif isinstance(value, str) and value.isdigit():
+ converted_data[key] = int(value)
+ else:
+ raise ValueError(f"Invalid score value: {value!r}")
+ elif isinstance(value, dict):
+ converted_data[key] = convert_score_fields(value)
+ else:
+ converted_data[key] = value
+ return converted_data
+
+
+def calculate_geometric_mean(scores: List[int]) -> float:
+ """
+ Calculate the geometric mean of a list of scores.
+
+ Args:
+ scores (List[int]): List of integer scores, may contain None values.
+
+ Returns:
+ float: Geometric mean of non-None scores. Returns 0.0 if list is empty
+ or contains only None values.
+ """
+ scores = [s for s in scores if s is not None]
+ if not scores:
+ return 0.0
+ product = prod(scores)
+ return product ** (1 / len(scores))
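A small round-trip through the helpers above shows how a fenced model response becomes an aggregated score (illustrative only):

```python
# Illustrative round-trip: extract JSON from a fenced response, normalize the
# score fields to integers, then aggregate with the geometric mean.
from eval_suite.utils import extract_json, convert_score_fields, calculate_geometric_mean

response = "```json\n" + '{"evaluation": {"visual_relevance": {"score": "4"}, "element_layout": {"score": 3}}}' + "\n```"
data = convert_score_fields(extract_json(response))
scores = [v["score"] for v in data["evaluation"].values()]
print(calculate_geometric_mean(scores))  # sqrt(4 * 3) ≈ 3.46
```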
diff --git a/eval_suite/video_utils.py b/eval_suite/video_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..c0b3ffdb89fda5b372866521b35a8cee2be39b5a
--- /dev/null
+++ b/eval_suite/video_utils.py
@@ -0,0 +1,167 @@
+import os
+import cv2
+import tempfile
+
+from dotenv import load_dotenv
+
+from mllm_tools.utils import _prepare_text_video_inputs
+from eval_suite.prompts_raw import _video_eval_new
+from eval_suite.utils import extract_json, convert_score_fields
+
+load_dotenv()
+
+
+def reduce_video_framerate(input_path, target_fps=1, output_path=None):
+ """
+ Reduces the frame rate of a video by only keeping frames at the target interval.
+
+ Args:
+ input_path (str): Path to the input video
+ target_fps (int): Target frames per second (default: 1)
+ output_path (str, optional): Path to save the processed video. If None, uses a temporary file.
+
+ Returns:
+ str: Path to the processed video
+
+ Raises:
+ ValueError: If input video cannot be opened or has invalid FPS
+ RuntimeError: If video writer initialization fails or output video creation fails
+ """
+ cap = cv2.VideoCapture(input_path)
+ if not cap.isOpened():
+ raise ValueError(f"Could not open input video: {input_path}")
+
+ original_fps = cap.get(cv2.CAP_PROP_FPS)
+ if original_fps <= 0:
+ raise ValueError(f"Invalid FPS ({original_fps}) detected in input video")
+
+    frame_interval = max(1, int(original_fps / target_fps))  # guard against a target_fps above the source frame rate
+
+ # Get video properties
+ width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+ height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+
+ # Use provided output path or create temporary file
+ if output_path is None:
+ temp_output = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
+ output_path = temp_output.name
+
+ # Ensure output directory exists
+ os.makedirs(os.path.dirname(output_path), exist_ok=True)
+
+ # Try different codecs in order of preference
+ codecs = [
+ ('avc1', '.mp4'), # H.264 codec
+ ('mp4v', '.mp4'), # MP4V codec
+ ('XVID', '.avi'), # XVID codec
+ ('MJPG', '.avi'), # Motion JPEG codec
+ ]
+
+ success = False
+ for codec, ext in codecs:
+ if output_path.endswith('.mp4') and not ext.endswith('.mp4'):
+ # If we're switching to AVI format, change the extension
+ output_path = output_path[:-4] + ext
+
+ fourcc = cv2.VideoWriter_fourcc(*codec)
+ out = cv2.VideoWriter(output_path, fourcc, target_fps, (width, height))
+
+ if out.isOpened():
+ success = True
+ print(f"Successfully initialized video writer with codec: {codec}")
+ break
+ else:
+ out.release()
+ if os.path.exists(output_path):
+ os.remove(output_path)
+
+ if not success:
+ raise RuntimeError("Could not initialize video writer with any available codec")
+
+ frame_count = 0
+ frames_written = 0
+ while cap.isOpened():
+ ret, frame = cap.read()
+ if not ret:
+ break
+
+ # Only write frames at the specified interval
+ if frame_count % frame_interval == 0:
+ out.write(frame)
+ frames_written += 1
+ frame_count += 1
+
+ cap.release()
+ out.release()
+
+ # Verify the output
+ verify_cap = cv2.VideoCapture(output_path)
+ if not verify_cap.isOpened():
+ raise RuntimeError(f"Failed to create output video at {output_path}")
+
+ actual_fps = verify_cap.get(cv2.CAP_PROP_FPS)
+ total_frames = verify_cap.get(cv2.CAP_PROP_FRAME_COUNT)
+ verify_cap.release()
+
+ if actual_fps <= 0:
+ print("Warning: Output video reports invalid FPS. This might be a codec issue.")
+ actual_fps = target_fps # Use target FPS for duration calculation
+
+ print(f"Created video with {frames_written} frames at {actual_fps} FPS")
+ print(f"Total duration: {total_frames/actual_fps:.2f} seconds")
+ print(f"Video saved to: {output_path}")
+
+ return output_path
+
+
+def evaluate_video_chunk_new(model, video_path, transcript="No transcript provided", description="No description provided",
+ save_processed_video=None, target_fps=None, retry_limit=5):
+ """
+ Evaluate a single video chunk using a multimodal model.
+
+ Args:
+ model: The multimodal model to use for evaluation
+ video_path (str): Path to the video file to evaluate
+ transcript (str, optional): Video transcript text. Defaults to "No transcript provided"
+ description (str, optional): Video description text. Defaults to "No description provided"
+ save_processed_video (str, optional): Path to save processed video. If None, uses temporary file
+ target_fps (int, optional): Target frames per second for video processing. If None, no processing
+ retry_limit (int, optional): Maximum number of retry attempts. Defaults to 5
+
+ Returns:
+ dict: Evaluation results as a JSON object with scores converted to integers
+
+ Raises:
+ FileNotFoundError: If video file does not exist
+ Exception: If evaluation fails after all retry attempts
+ """
+ if not os.path.exists(video_path):
+ raise FileNotFoundError(f"Video file not found: {video_path}")
+
+ # Only process video if target_fps is specified
+ if target_fps is not None:
+ processed_video_path = reduce_video_framerate(video_path, target_fps=target_fps, output_path=save_processed_video)
+ video_to_use = processed_video_path
+ else:
+ video_to_use = video_path
+
+ prompt = _video_eval_new.format(description=description)
+ inputs = _prepare_text_video_inputs(prompt, video_to_use)
+
+ try:
+ for attempt in range(retry_limit):
+ try:
+ response = model(inputs)
+ response_json = extract_json(response)
+ response_json = convert_score_fields(response_json)
+
+ return response_json
+ except Exception as e:
+ print(f"Attempt {attempt + 1} failed: {e}")
+ if attempt + 1 == retry_limit:
+ print("Reached maximum retry limit. Evaluation failed.")
+ raise
+ finally:
+ # Clean up the temporary processed video if we created one
+ if target_fps is not None and save_processed_video is None and os.path.exists(processed_video_path):
+ os.unlink(processed_video_path)
\ No newline at end of file
diff --git a/evaluate.py b/evaluate.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6da363bcca3dbf68e7b9c9b77b3a713f5cd5128
--- /dev/null
+++ b/evaluate.py
@@ -0,0 +1,474 @@
+import os
+import json
+import argparse
+import tempfile
+from typing import Dict, List, Union
+from datetime import datetime
+
+from dotenv import load_dotenv
+from moviepy import VideoFileClip
+
+from mllm_tools.litellm import LiteLLMWrapper
+from mllm_tools.gemini import GeminiWrapper
+from eval_suite.utils import calculate_geometric_mean
+from eval_suite.text_utils import parse_srt_to_text, fix_transcript, evaluate_text
+from eval_suite.video_utils import evaluate_video_chunk_new
+from eval_suite.image_utils import evaluate_sampled_images
+
+load_dotenv()
+
+with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "src", "utils", "allowed_models.json")) as f:
+ ALLOWED_MODELS = json.load(f)["allowed_models"]
+
+
+def combine_results(output_folder: str, combined_file: str, results: Dict[str, Dict]) -> None:
+ """
+ Combine all evaluation results into a single file.
+
+ Args:
+ output_folder (str): Directory to store the combined file.
+ combined_file (str): Name of the combined file.
+ results (Dict[str, Dict]): Dictionary of evaluation results with file names as keys.
+
+ Returns:
+ None
+ """
+ combined_path = os.path.join(output_folder, combined_file)
+ with open(combined_path, 'w') as output_file:
+ json.dump(results, output_file, indent=4)
+
+
+def save_individual_result(output_folder: str, file_name: str, result: Dict) -> None:
+ """
+ Save individual evaluation result to a file.
+
+ Args:
+ output_folder (str): Directory to store the evaluation file.
+ file_name (str): Name of the file.
+ result (Dict): Evaluation result.
+
+ Returns:
+ None
+ """
+ current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
+ result_file = f"evaluation_{file_name}_{current_time}.json"
+ os.makedirs(output_folder, exist_ok=True)
+ result_path = os.path.join(output_folder, result_file)
+ with open(result_path, 'w') as output_file:
+ json.dump(result, output_file, indent=4)
+
+
+def evaluate_text_file(model, transcript_path, retry_limit):
+ """
+ Evaluate a text file using the provided model.
+
+ Args:
+ model: The model to use for evaluation.
+ transcript_path (str): Path to the transcript file (.srt or .txt).
+ retry_limit (int): Number of retry attempts for evaluation.
+
+ Returns:
+ Dict or None: Evaluation results if successful, None if file format unsupported.
+ """
+ if not transcript_path.endswith(('.srt', '.txt')):
+ print(f"Skipping {transcript_path}: Unsupported file format for text evaluation.")
+ return None
+
+ if transcript_path.endswith(".srt"):
+ transcript = parse_srt_to_text(transcript_path)
+ elif transcript_path.endswith(".txt"):
+ with open(transcript_path) as f:
+ transcript = f.read().strip()
+ else:
+ raise ValueError("Unrecognized transcript file format.")
+
+    alphabetic_count = sum(1 for c in transcript if c.isalpha())
+    capital_letter_proportion = (sum(1 for c in transcript if c.isupper()) / alphabetic_count) if alphabetic_count else 1.0
+    if capital_letter_proportion < 0.01:
+        transcript = fix_transcript(model, transcript)
+
+ print(f"Performing text evaluation: {os.path.basename(transcript_path)}")
+ result = evaluate_text(model, transcript, retry_limit)
+ return result
+
+
+def evaluate_video_file(model, video_path, transcript_path, description_path, target_fps=None, output_folder=None):
+ """
+ Evaluate a video file using the provided model.
+
+ Args:
+ model: The model to use for evaluation.
+ video_path (str): Path to the video file.
+ transcript_path (str): Path to the transcript file.
+ description_path (str): Path to the description file.
+ target_fps (int, optional): Target frames per second for video processing.
+ output_folder (str, optional): Directory to store output files.
+
+ Returns:
+ Dict or None: Evaluation results if successful, None if file format unsupported.
+ """
+ if not video_path.endswith(('.mp4', '.mkv')):
+ print(f"Skipping {video_path}: Unsupported file format for video evaluation.")
+ return None
+
+    moviepy_temp_dir = os.path.join(output_folder, "moviepy_temp")
+    os.makedirs(moviepy_temp_dir, exist_ok=True)  # the temporary audio files below are written into this directory
+
+ # Chunking
+ num_chunks = 10
+ with VideoFileClip(video_path) as clip:
+ duration = clip.duration
+ chunk_duration = duration / num_chunks
+ results = []
+
+ # Create a temporary directory in the output_folder
+ temp_dir_parent = output_folder or os.getcwd()
+ with tempfile.TemporaryDirectory(dir=temp_dir_parent) as temp_dir:
+            for i in range(num_chunks):
+ start = i * chunk_duration
+ end = min(start + chunk_duration, duration)
+ chunk = clip.subclipped(start, end)
+ chunk_path = os.path.join(temp_dir, f"chunk_{i+1}.mp4")
+ # Explicitly set the temp_audiofile path with matching codec
+ temp_audiofile = os.path.join(moviepy_temp_dir, f"temp_audio_chunk_{i+1}.m4a")
+ chunk.write_videofile(
+ chunk_path,
+ codec="libx264",
+ audio_codec="aac",
+ temp_audiofile=temp_audiofile,
+ audio_bitrate="192k",
+ preset="ultrafast", # Speed up encoding
+ logger=None
+ )
+ # Create processed videos folder inside output_folder
+ processed_videos_dir = os.path.join(output_folder, "processed_videos")
+ save_path = os.path.join(processed_videos_dir, f"processed_chunk_{i+1}.mp4")
+ result = evaluate_video_chunk_new(
+ model,
+ chunk_path,
+ transcript_path,
+ description_path,
+ target_fps=target_fps,
+ save_processed_video=save_path
+ )
+ results.append(result)
+
+ score_dict = {}
+ for key in results[0]["evaluation"].keys():
+ score_dict[key] = []
+ for result in results:
+ score_dict[key].append(result["evaluation"][key]["score"])
+
+ evaluation = {}
+ for key, scores in score_dict.items():
+ evaluation[key] = {"score": calculate_geometric_mean(scores)}
+
+ result_json = {
+ "evaluation": evaluation,
+ "video_chunks": results
+ }
+ return result_json
+
+
+def extract_scores(data: Union[Dict, List]) -> List[int]:
+ """
+ Extract all score values from a nested dictionary or list structure.
+
+ Args:
+ data (Union[Dict, List]): The data structure to extract scores from.
+
+ Returns:
+ List[int]: List of extracted score values.
+ """
+ scores = []
+ if isinstance(data, dict):
+ for key, value in data.items():
+ if "chunks" in key:
+ continue
+ elif isinstance(value, dict) or isinstance(value, list):
+ scores.extend(extract_scores(value))
+ elif key == 'score':
+ scores.append(value)
+ elif isinstance(data, list):
+ for item in data:
+ scores.extend(extract_scores(item))
+ return scores
+
+
+def calculate_overall_score(result: Dict) -> float:
+ """
+ Calculate the overall score from evaluation results.
+
+ Args:
+ result (Dict): Dictionary containing evaluation results.
+
+ Returns:
+ float: The calculated overall score.
+ """
+ scores = extract_scores(result)
+ overall_score = calculate_geometric_mean(scores)
+ return overall_score
+
+
+def process_topic_name(topic_name: str) -> str:
+ """
+ Process a topic name by capitalizing words and handling special characters.
+
+ Args:
+ topic_name (str): The topic name to process.
+
+ Returns:
+ str: The processed topic name.
+ """
+ words = topic_name.replace("_s_", "'s_").split("_")
+ return " ".join([word.capitalize() for word in words])
+
+
+def merge_dicts(dict1: dict, dict2: dict) -> dict:
+ """
+ Recursively merge two dictionaries.
+
+ Args:
+ dict1 (dict): First dictionary.
+ dict2 (dict): Second dictionary.
+
+ Returns:
+ dict: Merged dictionary.
+ """
+ merged = dict1.copy()
+ for key, value in dict2.items():
+ if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
+ merged[key] = merge_dicts(merged[key], value)
+ else:
+ merged[key] = value
+ return merged
+
+
+def process_theorem(models, file_path: str, eval_type: str, retry_limit: int,
+ target_fps: int = None, use_parent_folder_as_topic: bool = False,
+ output_folder: str = None) -> tuple[str, dict]:
+ """
+ Process a theorem file or directory for evaluation.
+
+ Args:
+ models: Dictionary of models for different evaluation types.
+ file_path (str): Path to the file or directory to evaluate.
+ eval_type (str): Type of evaluation to perform.
+ retry_limit (int): Number of retry attempts.
+ target_fps (int, optional): Target frames per second for video processing.
+ use_parent_folder_as_topic (bool, optional): Use parent folder name as topic.
+ output_folder (str, optional): Directory to store output files.
+
+ Returns:
+ tuple[str, dict]: Tuple of file name and evaluation results.
+ """
+ ext_map = {
+ 'text': ('.txt', '.srt'),
+ 'video': ('.mp4', '.mkv')
+ }
+
+ # Handle single file evaluation
+ if os.path.isfile(file_path):
+ file_ext = os.path.splitext(file_path)[1].lower()
+ file_name = os.path.basename(file_path)
+
+ if eval_type == "text" and file_ext in ext_map['text']:
+ return file_name, evaluate_text_file(models['text'], file_path, retry_limit)
+ elif eval_type == "video" and file_ext in ext_map['video']:
+            if use_parent_folder_as_topic:
+                topic_name = process_topic_name(os.path.basename(os.path.dirname(file_path)))
+            else:
+                topic_name = None
+ return file_name, evaluate_video_file(models['video'], file_path, None, topic_name, target_fps, output_folder)
+ elif eval_type == "image" and file_ext in ext_map['video']:
+            if use_parent_folder_as_topic:
+                topic_name = process_topic_name(os.path.basename(os.path.dirname(file_path)))
+            else:
+                topic_name = None
+ return file_name, evaluate_sampled_images(models['image'], file_path, topic_name, num_chunks=10, output_folder=output_folder)
+ elif eval_type == "all":
+ raise ValueError("Evaluation type 'all' is not supported for a single file. Try passing a folder with both a video and a subtitle file.")
+ else:
+ raise ValueError(f"File type of {file_path} does not match evaluation type {eval_type!r}")
+
+ # Handle directory evaluation
+ theorem_dir = file_path
+ all_files = os.listdir(theorem_dir)
+
+ # Look for transcript files, prioritizing .srt over .txt if both exist
+ transcript_file_candidates = [f for f in all_files if f.endswith(ext_map['text']) and not f.endswith('_scene_outline.txt')]
+ srt_files = [f for f in transcript_file_candidates if f.endswith('.srt')]
+ txt_files = [f for f in transcript_file_candidates if f.endswith('.txt')]
+
+ transcript_path = None
+ if srt_files:
+ transcript_path = os.path.join(theorem_dir, srt_files[0])
+ elif txt_files:
+ transcript_path = os.path.join(theorem_dir, txt_files[0])
+
+ video_file_candidates = [f for f in all_files if f.endswith(ext_map['video'])]
+ video_path = os.path.join(theorem_dir, video_file_candidates[0]) if len(video_file_candidates) == 1 else None
+
+ topic_name = os.path.basename(theorem_dir)
+ topic_name = process_topic_name(topic_name)
+
+ if not video_path:
+ print(f"Skipping {theorem_dir}: No video file found")
+ return None, None
+
+ text_result = video_result = image_result = None
+ if eval_type == "text" or eval_type == "all":
+ if transcript_path is None:
+ print(f"Warning: No suitable transcript file found in {theorem_dir}")
+ else:
+ text_result = evaluate_text_file(models['text'], transcript_path, retry_limit)
+ if eval_type == "video" or eval_type == "all":
+ assert video_path is not None, f"Expected 1 video file, got {len(video_file_candidates)} for {theorem_dir}"
+ video_result = evaluate_video_file(models['video'], video_path, transcript_path, topic_name, target_fps, output_folder)
+ if eval_type == "image" or eval_type == "all":
+ assert video_path is not None, f"Expected 1 video file, got {len(video_file_candidates)} for {theorem_dir}"
+ image_result = evaluate_sampled_images(models['image'], video_path, topic_name, num_chunks=10, output_folder=output_folder)
+
+ if eval_type == "all":
+ result = {}
+ if text_result:
+ result = merge_dicts(result, text_result)
+ if video_result:
+ result = merge_dicts(result, video_result)
+ if image_result:
+ result = merge_dicts(result, image_result)
+ if result:
+ result["evaluation"]["overall_score"] = calculate_overall_score(result)
+ else:
+ result = text_result if eval_type == "text" else video_result if eval_type == "video" else image_result if eval_type == "image" else None
+
+ file_name = os.path.basename(theorem_dir)
+ return file_name, result
+
+
+def main():
+ """
+ Main function to run the evaluation script.
+
+ Parses command line arguments and orchestrates the evaluation process
+ for text, video, and image content using specified AI models.
+ """
+ parser = argparse.ArgumentParser(description='Automatic evaluation of theorem explanation videos with LLMs')
+ parser.add_argument('--model_text', type=str,
+ choices=ALLOWED_MODELS,
+ default='azure/gpt-4o',
+ help='Select the AI model to use for text evaluation')
+ parser.add_argument('--model_video', type=str,
+ choices=['gemini/gemini-1.5-pro-002',
+ 'gemini/gemini-2.0-flash-exp',
+ 'gemini/gemini-2.0-pro-exp-02-05'],
+ default='gemini/gemini-1.5-pro-002',
+ help='Select the AI model to use for video evaluation')
+ parser.add_argument('--model_image', type=str,
+ choices=ALLOWED_MODELS,
+ default='azure/gpt-4o',
+ help='Select the AI model to use for image evaluation')
+ parser.add_argument('--eval_type', type=str, choices=['text', 'video', 'image', 'all'], default='all', help='Type of evaluation to perform')
+ parser.add_argument('--file_path', type=str, help='Path to a file or a theorem folder', required=True)
+ parser.add_argument('--output_folder', type=str, help='Directory to store the evaluation files', required=True)
+ parser.add_argument('--retry_limit', type=int, default=3, help='Number of retry attempts for each inference')
+ parser.add_argument('--combine', action='store_true', help='Combine all results into a single JSON file')
+ parser.add_argument('--bulk_evaluate', action='store_true', help='Evaluate a folder of theorems together', default=False)
+ parser.add_argument('--target_fps', type=int, help='Target FPS for video processing. If not set, original video FPS will be used', required=False)
+ parser.add_argument('--use_parent_folder_as_topic', action='store_true', help='Use parent folder name as topic name for single file evaluation', default=True)
+ parser.add_argument('--max_workers', type=int, default=4, help='Maximum number of concurrent workers for parallel processing')
+
+ args = parser.parse_args()
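+
+    # Example invocation (script and folder names are illustrative):
+    #   python evaluate.py --file_path output/chen_s_theorem --output_folder eval_results \
+    #       --eval_type all --combine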
+
+ # Initialize separate models
+ text_model = LiteLLMWrapper(
+ model_name=args.model_text,
+ temperature=0.0,
+ )
+ video_model = GeminiWrapper(
+ model_name=args.model_video,
+ temperature=0.0,
+ )
+ image_model = LiteLLMWrapper(
+ model_name=args.model_image,
+ temperature=0.0,
+ )
+
+ models = {
+ 'text': text_model,
+ 'video': video_model,
+ 'image': image_model
+ }
+
+ theorem_dirs = []
+ if args.bulk_evaluate:
+ assert os.path.isdir(args.file_path), "File path must be a folder for --bulk_evaluate"
+        for root, _, filenames in os.walk(args.file_path):
+            if any(f.endswith(".mp4") for f in filenames):
+                theorem_dirs.append(root)
+ elif os.path.isdir(args.file_path):
+ assert any(f.endswith(".mp4") for f in os.listdir(args.file_path)), "The provided folder must contain a video file"
+
+ theorem_dirs.append(args.file_path)
+
+ # Create output directory and its temp subdirectories if it doesn't exist
+ os.makedirs(args.output_folder, exist_ok=True)
+ moviepy_temp_dir = os.path.join(args.output_folder, "moviepy_temp")
+ os.makedirs(moviepy_temp_dir, exist_ok=True)
+ VideoFileClip.DEFAULT_TEMP_DIR = moviepy_temp_dir
+
+ processed_videos_dir = os.path.join(args.output_folder, "processed_videos")
+ os.makedirs(processed_videos_dir, exist_ok=True)
+
+ results = {}
+ if theorem_dirs:
+ for theorem_dir in theorem_dirs:
+ file_name, result = process_theorem(
+ models,
+ theorem_dir,
+ args.eval_type,
+ args.retry_limit,
+ args.target_fps,
+ args.use_parent_folder_as_topic,
+ args.output_folder
+ )
+
+ if result is not None:
+ results[file_name] = result
+
+ if not args.combine:
+ save_individual_result(args.output_folder, file_name, result)
+ else:
+ file_name, result = process_theorem(
+ models,
+ args.file_path,
+ args.eval_type,
+ args.retry_limit,
+ args.target_fps,
+ args.use_parent_folder_as_topic,
+ args.output_folder
+ )
+
+ if result is not None:
+ results[file_name] = result
+
+ if not args.combine:
+ save_individual_result(args.output_folder, file_name, result)
+
+ if args.combine:
+ if len(results) > 1:
+ current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
+ combined_file = f"evaluation_{current_time}.json"
+ combine_results(args.output_folder, combined_file, results)
+ print("Combining results completed.")
+ else:
+ for file_name, result in results.items():
+ save_individual_result(args.output_folder, file_name, result)
+
+    # Clean up the temporary directory; ignore failure if moviepy left files behind
+    try:
+        os.rmdir(moviepy_temp_dir)
+    except OSError:
+        pass
+
+
+if __name__ == "__main__":
+ main()
diff --git a/generate_video.py b/generate_video.py
new file mode 100644
index 0000000000000000000000000000000000000000..3773dfc694830c6c4b414d33b4a8d9814dda37d2
--- /dev/null
+++ b/generate_video.py
@@ -0,0 +1,954 @@
+import os
+import json
+import random
+from typing import Union, List, Dict, Optional
+import subprocess
+import argparse
+import glob
+from PIL import Image
+import re
+from dotenv import load_dotenv
+import asyncio
+import uuid # Import uuid for generating trace_id
+
+from mllm_tools.litellm import LiteLLMWrapper
+from mllm_tools.utils import _prepare_text_inputs # Keep _prepare_text_inputs if still used directly in main
+
+# Import new modules
+from src.core.video_planner import VideoPlanner
+from src.core.code_generator import CodeGenerator
+from src.core.video_renderer import VideoRenderer
+from src.utils.utils import _print_response, _extract_code, extract_xml # Import utility functions
+from src.config.config import Config # Import Config class
+
+# Video parsing
+from src.core.parse_video import (
+ get_images_from_video,
+ image_with_most_non_black_space
+)
+from task_generator import get_banned_reasonings
+from task_generator.prompts_raw import (_code_font_size, _code_disable, _code_limit, _prompt_manim_cheatsheet)
+
+# Load allowed models list from JSON file
+allowed_models_path = os.path.join(os.path.dirname(__file__), 'src', 'utils', 'allowed_models.json')
+with open(allowed_models_path, 'r') as f:
+ allowed_models = json.load(f).get("allowed_models", [])
+
+load_dotenv(override=True)
+
+class VideoGenerator:
+ """
+ A class for generating manim videos using AI models.
+
+ This class coordinates the video generation pipeline by managing scene planning,
+ code generation, and video rendering. It supports concurrent scene processing,
+ visual code fixing, and RAG (Retrieval Augmented Generation).
+
+ Args:
+ planner_model: Model used for scene planning and high-level decisions
+ scene_model: Model used specifically for scene generation (defaults to planner_model)
+ helper_model: Helper model for additional tasks (defaults to planner_model)
+ output_dir (str): Directory to store generated files and videos
+ verbose (bool): Whether to print detailed output
+ use_rag (bool): Whether to use Retrieval Augmented Generation
+ use_context_learning (bool): Whether to use context learning with example code
+ context_learning_path (str): Path to context learning examples
+ chroma_db_path (str): Path to ChromaDB for RAG
+ manim_docs_path (str): Path to Manim documentation for RAG
+ embedding_model (str): Model to use for embeddings
+ use_visual_fix_code (bool): Whether to use visual feedback for code fixing
+ use_langfuse (bool): Whether to enable Langfuse logging
+ trace_id (str, optional): Trace ID for logging
+ max_scene_concurrency (int): Maximum number of scenes to process concurrently
+
+ Attributes:
+ output_dir (str): Directory for output files
+ verbose (bool): Verbosity flag
+ use_visual_fix_code (bool): Visual code fixing flag
+ session_id (str): Unique session identifier
+ scene_semaphore (asyncio.Semaphore): Controls concurrent scene processing
+ banned_reasonings (list): List of banned reasoning patterns
+ planner (VideoPlanner): Handles scene planning
+ code_generator (CodeGenerator): Handles code generation
+ video_renderer (VideoRenderer): Handles video rendering
+ """
+
+ def __init__(self,
+ planner_model,
+ scene_model=None,
+ helper_model=None,
+ output_dir="output",
+ verbose=False,
+ use_rag=False,
+ use_context_learning=False,
+ context_learning_path="data/context_learning",
+ chroma_db_path="data/rag/chroma_db",
+ manim_docs_path="data/rag/manim_docs",
+ embedding_model="azure/text-embedding-3-large",
+ use_visual_fix_code=False,
+ use_langfuse=True,
+ trace_id=None,
+ max_scene_concurrency: int = 5):
+ self.output_dir = output_dir
+ self.verbose = verbose
+ self.use_visual_fix_code = use_visual_fix_code
+ self.session_id = self._load_or_create_session_id() # Modified to load existing or create new
+ self.scene_semaphore = asyncio.Semaphore(max_scene_concurrency)
+ self.banned_reasonings = get_banned_reasonings()
+
+ # Initialize separate modules
+ self.planner = VideoPlanner(
+ planner_model=planner_model,
+ helper_model=helper_model,
+ output_dir=output_dir,
+ print_response=verbose,
+ use_context_learning=use_context_learning,
+ context_learning_path=context_learning_path,
+ use_rag=use_rag,
+ session_id=self.session_id,
+ chroma_db_path=chroma_db_path,
+ manim_docs_path=manim_docs_path,
+ embedding_model=embedding_model,
+ use_langfuse=use_langfuse
+ )
+ self.code_generator = CodeGenerator(
+ scene_model=scene_model if scene_model is not None else planner_model,
+ helper_model=helper_model if helper_model is not None else planner_model,
+ output_dir=output_dir,
+ print_response=verbose,
+ use_rag=use_rag,
+ use_context_learning=use_context_learning,
+ context_learning_path=context_learning_path,
+ chroma_db_path=chroma_db_path,
+ manim_docs_path=manim_docs_path,
+ embedding_model=embedding_model,
+ use_visual_fix_code=use_visual_fix_code,
+ use_langfuse=use_langfuse,
+ session_id=self.session_id
+ )
+ self.video_renderer = VideoRenderer(
+ output_dir=output_dir,
+ print_response=verbose,
+ use_visual_fix_code=use_visual_fix_code
+ )
+
+ def _load_or_create_session_id(self) -> str:
+ """
+ Load existing session ID from file or create a new one.
+
+ Returns:
+ str: The session ID either loaded from file or newly created.
+ """
+ session_file = os.path.join(self.output_dir, "session_id.txt")
+
+ if os.path.exists(session_file):
+ with open(session_file, 'r') as f:
+ session_id = f.read().strip()
+ print(f"Loaded existing session ID: {session_id}")
+ return session_id
+
+ # Create new session ID if none exists
+ session_id = str(uuid.uuid4())
+ os.makedirs(self.output_dir, exist_ok=True)
+ with open(session_file, 'w') as f:
+ f.write(session_id)
+ print(f"Created new session ID: {session_id}")
+ return session_id
+
+ def _save_topic_session_id(self, topic: str, session_id: str) -> None:
+ """
+ Save session ID for a specific topic.
+
+ Args:
+ topic (str): The topic to save the session ID for
+ session_id (str): The session ID to save
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ topic_dir = os.path.join(self.output_dir, file_prefix)
+ os.makedirs(topic_dir, exist_ok=True)
+
+ session_file = os.path.join(topic_dir, "session_id.txt")
+ with open(session_file, 'w') as f:
+ f.write(session_id)
+
+ def _load_topic_session_id(self, topic: str) -> Optional[str]:
+ """
+ Load session ID for a specific topic if it exists.
+
+ Args:
+ topic (str): The topic to load the session ID for
+
+ Returns:
+ Optional[str]: The session ID if found, None otherwise
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ session_file = os.path.join(self.output_dir, file_prefix, "session_id.txt")
+
+ if os.path.exists(session_file):
+ with open(session_file, 'r') as f:
+ return f.read().strip()
+ return None
+
+ def generate_scene_outline(self,
+ topic: str,
+ description: str,
+ session_id: str) -> str:
+ """
+ Generate scene outline using VideoPlanner.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ session_id (str): Session identifier for tracking
+
+ Returns:
+ str: Generated scene outline
+ """
+ return self.planner.generate_scene_outline(topic, description, session_id)
+
+ async def generate_scene_implementation(self,
+ topic: str,
+ description: str,
+ plan: str,
+ session_id: str) -> List[str]:
+ """
+ Generate scene implementations using VideoPlanner.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ plan (str): The scene plan to implement
+ session_id (str): Session identifier for tracking
+
+ Returns:
+ List[str]: List of generated scene implementations
+ """
+ return await self.planner.generate_scene_implementation(topic, description, plan, session_id)
+
+ async def generate_scene_implementation_concurrently(self,
+ topic: str,
+ description: str,
+ plan: str,
+ session_id: str) -> List[str]:
+ """
+ Generate scene implementations concurrently using VideoPlanner.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ plan (str): The scene plan to implement
+ session_id (str): Session identifier for tracking
+
+ Returns:
+ List[str]: List of generated scene implementations
+ """
+ return await self.planner.generate_scene_implementation_concurrently(topic, description, plan, session_id, self.scene_semaphore) # Pass semaphore
+
+ def load_implementation_plans(self, topic: str) -> Dict[int, Optional[str]]:
+ """
+ Load implementation plans for each scene.
+
+ Args:
+ topic (str): The topic to load implementation plans for
+
+ Returns:
+ Dict[int, Optional[str]]: Dictionary mapping scene numbers to their plans.
+ If a scene's plan is missing, its value will be None.
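+
+        Note:
+            Plans are read from
+            {output_dir}/{file_prefix}/scene{i}/{file_prefix}_scene{i}_implementation_plan.txt.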
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+
+ # Load scene outline from file
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
+ if not os.path.exists(scene_outline_path):
+ return {}
+
+ with open(scene_outline_path, "r") as f:
+ scene_outline = f.read()
+
+ # Extract scene outline to get number of scenes
+ scene_outline_content = extract_xml(scene_outline)
+        # NOTE: assumes scenes are delimited by <SCENE_{i}> tags in the outline
+        scene_number = len(re.findall(r'<SCENE_(\d+)>', scene_outline_content))
+ print(f"Number of scenes: {scene_number}")
+
+ implementation_plans = {}
+
+ # Check each scene's implementation plan
+ for i in range(1, scene_number + 1):
+ plan_path = os.path.join(self.output_dir, file_prefix, f"scene{i}", f"{file_prefix}_scene{i}_implementation_plan.txt")
+ if os.path.exists(plan_path):
+ with open(plan_path, "r") as f:
+ implementation_plans[i] = f.read()
+ print(f"Found existing implementation plan for scene {i}")
+ else:
+ implementation_plans[i] = None
+ print(f"Missing implementation plan for scene {i}")
+
+ return implementation_plans
+
+ async def render_video_fix_code(self,
+ topic: str,
+ description: str,
+ scene_outline: str,
+ implementation_plans: List,
+ max_retries=3,
+ session_id: str = None) -> None:
+ """
+ Render the video for all scenes with code fixing capability.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ scene_outline (str): The overall scene outline
+ implementation_plans (List): List of implementation plans for each scene
+ max_retries (int, optional): Maximum number of code fix attempts. Defaults to 3.
+ session_id (str, optional): Session identifier for tracking
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+
+ # Create tasks for each scene
+ tasks = []
+        for i, implementation_plan in enumerate(implementation_plans):
+            if implementation_plan is None:
+                # Scene was filtered out by the caller (already rendered or not selected);
+                # skip it but keep list indices aligned with scene numbers.
+                continue
+ # Try to load scene trace id, or generate new one if it doesn't exist
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i+1}")
+ subplan_dir = os.path.join(scene_dir, "subplans")
+ os.makedirs(subplan_dir, exist_ok=True) # Create directories if they don't exist
+
+ scene_trace_id_path = os.path.join(subplan_dir, "scene_trace_id.txt")
+ try:
+ with open(scene_trace_id_path, 'r') as f:
+ scene_trace_id = f.read().strip()
+ except FileNotFoundError:
+ scene_trace_id = str(uuid.uuid4())
+ with open(scene_trace_id_path, 'w') as f:
+ f.write(scene_trace_id)
+
+ task = self.process_scene(i, scene_outline, implementation_plan, topic, description, max_retries, file_prefix, session_id, scene_trace_id)
+ tasks.append(task)
+
+ # Execute all tasks concurrently
+ await asyncio.gather(*tasks)
+
+ async def process_scene(self, i: int, scene_outline: str, scene_implementation: str, topic: str, description: str, max_retries: int, file_prefix: str, session_id: str, scene_trace_id: str): # added scene_trace_id
+ """
+ Process a single scene using CodeGenerator and VideoRenderer.
+
+ Args:
+ i (int): Scene index
+ scene_outline (str): Overall scene outline
+ scene_implementation (str): Implementation plan for this scene
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ max_retries (int): Maximum number of code fix attempts
+ file_prefix (str): Prefix for file naming
+ session_id (str): Session identifier for tracking
+ scene_trace_id (str): Trace identifier for this scene
+ """
+ curr_scene = i + 1
+ curr_version = 0
+ # scene_trace_id = str(uuid.uuid4()) # Remove uuid generation
+ rag_queries_cache = {} # Initialize RAG queries cache
+
+ # Create necessary directories
+ code_dir = os.path.join(self.output_dir, file_prefix, f"scene{curr_scene}", "code")
+ os.makedirs(code_dir, exist_ok=True)
+ media_dir = os.path.join(self.output_dir, file_prefix, "media") # Define media_dir here
+
+ async with self.scene_semaphore:
+ # Step 3A: Generate initial manim code
+ code, log = self.code_generator.generate_manim_code(
+ topic=topic,
+ description=description,
+ scene_outline=scene_outline,
+ scene_implementation=scene_implementation,
+ scene_number=curr_scene,
+ additional_context=[_prompt_manim_cheatsheet, _code_font_size, _code_limit, _code_disable],
+ scene_trace_id=scene_trace_id, # Use passed scene_trace_id
+ session_id=session_id,
+ rag_queries_cache=rag_queries_cache # Pass the cache
+ )
+
+ # Save initial code and log (file operations can be offloaded if needed)
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_init_log.txt"), "w") as f:
+ f.write(log)
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
+ f.write(code)
+ print(f"Code saved to {code_dir}/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
+
+ # Step 3B: Compile and fix code if needed
+ error_message = None
+ while True: # Retry loop controlled by break statements
+ code, error_message = await self.video_renderer.render_scene(
+ code=code,
+ file_prefix=file_prefix,
+ curr_scene=curr_scene,
+ curr_version=curr_version,
+ code_dir=code_dir,
+ media_dir=media_dir,
+ max_retries=max_retries, # Pass max_retries here if needed in render_scene
+ use_visual_fix_code=self.use_visual_fix_code,
+ visual_self_reflection_func=self.code_generator.visual_self_reflection, # Pass visual_self_reflection function
+ banned_reasonings=self.banned_reasonings, # Pass banned reasonings
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ session_id=session_id
+ )
+ if error_message is None: # Render success if error_message is None
+ break
+
+ if curr_version >= max_retries: # Max retries reached
+ print(f"Max retries reached for scene {curr_scene}, error: {error_message}")
+ break # Exit retry loop
+
+ curr_version += 1
+ # if program runs this, it means that the code is not rendered successfully
+ code, log = self.code_generator.fix_code_errors(
+ implementation_plan=scene_implementation,
+ code=code,
+ error=error_message,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=curr_scene,
+ session_id=session_id,
+ rag_queries_cache=rag_queries_cache
+ )
+
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_fix_log.txt"), "w") as f:
+ f.write(log)
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
+ f.write(code)
+
+ print(f"Code saved to {code_dir}/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
+
+ def run_manim_process(self,
+ topic: str):
+ """
+ Run manim on all generated manim code for a specific topic using VideoRenderer.
+
+ Args:
+ topic (str): The topic to render videos for
+ """
+ return self.video_renderer.run_manim_process(topic)
+
+ def create_snapshot_scene(self, topic: str, scene_number: int, version_number: int, return_type: str = "image"):
+ """
+ Create a snapshot of the video for a specific topic and scene using VideoRenderer.
+
+ Args:
+ topic (str): The topic of the video
+ scene_number (int): Scene number to snapshot
+ version_number (int): Version number to snapshot
+ return_type (str, optional): Type of snapshot to return. Defaults to "image".
+
+ Returns:
+ The snapshot in the specified format
+ """
+ return self.video_renderer.create_snapshot_scene(topic, scene_number, version_number, return_type)
+
+ def combine_videos(self, topic: str):
+ """
+ Combine all videos and subtitle files for a specific topic using VideoRenderer.
+
+ Args:
+ topic (str): The topic to combine videos for
+ """
+ self.video_renderer.combine_videos(topic)
+
+ async def _generate_scene_implementation_single(self, topic: str, description: str, scene_outline_i: str, i: int, file_prefix: str, session_id: str, scene_trace_id: str) -> str:
+ """
+ Generate detailed implementation plan for a single scene using VideoPlanner.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ scene_outline_i (str): Outline for this specific scene
+ i (int): Scene index
+ file_prefix (str): Prefix for file naming
+ session_id (str): Session identifier for tracking
+ scene_trace_id (str): Trace identifier for this scene
+
+ Returns:
+ str: Generated implementation plan
+ """
+ return await self.planner._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id)
+
+ async def generate_video_pipeline(self, topic: str, description: str, max_retries: int, only_plan: bool = False, specific_scenes: List[int] = None):
+ """
+ Modified pipeline to handle partial scene completions and option to only generate plans for specific scenes.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ max_retries (int): Maximum number of code fix attempts
+ only_plan (bool, optional): Whether to only generate plans without rendering. Defaults to False.
+ specific_scenes (List[int], optional): List of specific scenes to process. Defaults to None.
+ """
+ session_id = self._load_or_create_session_id()
+ self._save_topic_session_id(topic, session_id)
+
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+
+ # Load or generate scene outline
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
+ if os.path.exists(scene_outline_path):
+ with open(scene_outline_path, "r") as f:
+ scene_outline = f.read()
+ print(f"Loaded existing scene outline for topic: {topic}")
+ if self.planner.use_rag:
+ self.planner.relevant_plugins = self.planner.rag_integration.detect_relevant_plugins(topic, description) or []
+ self.planner.rag_integration.set_relevant_plugins(self.planner.relevant_plugins)
+ print(f"Detected relevant plugins: {self.planner.relevant_plugins}")
+ else:
+ print(f"Generating new scene outline for topic: {topic}")
+ scene_outline = self.planner.generate_scene_outline(topic, description, session_id)
+ os.makedirs(os.path.join(self.output_dir, file_prefix), exist_ok=True)
+ with open(scene_outline_path, "w") as f:
+ f.write(scene_outline)
+
+ # Load or generate implementation plans
+ implementation_plans_dict = self.load_implementation_plans(topic)
+ if not implementation_plans_dict:
+ scene_outline_content = extract_xml(scene_outline)
+            # NOTE: assumes scenes are delimited by <SCENE_{i}> tags in the outline
+            scene_numbers = len(re.findall(r'<SCENE_(\d+)>', scene_outline_content))
+ implementation_plans_dict = {i: None for i in range(1, scene_numbers + 1)}
+
+ # Generate missing implementation plans for specified scenes or all missing scenes
+ missing_scenes = []
+ for scene_num, plan in implementation_plans_dict.items():
+ if plan is None and (specific_scenes is None or scene_num in specific_scenes):
+ missing_scenes.append(scene_num)
+
+ if missing_scenes:
+ print(f"Generating implementation plans for missing scenes: {missing_scenes}")
+ for scene_num in missing_scenes:
+ scene_outline_content = extract_xml(scene_outline)
+                # NOTE: assumes each scene is wrapped in <SCENE_{i}>...</SCENE_{i}> tags
+                scene_match = re.search(f'<SCENE_{scene_num}>(.*?)</SCENE_{scene_num}>', scene_outline_content, re.DOTALL)
+ if scene_match:
+ scene_outline_i = scene_match.group(1)
+ scene_trace_id = str(uuid.uuid4())
+ implementation_plan = await self._generate_scene_implementation_single(
+ topic, description, scene_outline_i, scene_num, file_prefix, session_id, scene_trace_id)
+ implementation_plans_dict[scene_num] = implementation_plan
+
+ if only_plan:
+ print(f"Only generating plans - skipping code generation and video rendering for topic: {topic}")
+ return
+
+ # Convert dictionary to list maintaining scene order
+ sorted_scene_numbers = sorted(implementation_plans_dict.keys())
+ implementation_plans = [implementation_plans_dict[i] for i in sorted_scene_numbers]
+
+ # Render scenes
+ print(f"Starting video rendering for topic: {topic}")
+
+ # Check which scenes need processing
+ scenes_to_process = []
+ for i, implementation_plan in enumerate(implementation_plans):
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i+1}")
+ code_dir = os.path.join(scene_dir, "code")
+
+ # Check if scene has any code files
+ has_code = False
+ if os.path.exists(code_dir):
+ if any(f.endswith('.py') for f in os.listdir(code_dir)):
+ has_code = True
+
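+            # NOTE: `args` below refers to the module-level argparse namespace created in
+            # the __main__ block, so this method only works when the file is run as a script.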
+ # For only_render mode, only process scenes without code
+ if args.only_render:
+ if not has_code:
+ scenes_to_process.append((i+1, implementation_plan))
+ print(f"Scene {i+1} has no code, will process")
+ else:
+ print(f"Scene {i+1} already has code, skipping")
+ # For normal mode, process scenes that haven't been successfully rendered
+ elif not os.path.exists(os.path.join(scene_dir, "succ_rendered.txt")):
+ scenes_to_process.append((i+1, implementation_plan))
+
+ if not scenes_to_process:
+ print(f"No scenes need processing for topic '{topic}'.")
+        else:
+            print(f"Rendering {len(scenes_to_process)} scenes that need processing...")
+            # Keep list indices aligned with scene numbers: scenes that do not need
+            # processing are set to None and skipped inside render_video_fix_code.
+            scenes_to_keep = {scene_num for scene_num, _ in scenes_to_process}
+            filtered_implementation_plans = [plan if (i + 1) in scenes_to_keep else None
+                                             for i, plan in enumerate(implementation_plans)]
+            await self.render_video_fix_code(topic, description, scene_outline, filtered_implementation_plans,
+                                             max_retries=max_retries, session_id=session_id)
+
+        if not args.only_render:  # in only_render mode, the caller also skips combining videos
+ print(f"Video rendering completed for topic '{topic}'.")
+
+    def check_theorem_status(self, theorem: Dict) -> Dict:
+ """
+ Check if a theorem has its plan, code files, and rendered videos with detailed scene status.
+
+ Args:
+ theorem (Dict): Dictionary containing theorem information
+
+ Returns:
+            Dict: Status information with keys 'topic', 'has_scene_outline', 'total_scenes',
+                'implementation_plans', 'code_files', 'rendered_scenes', 'has_combined_video',
+                and 'scene_status' (a per-scene list of plan/code/render flags).
+ """
+ topic = theorem['theorem']
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+
+ # Check scene outline
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
+ has_scene_outline = os.path.exists(scene_outline_path)
+
+ # Get number of scenes if outline exists
+ num_scenes = 0
+ if has_scene_outline:
+ with open(scene_outline_path, "r") as f:
+ scene_outline = f.read()
+ scene_outline_content = extract_xml(scene_outline)
+                num_scenes = len(re.findall(r'<SCENE_(\d+)>', scene_outline_content))
+
+ # Check implementation plans, code files, and rendered videos
+ implementation_plans = 0
+ code_files = 0
+ rendered_scenes = 0
+
+ # Track status of individual scenes
+ scene_status = []
+ for i in range(1, num_scenes + 1):
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i}")
+
+ # Check implementation plan
+ plan_path = os.path.join(scene_dir, f"{file_prefix}_scene{i}_implementation_plan.txt")
+ has_plan = os.path.exists(plan_path)
+ if has_plan:
+ implementation_plans += 1
+
+ # Check code files
+ code_dir = os.path.join(scene_dir, "code")
+ has_code = False
+ if os.path.exists(code_dir):
+ if any(f.endswith('.py') for f in os.listdir(code_dir)):
+ has_code = True
+ code_files += 1
+
+ # Check rendered scene video
+ has_render = False
+ if os.path.exists(scene_dir):
+ succ_rendered_path = os.path.join(scene_dir, "succ_rendered.txt")
+ if os.path.exists(succ_rendered_path):
+ has_render = True
+ rendered_scenes += 1
+
+ scene_status.append({
+ 'scene_number': i,
+ 'has_plan': has_plan,
+ 'has_code': has_code,
+ 'has_render': has_render
+ })
+
+ # Check combined video
+ combined_video_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_combined.mp4")
+ has_combined_video = os.path.exists(combined_video_path)
+
+ return {
+ 'topic': topic,
+ 'has_scene_outline': has_scene_outline,
+ 'total_scenes': num_scenes,
+ 'implementation_plans': implementation_plans,
+ 'code_files': code_files,
+ 'rendered_scenes': rendered_scenes,
+ 'has_combined_video': has_combined_video,
+ 'scene_status': scene_status
+ }
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description='Generate Manim videos using AI')
+ parser.add_argument('--model', type=str, choices=allowed_models,
+ default='gemini/gemini-1.5-pro-002', help='Select the AI model to use')
+ parser.add_argument('--topic', type=str, default=None, help='Topic to generate videos for')
+ parser.add_argument('--context', type=str, default=None, help='Context of the topic')
+ parser.add_argument('--helper_model', type=str, choices=allowed_models,
+ default=None, help='Select the helper model to use')
+ parser.add_argument('--only_gen_vid', action='store_true', help='Only generate videos to existing plans')
+ parser.add_argument('--only_combine', action='store_true', help='Only combine videos')
+ parser.add_argument('--peek_existing_videos', '--peek', action='store_true', help='Peek at existing videos')
+ parser.add_argument('--output_dir', type=str, default=Config.OUTPUT_DIR, help='Output directory') # Use Config
+ parser.add_argument('--theorems_path', type=str, default=None, help='Path to theorems json file')
+ parser.add_argument('--sample_size', '--sample', type=int, default=None, help='Number of theorems to sample')
+ parser.add_argument('--verbose', action='store_true', help='Print verbose output')
+ parser.add_argument('--max_retries', type=int, default=5, help='Maximum number of retries for code generation')
+ parser.add_argument('--use_rag', '--rag', action='store_true', help='Use Retrieval Augmented Generation')
+ parser.add_argument('--use_visual_fix_code','--visual_fix_code', action='store_true', help='Use VLM to fix code with rendered visuals')
+ parser.add_argument('--chroma_db_path', type=str, default=Config.CHROMA_DB_PATH, help="Path to Chroma DB") # Use Config
+ parser.add_argument('--manim_docs_path', type=str, default=Config.MANIM_DOCS_PATH, help="Path to manim docs") # Use Config
+ parser.add_argument('--embedding_model', type=str,
+ default=Config.EMBEDDING_MODEL, # Use Config
+ choices=["azure/text-embedding-3-large", "vertex_ai/text-embedding-005"],
+ help='Select the embedding model to use')
+ parser.add_argument('--use_context_learning', action='store_true',
+ help='Use context learning with example Manim code')
+ parser.add_argument('--context_learning_path', type=str,
+ default=Config.CONTEXT_LEARNING_PATH, # Use Config
+ help='Path to context learning examples')
+ parser.add_argument('--use_langfuse', action='store_true',
+ help='Enable Langfuse logging')
+ parser.add_argument('--max_scene_concurrency', type=int, default=1, help='Maximum number of scenes to process concurrently')
+ parser.add_argument('--max_topic_concurrency', type=int, default=1,
+ help='Maximum number of topics to process concurrently')
+ parser.add_argument('--debug_combine_topic', type=str, help='Debug combine videos', default=None)
+ parser.add_argument('--only_plan', action='store_true', help='Only generate scene outline and implementation plans')
+ parser.add_argument('--check_status', action='store_true',
+ help='Check planning and code status for all theorems')
+ parser.add_argument('--only_render', action='store_true', help='Only render scenes without combining videos')
+ parser.add_argument('--scenes', nargs='+', type=int, help='Specific scenes to process (if theorems_path is provided)')
+ args = parser.parse_args()
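+
+    # Example invocation (topic and context are illustrative):
+    #   python generate_video.py --topic "Pythagorean Theorem" \
+    #       --context "Prove it with a geometric argument" --model gemini/gemini-1.5-pro-002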
+
+ # Initialize planner model using LiteLLM
+ if args.verbose:
+ verbose = True
+ else:
+ verbose = False
+ planner_model = LiteLLMWrapper(
+ model_name=args.model,
+ temperature=0.7,
+ print_cost=True,
+ verbose=verbose,
+ use_langfuse=args.use_langfuse
+ )
+ helper_model = LiteLLMWrapper(
+ model_name=args.helper_model if args.helper_model else args.model, # Use helper_model if provided, else planner_model
+ temperature=0.7,
+ print_cost=True,
+ verbose=verbose,
+ use_langfuse=args.use_langfuse
+ )
+ scene_model = LiteLLMWrapper( # Initialize scene_model separately
+ model_name=args.model,
+ temperature=0.7,
+ print_cost=True,
+ verbose=verbose,
+ use_langfuse=args.use_langfuse
+ )
+ print(f"Planner model: {args.model}, Helper model: {args.helper_model if args.helper_model else args.model}, Scene model: {args.model}") # Print all models
+
+
+ if args.theorems_path:
+ # Load the sample theorems
+ with open(args.theorems_path, "r") as f:
+ theorems = json.load(f)
+
+ if args.sample_size:
+ theorems = theorems[:args.sample_size]
+
+ if args.peek_existing_videos:
+ print(f"Here's the results of checking whether videos are rendered successfully in {args.output_dir}:")
+ # in output_dir, find all combined.mp4 files and print number of successful rendered videos out of total number of folders
+ successful_rendered_videos = 0
+ total_folders = 0
+ for item in os.listdir(args.output_dir):
+ if os.path.isdir(os.path.join(args.output_dir, item)):
+ total_folders += 1
+ if os.path.exists(os.path.join(args.output_dir, item, f"{item}_combined.mp4")):
+ successful_rendered_videos += 1
+ print(f"Number of successful rendered videos: {successful_rendered_videos}/{total_folders}")
+
+ # also check whether any succ_rendered.txt in scene{i} folder, and then add up the number of successful rendered videos
+ successful_rendered_videos = 0
+ total_scenes = 0
+ for item in os.listdir(args.output_dir):
+ if os.path.isdir(os.path.join(args.output_dir, item)):
+ for scene_folder in os.listdir(os.path.join(args.output_dir, item)):
+ if "scene" in scene_folder and os.path.isdir(os.path.join(args.output_dir, item, scene_folder)):
+ total_scenes += 1
+ if os.path.exists(os.path.join(args.output_dir, item, scene_folder, "succ_rendered.txt")):
+ successful_rendered_videos += 1
+ print(f"Number of successful rendered scenes: {successful_rendered_videos}/{total_scenes}")
+ exit()
+
+ video_generator = VideoGenerator(
+ planner_model=planner_model,
+ scene_model=scene_model, # Pass scene_model
+ helper_model=helper_model, # Pass helper_model
+ output_dir=args.output_dir,
+ verbose=args.verbose,
+ use_rag=args.use_rag,
+ use_context_learning=args.use_context_learning,
+ context_learning_path=args.context_learning_path,
+ chroma_db_path=args.chroma_db_path,
+ manim_docs_path=args.manim_docs_path,
+ embedding_model=args.embedding_model,
+ use_visual_fix_code=args.use_visual_fix_code,
+ use_langfuse=args.use_langfuse,
+ max_scene_concurrency=args.max_scene_concurrency
+ )
+
+ if args.debug_combine_topic is not None:
+ video_generator.combine_videos(args.debug_combine_topic)
+ exit()
+
+ if args.only_gen_vid:
+ # Generate videos for existing plans
+ print("Generating videos for existing plans...")
+
+            async def process_theorem(theorem, topic_semaphore):
+                async with topic_semaphore:
+                    topic = theorem['theorem']
+                    print(f"Processing topic: {topic}")
+                    # Load the saved outline and per-scene plans that render_video_fix_code requires
+                    file_prefix = re.sub(r'[^a-z0-9_]+', '_', topic.lower())
+                    with open(os.path.join(args.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt"), "r") as f:
+                        scene_outline = f.read()
+                    plans = video_generator.load_implementation_plans(topic)
+                    await video_generator.render_video_fix_code(topic, theorem['description'], scene_outline,
+                                                                [plans[i] for i in sorted(plans)],
+                                                                max_retries=args.max_retries)
+
+ async def main():
+ # Use the command-line argument for topic concurrency
+ topic_semaphore = asyncio.Semaphore(args.max_topic_concurrency)
+ tasks = [process_theorem(theorem, topic_semaphore) for theorem in theorems]
+ await asyncio.gather(*tasks)
+
+ asyncio.run(main())
+
+ elif args.check_status:
+ print("\nChecking theorem status...")
+ video_generator = VideoGenerator(
+ planner_model=planner_model,
+ scene_model=scene_model,
+ helper_model=helper_model,
+ output_dir=args.output_dir,
+ verbose=args.verbose,
+ use_rag=args.use_rag,
+ use_context_learning=args.use_context_learning,
+ context_learning_path=args.context_learning_path,
+ chroma_db_path=args.chroma_db_path,
+ manim_docs_path=args.manim_docs_path,
+ embedding_model=args.embedding_model,
+ use_visual_fix_code=args.use_visual_fix_code,
+ use_langfuse=args.use_langfuse,
+ max_scene_concurrency=args.max_scene_concurrency
+ )
+
+ all_statuses = [video_generator.check_theorem_status(theorem) for theorem in theorems]
+
+ # Print combined status table
+ print("\nTheorem Status:")
+ print("-" * 160)
+ print(f"{'Topic':<40} {'Outline':<8} {'Total':<8} {'Status (Plan/Code/Render)':<50} {'Combined':<10} {'Missing Components':<40}")
+ print("-" * 160)
+ for status in all_statuses:
+ # Create status string showing plan/code/render completion for each scene
+ scene_status_str = ""
+ for scene in status['scene_status']:
+ scene_str = (
+ ("P" if scene['has_plan'] else "-") +
+ ("C" if scene['has_code'] else "-") +
+ ("R" if scene['has_render'] else "-") + " "
+ )
+ scene_status_str += scene_str
+
+ # Collect missing components
+ missing_plans = []
+ missing_code = []
+ missing_renders = []
+ for scene in status['scene_status']:
+ if not scene['has_plan']:
+ missing_plans.append(str(scene['scene_number']))
+ if not scene['has_code']:
+ missing_code.append(str(scene['scene_number']))
+ if not scene['has_render']:
+ missing_renders.append(str(scene['scene_number']))
+
+ # Format missing components string
+ missing_str = []
+ if missing_plans:
+ missing_str.append(f"P:{','.join(missing_plans)}")
+ if missing_code:
+ missing_str.append(f"C:{','.join(missing_code)}")
+ if missing_renders:
+ missing_str.append(f"R:{','.join(missing_renders)}")
+ missing_str = ' '.join(missing_str)
+
+ print(f"{status['topic'][:37]+'...' if len(status['topic'])>37 else status['topic']:<40} "
+ f"{'✓' if status['has_scene_outline'] else '✗':<8} "
+ f"{status['total_scenes']:<8} "
+ f"{scene_status_str[:47]+'...' if len(scene_status_str)>47 else scene_status_str:<50} "
+ f"{'✓' if status['has_combined_video'] else '✗':<10} "
+ f"{missing_str[:37]+'...' if len(missing_str)>37 else missing_str:<40}")
+
+ # Print summary
+ print("\nSummary:")
+ print(f"Total theorems: {len(theorems)}")
+ print(f"Total scenes: {sum(status['total_scenes'] for status in all_statuses)}")
+ print(f"Scene completion status:")
+ print(f" Plans: {sum(status['implementation_plans'] for status in all_statuses)} scenes")
+ print(f" Code: {sum(status['code_files'] for status in all_statuses)} scenes")
+ print(f" Renders: {sum(status['rendered_scenes'] for status in all_statuses)} scenes")
+ print(f"Combined videos: {sum(1 for status in all_statuses if status['has_combined_video'])}/{len(theorems)}")
+ exit()
+
+ else:
+ # Generate video pipeline from scratch
+ print("Generating video pipeline from scratch...")
+
+ async def process_theorem(theorem, topic_semaphore):
+ async with topic_semaphore:
+ topic = theorem['theorem']
+ description = theorem['description']
+ print(f"Processing topic: {topic}")
+ if args.only_combine:
+ video_generator.combine_videos(topic)
+ else:
+ await video_generator.generate_video_pipeline(
+ topic,
+ description,
+ max_retries=args.max_retries,
+ only_plan=args.only_plan,
+ specific_scenes=args.scenes
+ )
+ if not args.only_plan and not args.only_render: # Add condition for only_render
+ video_generator.combine_videos(topic)
+
+ async def main():
+ # Use the command-line argument for topic concurrency
+ topic_semaphore = asyncio.Semaphore(args.max_topic_concurrency)
+ tasks = [process_theorem(theorem, topic_semaphore) for theorem in theorems]
+ await asyncio.gather(*tasks)
+
+ asyncio.run(main())
+
+ elif args.topic and args.context:
+ video_generator = VideoGenerator(
+ planner_model=planner_model,
+ scene_model=scene_model, # Pass scene_model
+ helper_model=helper_model, # Pass helper_model
+ output_dir=args.output_dir,
+ verbose=args.verbose,
+ use_rag=args.use_rag,
+ use_context_learning=args.use_context_learning,
+ context_learning_path=args.context_learning_path,
+ chroma_db_path=args.chroma_db_path,
+ manim_docs_path=args.manim_docs_path,
+ embedding_model=args.embedding_model,
+ use_visual_fix_code=args.use_visual_fix_code,
+ use_langfuse=args.use_langfuse,
+ max_scene_concurrency=args.max_scene_concurrency
+ )
+ # Process single topic with context
+ print(f"Processing topic: {args.topic}")
+
+        if args.only_gen_vid:
+            # render_video_fix_code is a coroutine and needs the saved outline and per-scene plans
+            file_prefix = re.sub(r'[^a-z0-9_]+', '_', args.topic.lower())
+            with open(os.path.join(args.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt"), "r") as f:
+                scene_outline = f.read()
+            plans = video_generator.load_implementation_plans(args.topic)
+            asyncio.run(video_generator.render_video_fix_code(args.topic, args.context, scene_outline,
+                                                              [plans[i] for i in sorted(plans)],
+                                                              max_retries=args.max_retries))
+            exit()
+
+ if args.only_combine:
+ video_generator.combine_videos(args.topic)
+ else:
+ asyncio.run(video_generator.generate_video_pipeline(
+ args.topic,
+ args.context,
+ max_retries=args.max_retries,
+ only_plan=args.only_plan,
+ ))
+ if not args.only_plan and not args.only_render:
+ video_generator.combine_videos(args.topic)
+ else:
+ print("Please provide either (--theorems_path) or (--topic and --context)")
+ exit()
diff --git a/mllm_tools/__init__.py b/mllm_tools/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c92cd532a28499553985f8dc929710b713e987a
--- /dev/null
+++ b/mllm_tools/__init__.py
@@ -0,0 +1 @@
+# Empty file to make this directory a Python package
\ No newline at end of file
diff --git a/mllm_tools/gemini.py b/mllm_tools/gemini.py
new file mode 100644
index 0000000000000000000000000000000000000000..e7c4a2eb896d48c45f0fe69a446add1dc5d2517c
--- /dev/null
+++ b/mllm_tools/gemini.py
@@ -0,0 +1,176 @@
+from typing import List, Dict, Any, Union, Optional
+import io
+import os
+import base64
+from PIL import Image
+import mimetypes
+import google.generativeai as genai
+import tempfile
+import time
+from urllib.parse import urlparse
+import requests
+from io import BytesIO
+
+class GeminiWrapper:
+ """Wrapper for Gemini to support multiple models and logging"""
+
+ def __init__(
+ self,
+ model_name: str = "gemini-1.5-pro-002",
+ temperature: float = 0.7,
+ print_cost: bool = False,
+ verbose: bool = False,
+ use_langfuse: bool = False
+ ):
+ """
+ Initialize the Gemini wrapper
+
+ Args:
+ model_name: Name of the model to use
+ temperature: Temperature for completion
+ print_cost: Whether to print the cost of the completion
+ verbose: Whether to print verbose output
+ use_langfuse: Whether to enable Langfuse logging
+ """
+ self.model_name = model_name.split('/')[-1] if '/' in model_name else model_name
+ self.temperature = temperature
+ self.print_cost = print_cost
+ self.verbose = verbose
+ self.accumulated_cost = 0
+
+ api_key = os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")
+ if not api_key:
+ raise ValueError("No API_KEY found. Please set the `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable.")
+ genai.configure(api_key=api_key)
+
+ generation_config = {
+ "temperature": self.temperature,
+ "top_p": 0.95,
+ "response_mime_type": "text/plain",
+ }
+ safety_settings = [
+ {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
+ {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
+ {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
+ {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
+ ]
+ self.model = genai.GenerativeModel(
+ model_name=self.model_name,
+ safety_settings=safety_settings,
+ generation_config=generation_config,
+ )
+
+ def _get_mime_type(self, file_path: str) -> str:
+ """
+ Get the MIME type of a file based on its extension
+
+ Args:
+ file_path: Path to the file
+
+ Returns:
+ MIME type as a string (e.g., "image/jpeg", "audio/mp3")
+ """
+ mime_type, _ = mimetypes.guess_type(file_path)
+ if mime_type is None:
+ raise ValueError(f"Unsupported file type: {file_path}")
+ return mime_type
+
+ def _download_file(self, url: str) -> str:
+ """
+ Download a file from a URL and save it as a temporary file
+
+ Args:
+ url: URL of the file to download
+
+ Returns:
+ Path to the temporary file
+ """
+ response = requests.get(url)
+ if response.status_code == 200:
+ temp_file = tempfile.NamedTemporaryFile(delete=False)
+ temp_file.write(response.content)
+ temp_file.close()
+ return temp_file.name
+ else:
+ raise ValueError(f"Failed to download file from URL: {url}")
+
+ def _save_image_to_temp(self, image: Image.Image) -> str:
+ """
+ Save a PIL Image to a temporary file
+
+ Args:
+ image: PIL Image object
+
+ Returns:
+ Path to the temporary file
+ """
+ temp_file = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
+ image.save(temp_file, format="PNG")
+ temp_file.close()
+ return temp_file.name
+
+ def _upload_to_gemini(self, file_path: str, mime_type: Optional[str] = None):
+ """
+ Uploads the given file to Gemini.
+
+ Args:
+ file_path: Path to the file
+ mime_type: MIME type of the file
+
+ Returns:
+ Uploaded file object
+ """
+ return genai.upload_file(file_path, mime_type=mime_type)
+
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
+ """
+ Process messages and return completion
+
+ Args:
+ messages: List of message dictionaries with 'type' and 'content' keys
+ metadata: Optional metadata to pass to Gemini completion
+
+ Returns:
+ Generated text response
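+
+        Example (model and file path are illustrative):
+            >>> model = GeminiWrapper(model_name="gemini-1.5-pro-002")
+            >>> model([{"type": "text", "content": "Summarize this clip."},
+            ...        {"type": "video", "content": "output/scene1.mp4"}])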
+ """
+ contents = []
+ for msg in messages:
+ if msg["type"] == "text":
+ contents.append(msg["content"])
+ elif msg["type"] in ["image", "audio", "video"]:
+ if isinstance(msg["content"], Image.Image):
+ file_path = self._save_image_to_temp(msg["content"])
+ mime_type = "image/png"
+ elif isinstance(msg["content"], str):
+ if msg["content"].startswith("http"):
+ file_path = self._download_file(msg["content"])
+ mime_type = self._get_mime_type(msg["content"])
+ else:
+ file_path = msg["content"]
+ mime_type = self._get_mime_type(file_path)
+ else:
+ raise ValueError("Unsupported content type")
+
+ uploaded_file = self._upload_to_gemini(file_path, mime_type)
+
+ while uploaded_file.state.name == "PROCESSING":
+ print('.', end='')
+ time.sleep(3)
+ uploaded_file = genai.get_file(uploaded_file.name)
+ if uploaded_file.state.name == "FAILED":
+ raise ValueError(uploaded_file.state.name)
+ print("Upload successfully")
+ contents.append(uploaded_file)
+ else:
+ raise ValueError("Unsupported message type")
+
+ response = self.model.generate_content(contents, request_options={"timeout": 600})
+ try:
+ return response.text
+ except Exception as e:
+ print(e)
+ print(response.prompt_feedback)
+ return str(response.prompt_feedback)
+
+if __name__ == "__main__":
+ pass
\ No newline at end of file
diff --git a/mllm_tools/litellm.py b/mllm_tools/litellm.py
new file mode 100644
index 0000000000000000000000000000000000000000..4873494adb985c921bcc411643d6ee06e89b6740
--- /dev/null
+++ b/mllm_tools/litellm.py
@@ -0,0 +1,193 @@
+import json
+import re
+from typing import List, Dict, Any, Union, Optional
+import io
+import os
+import base64
+from PIL import Image
+import mimetypes
+import litellm
+from litellm import completion, completion_cost
+from dotenv import load_dotenv
+
+load_dotenv()
+
+class LiteLLMWrapper:
+ """Wrapper for LiteLLM to support multiple models and logging"""
+
+ def __init__(
+ self,
+ model_name: str = "gpt-4-vision-preview",
+ temperature: float = 0.7,
+ print_cost: bool = False,
+ verbose: bool = False,
+ use_langfuse: bool = True,
+ ):
+ """
+ Initialize the LiteLLM wrapper
+
+ Args:
+ model_name: Name of the model to use (e.g. "azure/gpt-4", "vertex_ai/gemini-pro")
+ temperature: Temperature for completion
+ print_cost: Whether to print the cost of the completion
+ verbose: Whether to print verbose output
+ use_langfuse: Whether to enable Langfuse logging
+ """
+ self.model_name = model_name
+ self.temperature = temperature
+ self.print_cost = print_cost
+ self.verbose = verbose
+ self.accumulated_cost = 0
+
+ if self.verbose:
+ os.environ['LITELLM_LOG'] = 'DEBUG'
+
+ # Set langfuse callback only if enabled
+ if use_langfuse:
+ litellm.success_callback = ["langfuse"]
+ litellm.failure_callback = ["langfuse"]
+
+ def _encode_file(self, file_path: Union[str, Image.Image]) -> str:
+ """
+ Encode local file or PIL Image to base64 string
+
+ Args:
+ file_path: Path to local file or PIL Image object
+
+ Returns:
+ Base64 encoded file string
+ """
+ if isinstance(file_path, Image.Image):
+ buffered = io.BytesIO()
+ file_path.save(buffered, format="PNG")
+ return base64.b64encode(buffered.getvalue()).decode("utf-8")
+ else:
+ with open(file_path, "rb") as file:
+ return base64.b64encode(file.read()).decode("utf-8")
+
+ def _get_mime_type(self, file_path: str) -> str:
+ """
+ Get the MIME type of a file based on its extension
+
+ Args:
+ file_path: Path to the file
+
+ Returns:
+ MIME type as a string (e.g., "image/jpeg", "audio/mp3")
+ """
+ mime_type, _ = mimetypes.guess_type(file_path)
+ if mime_type is None:
+ raise ValueError(f"Unsupported file type: {file_path}")
+ return mime_type
+
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
+ """
+ Process messages and return completion
+
+ Args:
+ messages: List of message dictionaries with 'type' and 'content' keys
+ metadata: Optional metadata to pass to litellm completion, e.g. for Langfuse tracking
+
+ Returns:
+ Generated text response
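+
+        Example (model name is illustrative):
+            >>> model = LiteLLMWrapper(model_name="azure/gpt-4o")
+            >>> model([{"type": "text", "content": "Explain the Pythagorean theorem in one paragraph."}])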
+ """
+ if metadata is None:
+ print("No metadata provided, using empty metadata")
+ metadata = {}
+ metadata["trace_name"] = f"litellm-completion-{self.model_name}"
+ # Convert messages to LiteLLM format
+ formatted_messages = []
+ for msg in messages:
+ if msg["type"] == "text":
+ formatted_messages.append({
+ "role": "user",
+ "content": [{"type": "text", "text": msg["content"]}]
+ })
+ elif msg["type"] in ["image", "audio", "video"]:
+ # Check if content is a local file path or PIL Image
+ if isinstance(msg["content"], Image.Image) or os.path.isfile(msg["content"]):
+ try:
+ if isinstance(msg["content"], Image.Image):
+ mime_type = "image/png"
+ else:
+ mime_type = self._get_mime_type(msg["content"])
+ base64_data = self._encode_file(msg["content"])
+ data_url = f"data:{mime_type};base64,{base64_data}"
+ except ValueError as e:
+ print(f"Error processing file {msg['content']}: {e}")
+ continue
+ else:
+ data_url = msg["content"]
+
+ # Append the formatted message based on the model
+ if "gemini" in self.model_name:
+ formatted_messages.append({
+ "role": "user",
+ "content": [
+ {
+ "type": "image_url",
+ "image_url": data_url
+ }
+ ]
+ })
+ elif "gpt" in self.model_name:
+ # GPT and other models expect a different format
+ if msg["type"] == "image":
+ # Default format for images and videos in GPT
+ formatted_messages.append({
+ "role": "user",
+ "content": [
+ {
+ "type": f"image_url",
+ f"{msg['type']}_url": {
+ "url": data_url,
+ "detail": "high"
+ }
+ }
+ ]
+ })
+ else:
+ raise ValueError("For GPT, only text and image inferencing are supported")
+ else:
+ raise ValueError("Only support Gemini and Gpt for Multimodal capability now")
+
+ try:
+            # If it's an OpenAI o-series model, drop temperature and use reasoning_effort="medium"
+ if (re.match(r"^o\d+.*$", self.model_name) or re.match(r"^openai/o.*$", self.model_name)):
+ self.temperature = None
+ self.reasoning_effort = "medium"
+ response = completion(
+ model=self.model_name,
+ messages=formatted_messages,
+ temperature=self.temperature,
+ reasoning_effort=self.reasoning_effort,
+ metadata=metadata,
+ max_retries=99
+ )
+ else:
+ response = completion(
+ model=self.model_name,
+ messages=formatted_messages,
+ temperature=self.temperature,
+ metadata=metadata,
+ max_retries=99
+ )
+ if self.print_cost:
+ # pass your response from completion to completion_cost
+ cost = completion_cost(completion_response=response)
+ formatted_string = f"Cost: ${float(cost):.10f}"
+ # print(formatted_string)
+ self.accumulated_cost += cost
+ print(f"Accumulated Cost: ${self.accumulated_cost:.10f}")
+
+ content = response.choices[0].message.content
+ if content is None:
+ print(f"Got null response from model. Full response: {response}")
+ return content
+
+ except Exception as e:
+ print(f"Error in model completion: {e}")
+ return str(e)
+
+if __name__ == "__main__":
+ pass
\ No newline at end of file
diff --git a/mllm_tools/utils.py b/mllm_tools/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..44e016282967b1ae6eee67eca1f60a81defc8bad
--- /dev/null
+++ b/mllm_tools/utils.py
@@ -0,0 +1,174 @@
+from typing import Union, List, Dict, Any, Optional
+from PIL import Image
+import google.generativeai as genai
+import tempfile
+import os
+from .gemini import GeminiWrapper
+from .vertex_ai import VertexAIWrapper
+
+
+def _prepare_text_inputs(texts: List[str]) -> List[Dict[str, str]]:
+ """
+ Converts a list of text strings into the input format for the Agent model.
+
+ Args:
+ texts (List[str]): The list of text strings to be processed.
+
+ Returns:
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
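+
+    Example:
+        >>> _prepare_text_inputs(["Hello", "World"])
+        [{'type': 'text', 'content': 'Hello'}, {'type': 'text', 'content': 'World'}]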
+ """
+ inputs = []
+ # Add each text string to the inputs
+ if isinstance(texts, str):
+ texts = [texts]
+ for text in texts:
+ inputs.append({
+ "type": "text",
+ "content": text
+ })
+ return inputs
+
+def _prepare_text_image_inputs(texts: Union[str, List[str]], images: Union[str, Image.Image, List[Union[str, Image.Image]]]) -> List[Dict[str, str]]:
+ """
+ Converts text strings and images into the input format for the Agent model.
+
+ Args:
+ texts (Union[str, List[str]]): Text string(s) to be processed.
+ images (Union[str, Image.Image, List[Union[str, Image.Image]]]): Image file path(s) or PIL Image object(s).
+ Returns:
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
+ """
+ inputs = []
+ # Add each text string to the inputs
+ if isinstance(texts, str):
+ texts = [texts]
+ for text in texts:
+ inputs.append({
+ "type": "text",
+ "content": text
+ })
+ if isinstance(images, (str, Image.Image)):
+ images = [images]
+ for image in images:
+ inputs.append({
+ "type": "image",
+ "content": image
+ })
+ return inputs
+
+def _prepare_text_video_inputs(texts: Union[str, List[str]], videos: Union[str, List[str]]) -> List[Dict[str, str]]:
+ """
+ Converts text strings and video file paths into the input format for the Agent model.
+
+ Args:
+ texts (Union[str, List[str]]): Text string(s) to be processed.
+ videos (Union[str, List[str]]): Video file path(s).
+ Returns:
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
+ """
+ inputs = []
+ # Add each text string to the inputs
+ if isinstance(texts, str):
+ texts = [texts]
+ for text in texts:
+ inputs.append({
+ "type": "text",
+ "content": text
+ })
+ # Add each video file path to the inputs
+ if isinstance(videos, str):
+ videos = [videos]
+ for video in videos:
+ inputs.append({
+ "type": "video",
+ "content": video
+ })
+ return inputs
+
+def _prepare_text_audio_inputs(texts: Union[str, List[str]], audios: Union[str, List[str]]) -> List[Dict[str, str]]:
+ """
+ Converts text strings and audio file paths into the input format for the Agent model.
+
+ Args:
+ texts (Union[str, List[str]]): Text string(s) to be processed.
+ audios (Union[str, List[str]]): Audio file path(s).
+ Returns:
+ List[Dict[str, str]]: A list of dictionaries formatted for the Agent model.
+ """
+ inputs = []
+ # Add each text string to the inputs
+ if isinstance(texts, str):
+ texts = [texts]
+ for text in texts:
+ inputs.append({
+ "type": "text",
+ "content": text
+ })
+ # Add each audio file path to the inputs
+ if isinstance(audios, str):
+ audios = [audios]
+ for audio in audios:
+ inputs.append({
+ "type": "audio",
+ "content": audio
+ })
+ return inputs
+
+def _extract_code(text: str) -> str:
+ """Helper to extract code block from model response, support Gemini style and OpenAI style"""
+ try:
+ # Find code between ```python and ``` tags
+ start = text.split("```python\n")[-1]
+ end = start.split("```")[0]
+ return end.strip()
+ except IndexError:
+ return text
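+# Example (sketch): _extract_code("Here you go:\n```python\nprint('hi')\n```") returns "print('hi')";
+# if no fenced block is found, the original text is returned essentially unchanged.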
+
+def _upload_to_gemini(input, mime_type=None):
+ """Uploads the given file or PIL image to Gemini.
+
+ See https://ai.google.dev/gemini-api/docs/prompting_with_media
+ """
+ if isinstance(input, str):
+ # Input is a file path
+ file = genai.upload_file(input, mime_type=mime_type)
+ elif isinstance(input, Image.Image):
+ # Input is a PIL image
+ with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp_file:
+ input.save(tmp_file, format="JPEG")
+ tmp_file_path = tmp_file.name
+ file = genai.upload_file(tmp_file_path, mime_type=mime_type or "image/jpeg")
+ os.remove(tmp_file_path)
+ else:
+ raise ValueError("Unsupported input type. Must be a file path or PIL Image.")
+
+ #print(f"Uploaded file '{file.display_name}' as: {file.uri}")
+ return file
+
+def get_media_wrapper(model_name: str) -> Optional[Union[GeminiWrapper, VertexAIWrapper]]:
+ """Get appropriate wrapper for media handling based on model name"""
+ if model_name.startswith('gemini/'):
+ return GeminiWrapper(model_name=model_name.split('/')[-1])
+ elif model_name.startswith('vertex_ai/'):
+ return VertexAIWrapper(model_name=model_name.split('/')[-1])
+ return None
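+# Example (sketch): get_media_wrapper("gemini/gemini-1.5-pro") returns a GeminiWrapper
+# built from the bare model name "gemini-1.5-pro"; unrecognized prefixes return None.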
+
+def prepare_media_messages(prompt: str, media_path: Union[str, Image.Image], model_name: str) -> List[Dict[str, Any]]:
+ """Prepare messages for media input based on model type"""
+ is_video = isinstance(media_path, str) and media_path.endswith('.mp4')
+
+ if is_video and (model_name.startswith('gemini/') or model_name.startswith('vertex_ai/')):
+ return [
+ {"type": "text", "content": prompt},
+ {"type": "video", "content": media_path}
+ ]
+ else:
+ # For images or non-Gemini/Vertex models
+ if isinstance(media_path, str):
+ media = Image.open(media_path)
+ else:
+ media = media_path
+ return [
+ {"type": "text", "content": prompt},
+ {"type": "image", "content": media}
+ ]
\ No newline at end of file
diff --git a/mllm_tools/vertex_ai.py b/mllm_tools/vertex_ai.py
new file mode 100644
index 0000000000000000000000000000000000000000..ff688010279c5d6e80ad38aa85660863489eaf9d
--- /dev/null
+++ b/mllm_tools/vertex_ai.py
@@ -0,0 +1,86 @@
+import os
+from typing import List, Dict, Any, Optional
+import vertexai
+from vertexai.generative_models import GenerativeModel, Part
+from google.auth import default
+from google.auth.transport import requests
+
+
+# TODO: check if this is the correct way to use Vertex AI
+# TODO: add langfuse support
+class VertexAIWrapper:
+ """Wrapper for Vertex AI to support Gemini models."""
+
+ def __init__(
+ self,
+ model_name: str = "gemini-1.5-pro",
+ temperature: float = 0.7,
+ print_cost: bool = False,
+ verbose: bool = False,
+ use_langfuse: bool = False
+ ):
+ """Initialize the Vertex AI wrapper.
+
+ Args:
+ model_name: Name of the model to use (e.g. "gemini-1.5-pro")
+ temperature: Temperature for generation between 0 and 1
+ print_cost: Whether to print the cost of the completion
+ verbose: Whether to print verbose output
+ use_langfuse: Whether to enable Langfuse logging
+ """
+ self.model_name = model_name
+ self.temperature = temperature
+ self.print_cost = print_cost
+ self.verbose = verbose
+
+ # Initialize Vertex AI
+ project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
+ location = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
+ if not project_id:
+ raise ValueError("No GOOGLE_CLOUD_PROJECT found in environment variables")
+
+ vertexai.init(project=project_id, location=location)
+ self.model = GenerativeModel(model_name)
+
+ def __call__(self, messages: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None) -> str:
+ """Process messages and return completion.
+
+ Args:
+ messages: List of message dictionaries containing type and content
+ metadata: Optional metadata dictionary to pass to the model
+
+ Returns:
+ Generated text response from the model
+
+ Raises:
+ ValueError: If message type is not supported
+ """
+ parts = []
+
+ for msg in messages:
+ if msg["type"] == "text":
+ parts.append(Part.from_text(msg["content"]))
+ elif msg["type"] in ["image", "video"]:
+ mime_type = "video/mp4" if msg["type"] == "video" else "image/jpeg"
+ if isinstance(msg["content"], str):
+ # Handle GCS URI
+ parts.append(Part.from_uri(
+ msg["content"],
+ mime_type=mime_type
+ ))
+ else:
+ # Handle file path or bytes
+ parts.append(Part.from_data(
+ msg["content"],
+ mime_type=mime_type
+ ))
+
+ response = self.model.generate_content(
+ parts,
+ generation_config={
+ "temperature": self.temperature,
+ "top_p": 0.95,
+ }
+ )
+
+ return response.text
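+
+# Usage sketch (assumes GOOGLE_CLOUD_PROJECT is set and the caller has Vertex AI access;
+# the model name is only an example):
+#   wrapper = VertexAIWrapper(model_name="gemini-1.5-pro", temperature=0.7)
+#   print(wrapper([{"type": "text", "content": "Summarize the Pythagorean theorem in one sentence."}]))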
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7aee7cea55d2c0c06754703652913025cace8a05
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,101 @@
+annotated-types~=0.7.0
+azure-cognitiveservices-speech~=1.41.1
+cachetools~=5.5.0
+certifi~=2024.8.30
+charset-normalizer~=3.4.0
+click~=8.1.7
+cloup~=3.0.5
+Cython~=3.0.11
+decorator~=5.1.1
+glcontext~=3.0.0
+google-ai-generativelanguage~=0.6.10
+google-api-core~=2.22.0
+google-api-python-client~=2.151.0
+google-auth~=2.35.0
+google-auth-httplib2~=0.2.0
+google-generativeai~=0.8.3
+googleapis-common-protos~=1.65.0
+grpcio~=1.67.1
+grpcio-status~=1.67.1
+gTTS~=2.5.3
+httplib2~=0.22.0
+idna~=3.10
+isosurfaces~=0.1.2
+manim~=0.18.1
+manim-voiceover~=0.3.7
+ManimPango~=0.6.0 # requires pangocairo; on Debian/Ubuntu: sudo apt-get install libsdl-pango-dev
+mapbox_earcut~=1.0.2
+markdown-it-py~=3.0.0
+mdurl~=0.1.2
+moderngl~=5.12.0
+multipledispatch~=1.0.0
+mutagen~=1.47.0
+networkx~=3.4.2
+numpy~=2.2.2
+pillow
+proto-plus~=1.25.0
+protobuf~=5.28.3
+pyasn1~=0.6.1
+pyasn1_modules~=0.4.1
+PyAudio~=0.2.14 # requires portaudio (brew install portaudio) on macOS
+pycairo~=1.27.0
+pydantic~=2.9.2
+pydantic_core~=2.23.4
+pydub~=0.25.1
+pyglet~=2.0.18
+Pygments~=2.18.0
+#pyobjc-core~=10.3.1 # only for mac
+#pyobjc-framework-Cocoa~=10.3.1 # only for mac
+pyparsing~=3.2.0
+pyrr~=0.10.3
+python-dotenv~=0.21.1
+python-slugify~=8.0.4
+requests~=2.32.3
+rich~=13.9.3
+rsa~=4.9
+scipy~=1.14.1
+screeninfo~=0.8.1
+skia-pathops~=0.8.0.post2
+sox~=1.5.0
+srt~=3.5.3
+svgelements~=1.9.6
+text-unidecode~=1.3
+tqdm~=4.66.5
+typing_extensions~=4.12.2
+uritemplate~=4.1.1
+urllib3~=2.2.3
+watchdog~=5.0.3
+inquirer
+openai~=1.61.0
+tiktoken~=0.8.0
+timm
+sentencepiece
+transformers
+litellm~=1.60.5
+pysrt
+moviepy~=2.1.2
+yt-dlp
+imageio_ffmpeg~=0.5.1
+langchain~=0.3.14
+langchain_community~=0.3.14
+SpeechRecognition~=3.14.1
+boto3~=1.36.9
+manim-physics~=0.4.0
+manim-ml~=0.0.24
+manim-chemistry~=0.4.4
+manim-dsa~=0.2.0
+manim-circuit~=0.0.3
+langfuse~=2.58.1
+chromadb~=0.6.3
+google-cloud-aiplatform~=1.79.0
+cairosvg
+pylatexenc~=2.10
+ffmpeg-python~=0.2.0
+kokoro-onnx[gpu] # if you have a GPU; otherwise use kokoro-onnx (without the [gpu] extra)
+soundfile~=0.13.1
+krippendorff~=0.8.1
+statsmodels~=0.14.4
+opencv-python~=4.11.0
+fastapi
+uvicorn
+gradio
\ No newline at end of file
diff --git a/src/__init__.py b/src/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..af7d3b0ba7e6b6301cc415ed2c278f87ba37fd7c
--- /dev/null
+++ b/src/__init__.py
@@ -0,0 +1 @@
+# This is essential for the release to work
\ No newline at end of file
diff --git a/src/config/__init__.py b/src/config/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/src/config/config.py b/src/config/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e1d7c5cd974a26b5c7eed25126b26022d86b98b
--- /dev/null
+++ b/src/config/config.py
@@ -0,0 +1,20 @@
+import os
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+class Config:
+ OUTPUT_DIR = "output"
+ THEOREMS_PATH = os.path.join("data", "easy_20.json")
+ CONTEXT_LEARNING_PATH = "data/context_learning"
+ CHROMA_DB_PATH = "data/rag/chroma_db"
+ MANIM_DOCS_PATH = "data/rag/manim_docs"
+ EMBEDDING_MODEL = "azure/text-embedding-3-large"
+
+ # Kokoro TTS configurations
+ KOKORO_MODEL_PATH = os.getenv('KOKORO_MODEL_PATH')
+ KOKORO_VOICES_PATH = os.getenv('KOKORO_VOICES_PATH')
+ KOKORO_DEFAULT_VOICE = os.getenv('KOKORO_DEFAULT_VOICE')
+ KOKORO_DEFAULT_SPEED = float(os.getenv('KOKORO_DEFAULT_SPEED', '1.0'))
+ KOKORO_DEFAULT_LANG = os.getenv('KOKORO_DEFAULT_LANG')
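+
+# Usage sketch: the class attributes are resolved once at import time, e.g.
+#   from src.config.config import Config
+#   print(Config.KOKORO_DEFAULT_SPEED)  # falls back to 1.0 when KOKORO_DEFAULT_SPEED is unset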
\ No newline at end of file
diff --git a/src/core/__init__.py b/src/core/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/src/core/code_generator.py b/src/core/code_generator.py
new file mode 100644
index 0000000000000000000000000000000000000000..af65450c6aa1ffacf3e8ef68113ebf83f3cd2d52
--- /dev/null
+++ b/src/core/code_generator.py
@@ -0,0 +1,454 @@
+import os
+import re
+import json
+from typing import Union, List, Dict, Tuple
+from PIL import Image
+import glob
+
+from src.utils.utils import extract_json
+from mllm_tools.utils import _prepare_text_inputs, _extract_code, _prepare_text_image_inputs
+from mllm_tools.gemini import GeminiWrapper
+from mllm_tools.vertex_ai import VertexAIWrapper
+from task_generator import (
+ get_prompt_code_generation,
+ get_prompt_fix_error,
+ get_prompt_visual_fix_error,
+ get_banned_reasonings,
+ get_prompt_rag_query_generation_fix_error,
+ get_prompt_context_learning_code,
+ get_prompt_rag_query_generation_code
+)
+from task_generator.prompts_raw import (
+ _code_font_size,
+ _code_disable,
+ _code_limit,
+ _prompt_manim_cheatsheet
+)
+from src.rag.vector_store import RAGVectorStore # Import RAGVectorStore
+
+class CodeGenerator:
+ """A class for generating and managing Manim code."""
+
+ def __init__(self, scene_model, helper_model, output_dir="output", print_response=False, use_rag=False, use_context_learning=False, context_learning_path="data/context_learning", chroma_db_path="rag/chroma_db", manim_docs_path="rag/manim_docs", embedding_model="azure/text-embedding-3-large", use_visual_fix_code=False, use_langfuse=True, session_id=None):
+ """Initialize the CodeGenerator.
+
+ Args:
+ scene_model: The model used for scene generation
+ helper_model: The model used for helper tasks
+ output_dir (str, optional): Directory for output files. Defaults to "output".
+ print_response (bool, optional): Whether to print model responses. Defaults to False.
+ use_rag (bool, optional): Whether to use RAG. Defaults to False.
+ use_context_learning (bool, optional): Whether to use context learning. Defaults to False.
+ context_learning_path (str, optional): Path to context learning examples. Defaults to "data/context_learning".
+ chroma_db_path (str, optional): Path to ChromaDB. Defaults to "rag/chroma_db".
+ manim_docs_path (str, optional): Path to Manim docs. Defaults to "rag/manim_docs".
+ embedding_model (str, optional): Name of embedding model. Defaults to "azure/text-embedding-3-large".
+ use_visual_fix_code (bool, optional): Whether to use visual code fixing. Defaults to False.
+ use_langfuse (bool, optional): Whether to use Langfuse logging. Defaults to True.
+ session_id (str, optional): Session identifier. Defaults to None.
+ """
+ self.scene_model = scene_model
+ self.helper_model = helper_model
+ self.output_dir = output_dir
+ self.print_response = print_response
+ self.use_rag = use_rag
+ self.use_context_learning = use_context_learning
+ self.context_learning_path = context_learning_path
+ self.context_examples = self._load_context_examples() if use_context_learning else None
+ self.manim_docs_path = manim_docs_path
+
+ self.use_visual_fix_code = use_visual_fix_code
+ self.banned_reasonings = get_banned_reasonings()
+ self.session_id = session_id # Use session_id passed from VideoGenerator
+
+ if use_rag:
+ self.vector_store = RAGVectorStore(
+ chroma_db_path=chroma_db_path,
+ manim_docs_path=manim_docs_path,
+ embedding_model=embedding_model,
+ session_id=self.session_id,
+ use_langfuse=use_langfuse
+ )
+ else:
+ self.vector_store = None
+
+ def _load_context_examples(self) -> str:
+ """Load all context learning examples from the specified directory.
+
+ Returns:
+ str: Formatted context learning examples, or None if no examples found.
+ """
+ examples = []
+ for example_file in glob.glob(f"{self.context_learning_path}/**/*.py", recursive=True):
+ with open(example_file, 'r') as f:
+ examples.append(f"# Example from {os.path.basename(example_file)}\n{f.read()}\n")
+
+ # Format examples using get_prompt_context_learning_code instead of _prompt_context_learning
+ if examples:
+ formatted_examples = get_prompt_context_learning_code(
+ examples="\n".join(examples)
+ )
+ return formatted_examples
+ return None
+
+ def _generate_rag_queries_code(self, implementation: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
+ """Generate RAG queries from the implementation plan.
+
+ Args:
+ implementation (str): The implementation plan text
+ scene_trace_id (str, optional): Trace ID for the scene. Defaults to None.
+ topic (str, optional): Topic of the scene. Defaults to None.
+ scene_number (int, optional): Scene number. Defaults to None.
+ session_id (str, optional): Session identifier. Defaults to None.
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list.
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ # Create a cache key for this scene
+ cache_key = f"{topic}_scene{scene_number}"
+
+ # Check if we already have a cache file for this scene
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_code.json")
+
+ # If cache file exists, load and return cached queries
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ cached_queries = json.load(f)
+ print(f"Using cached RAG queries for {cache_key}")
+ return cached_queries
+
+ # Generate new queries if not cached
+ if relevant_plugins:
+ prompt = get_prompt_rag_query_generation_code(implementation, ", ".join(relevant_plugins))
+ else:
+ prompt = get_prompt_rag_query_generation_code(implementation, "No plugins are relevant.")
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag_query_generation", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ print(f"RAG queries: {queries}")
+        # retrieve the JSON list wrapped in ```json ... ``` fences
+
+ try: # add try-except block to handle potential json decode errors
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+ queries = json.loads(queries)
+        except (json.JSONDecodeError, AttributeError) as e:
+            print(f"Error when parsing RAG queries for code generation: {e}")
+ print(f"Response text was: {queries}")
+ return [] # Return empty list in case of parsing error
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+
+ def _generate_rag_queries_error_fix(self, error: str, code: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
+ """Generate RAG queries for fixing code errors.
+
+ Args:
+ error (str): The error message to fix
+ code (str): The code containing the error
+ scene_trace_id (str, optional): Trace ID for the scene. Defaults to None.
+ topic (str, optional): Topic of the scene. Defaults to None.
+ scene_number (int, optional): Scene number. Defaults to None.
+ session_id (str, optional): Session identifier. Defaults to None.
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list.
+
+ Returns:
+ List[str]: List of generated RAG queries for error fixing
+ """
+ # Create a cache key for this scene and error
+ cache_key = f"{topic}_scene{scene_number}_error_fix"
+
+ # Check if we already have a cache file for error fix queries
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_error_fix.json")
+
+ # If cache file exists, load and return cached queries
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ cached_queries = json.load(f)
+ print(f"Using cached RAG queries for error fix in {cache_key}")
+ return cached_queries
+
+ # Generate new queries for error fix if not cached
+ prompt = get_prompt_rag_query_generation_fix_error(
+ error=error,
+ code=code,
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+ )
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag-query-generation-fix-error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ # remove json triple backticks
+ queries = queries.replace("```json", "").replace("```", "")
+ try: # add try-except block to handle potential json decode errors
+ queries = json.loads(queries)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing RAG queries for error fix: {e}")
+ print(f"Response text was: {queries}")
+ return [] # Return empty list in case of parsing error
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+
+ def _extract_code_with_retries(self, response_text: str, pattern: str, generation_name: str = None, trace_id: str = None, session_id: str = None, max_retries: int = 10) -> str:
+ """Extract code from response text with retry logic.
+
+ Args:
+ response_text (str): The text containing code to extract
+ pattern (str): Regex pattern for extracting code
+ generation_name (str, optional): Name of generation step. Defaults to None.
+ trace_id (str, optional): Trace identifier. Defaults to None.
+ session_id (str, optional): Session identifier. Defaults to None.
+ max_retries (int, optional): Maximum number of retries. Defaults to 10.
+
+ Returns:
+ str: The extracted code
+
+ Raises:
+ ValueError: If code extraction fails after max retries
+ """
+ retry_prompt = """
+ Please extract the Python code in the correct format using the pattern: {pattern}.
+ You MUST NOT include any other text or comments.
+ You MUST return the exact same code as in the previous response, NO CONTENT EDITING is allowed.
+ Previous response:
+ {response_text}
+ """
+
+ for attempt in range(max_retries):
+ code_match = re.search(pattern, response_text, re.DOTALL)
+ if code_match:
+ return code_match.group(1)
+
+ if attempt < max_retries - 1:
+ print(f"Attempt {attempt + 1}: Failed to extract code pattern. Retrying...")
+ # Regenerate response with a more explicit prompt
+ response_text = self.scene_model(
+ _prepare_text_inputs(retry_prompt.format(pattern=pattern, response_text=response_text)),
+ metadata={
+ "generation_name": f"{generation_name}_format_retry_{attempt + 1}",
+ "trace_id": trace_id,
+ "session_id": session_id
+ }
+ )
+
+ raise ValueError(f"Failed to extract code pattern after {max_retries} attempts. Pattern: {pattern}")
+
+ def generate_manim_code(self,
+ topic: str,
+ description: str,
+ scene_outline: str,
+ scene_implementation: str,
+ scene_number: int,
+ additional_context: Union[str, List[str]] = None,
+ scene_trace_id: str = None,
+ session_id: str = None,
+                          rag_queries_cache: Dict = None) -> Tuple[str, str]:
+ """Generate Manim code from video plan.
+
+ Args:
+ topic (str): Topic of the scene
+ description (str): Description of the scene
+ scene_outline (str): Outline of the scene
+ scene_implementation (str): Implementation details
+ scene_number (int): Scene number
+ additional_context (Union[str, List[str]], optional): Additional context. Defaults to None.
+ scene_trace_id (str, optional): Trace identifier. Defaults to None.
+ session_id (str, optional): Session identifier. Defaults to None.
+ rag_queries_cache (Dict, optional): Cache for RAG queries. Defaults to None.
+
+ Returns:
+ Tuple[str, str]: Generated code and response text
+ """
+ if self.use_context_learning:
+ # Add context examples to additional_context
+ if additional_context is None:
+ additional_context = []
+ elif isinstance(additional_context, str):
+ additional_context = [additional_context]
+
+ # Now using the properly formatted code examples
+ if self.context_examples:
+ additional_context.append(self.context_examples)
+
+ if self.use_rag:
+ # Generate RAG queries (will use cache if available)
+ rag_queries = self._generate_rag_queries_code(
+ implementation=scene_implementation,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=scene_number,
+ session_id=session_id
+ )
+
+ retrieved_docs = self.vector_store.find_relevant_docs(
+ queries=rag_queries,
+ k=2, # number of documents to retrieve
+ trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=scene_number
+ )
+ # Format the retrieved documents into a string
+ if additional_context is None:
+ additional_context = []
+ additional_context.append(retrieved_docs)
+
+ # Format code generation prompt with plan and retrieved context
+ prompt = get_prompt_code_generation(
+ scene_outline=scene_outline,
+ scene_implementation=scene_implementation,
+ topic=topic,
+ description=description,
+ scene_number=scene_number,
+ additional_context=additional_context
+ )
+
+ # Generate code using model
+ response_text = self.scene_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "code_generation", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ # Extract code with retries
+ code = self._extract_code_with_retries(
+ response_text,
+ r"```python(.*)```",
+ generation_name="code_generation",
+ trace_id=scene_trace_id,
+ session_id=session_id
+ )
+ return code, response_text
+
+    def fix_code_errors(self, implementation_plan: str, code: str, error: str, scene_trace_id: str, topic: str, scene_number: int, session_id: str, rag_queries_cache: Dict = None) -> Tuple[str, str]:
+ """Fix errors in generated Manim code.
+
+ Args:
+ implementation_plan (str): Original implementation plan
+ code (str): Code containing errors
+ error (str): Error message to fix
+ scene_trace_id (str): Trace identifier
+ topic (str): Topic of the scene
+ scene_number (int): Scene number
+ session_id (str): Session identifier
+ rag_queries_cache (Dict, optional): Cache for RAG queries. Defaults to None.
+
+ Returns:
+ Tuple[str, str]: Fixed code and response text
+ """
+ # Format error fix prompt
+ prompt = get_prompt_fix_error(implementation_plan=implementation_plan, manim_code=code, error=error)
+
+ if self.use_rag:
+ # Generate RAG queries for error fixing
+ rag_queries = self._generate_rag_queries_error_fix(
+ error=error,
+ code=code,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=scene_number,
+ session_id=session_id
+ )
+ retrieved_docs = self.vector_store.find_relevant_docs(
+ queries=rag_queries,
+ k=2, # number of documents to retrieve for error fixing
+ trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=scene_number
+ )
+ # Format the retrieved documents into a string
+ prompt = get_prompt_fix_error(implementation_plan=implementation_plan, manim_code=code, error=error, additional_context=retrieved_docs)
+
+ # Get fixed code from model
+ response_text = self.scene_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "code_fix_error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ # Extract fixed code with retries
+ fixed_code = self._extract_code_with_retries(
+ response_text,
+ r"```python(.*)```",
+ generation_name="code_fix_error",
+ trace_id=scene_trace_id,
+ session_id=session_id
+ )
+ return fixed_code, response_text
+
+    def visual_self_reflection(self, code: str, media_path: Union[str, Image.Image], scene_trace_id: str, topic: str, scene_number: int, session_id: str) -> Tuple[str, str]:
+ """Use snapshot image or mp4 video to fix code.
+
+ Args:
+ code (str): Code to fix
+ media_path (Union[str, Image.Image]): Path to media file or PIL Image
+ scene_trace_id (str): Trace identifier
+ topic (str): Topic of the scene
+ scene_number (int): Scene number
+ session_id (str): Session identifier
+
+ Returns:
+ Tuple[str, str]: Fixed code and response text
+ """
+
+ # Determine if we're dealing with video or image
+ is_video = isinstance(media_path, str) and media_path.endswith('.mp4')
+
+ # Load prompt template
+ with open('task_generator/prompts_raw/prompt_visual_self_reflection.txt', 'r') as f:
+ prompt_template = f.read()
+
+ # Format prompt
+ prompt = prompt_template.format(code=code)
+
+ # Prepare input based on media type
+ if is_video and isinstance(self.scene_model, (GeminiWrapper, VertexAIWrapper)):
+ # For video with Gemini models
+ messages = [
+ {"type": "text", "content": prompt},
+ {"type": "video", "content": media_path}
+ ]
+ else:
+ # For images or non-Gemini models
+ if isinstance(media_path, str):
+ media = Image.open(media_path)
+ else:
+ media = media_path
+ messages = [
+ {"type": "text", "content": prompt},
+ {"type": "image", "content": media}
+ ]
+
+ # Get model response
+ response_text = self.scene_model(
+ messages,
+ metadata={
+ "generation_name": "visual_self_reflection",
+ "trace_id": scene_trace_id,
+ "tags": [topic, f"scene{scene_number}"],
+ "session_id": session_id
+ }
+ )
+
+ # Extract code with retries
+ fixed_code = self._extract_code_with_retries(
+ response_text,
+ r"```python(.*)```",
+ generation_name="visual_self_reflection",
+ trace_id=scene_trace_id,
+ session_id=session_id
+ )
+ return fixed_code, response_text
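+
+# Usage sketch (assumes already-initialized scene/helper model wrappers and existing plans;
+# the variable names are placeholders):
+#   generator = CodeGenerator(scene_model=model, helper_model=model, use_rag=False)
+#   code, response = generator.generate_manim_code(
+#       topic, description, scene_outline, scene_implementation, scene_number=1)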
\ No newline at end of file
diff --git a/src/core/parse_video.py b/src/core/parse_video.py
new file mode 100644
index 0000000000000000000000000000000000000000..8a4a014b446635a9c8e3b8374cd1fb366dcd529a
--- /dev/null
+++ b/src/core/parse_video.py
@@ -0,0 +1,227 @@
+import os
+import pysrt
+from moviepy import VideoFileClip
+import shutil
+from PIL import Image, ImageOps
+import numpy as np
+import speech_recognition as sr
+
+def get_images_from_video(video_path, fps=0.2):
+ """Extract frames from a video file at specified FPS.
+
+ Args:
+ video_path (str): Path to the video file.
+ fps (float, optional): Frames per second to extract. Defaults to 0.2.
+
+ Returns:
+ list: List of frames as numpy arrays.
+ """
+ clip = VideoFileClip(video_path)
+ images = clip.iter_frames(fps=fps)
+ return images
+
+def image_with_most_non_black_space(images, output_path, return_type="path"):
+ """Find and save the image with the most non-black space from a list of images.
+
+ Args:
+ images (list): List of image file paths, PIL Image objects, or numpy arrays.
+ output_path (str): Path where the output image should be saved.
+ return_type (str, optional): Type of return value - "path" or "image". Defaults to "path".
+
+ Returns:
+ Union[str, PIL.Image, None]: Path to saved image, PIL Image object, or None if no valid image found.
+ """
+ max_non_black_area = 0
+ image_with_max_non_black_space = None
+
+ for img in images:
+ try:
+ # If img is a path, open the image
+ if isinstance(img, str):
+ image = Image.open(img)
+ elif isinstance(img, Image.Image):
+ image = img
+ elif isinstance(img, np.ndarray):
+ image = Image.fromarray(img)
+ else:
+ print(f"Unsupported type: {type(img)}. Skipping.")
+ continue
+
+ # Convert to grayscale
+ gray = ImageOps.grayscale(image)
+
+ # Convert to numpy array
+ gray_array = np.array(gray)
+
+ # Count non-black pixels (threshold to consider near-black as black)
+ non_black_pixels = np.sum(gray_array > 10) # Threshold 10 to account for slight variations in black
+
+ if non_black_pixels > max_non_black_area:
+ max_non_black_area = non_black_pixels
+ image_with_max_non_black_space = image
+
+ except Exception as e:
+ print(f"Warning: Unable to process image {img}: {e}")
+
+ if image_with_max_non_black_space is not None:
+ image_with_max_non_black_space.save(output_path)
+ print(f"Saved image with most non-black space to {output_path}")
+
+ if return_type == "path":
+ return output_path
+ else:
+ return image_with_max_non_black_space
+ return image_with_max_non_black_space
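+# Example (sketch; paths are placeholders): pick the most informative frame from a rendered video:
+#   frames = get_images_from_video("output/some_topic/some_topic_combined.mp4", fps=0.2)
+#   snapshot = image_with_most_non_black_space(frames, "output/some_topic/snapshot.jpg", return_type="path")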
+
+def parse_srt_to_text(output_dir, topic_name):
+ """Convert SRT subtitle file to plain text.
+
+ Args:
+ output_dir (str): Directory containing the topic folders.
+ topic_name (str): Name of the topic/video.
+ """
+ topic_name = topic_name.replace(" ", "_").lower()
+ srt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.srt")
+ txt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.txt")
+ subs = pysrt.open(srt_path)
+
+ with open(txt_path, 'w') as f:
+ full_text = ""
+ for sub in subs:
+ sub.text = sub.text.replace("...", ".")
+ full_text += sub.text + " "
+ f.write(full_text.strip())
+
+def parse_srt_and_extract_frames(output_dir, topic_name):
+ """Extract frames from video at subtitle timestamps and save with corresponding text.
+
+ Args:
+ output_dir (str): Directory containing the topic folders.
+ topic_name (str): Name of the topic/video.
+ """
+ topic_name = topic_name.replace(" ", "_").lower()
+ video_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.mp4")
+ srt_path = os.path.join(output_dir, topic_name, f"{topic_name}_combined.srt")
+ subs = pysrt.open(srt_path)
+
+ # Create extract_images folder if it doesn't exist
+ images_dir = os.path.join(output_dir, topic_name, "extract_images")
+ if os.path.exists(images_dir):
+ shutil.rmtree(images_dir)
+ os.makedirs(images_dir)
+
+ # Load the video file
+ video = VideoFileClip(video_path)
+
+ # Dictionary to store image-text pairs
+ pairs = {}
+
+ i = 0
+ while i < len(subs):
+ sub = subs[i]
+ text = sub.text
+ sub_indexes = [sub.index]
+
+ # Check if we need to concatenate with next subtitle
+ while i < len(subs) - 1 and not text.strip().endswith('.'):
+ i += 1
+ next_sub = subs[i]
+ text += " " + next_sub.text
+ sub_indexes.append(next_sub.index)
+
+ # Get the end time of the last concatenated subtitle
+ end_time = sub.end.to_time()
+ # Convert end time to seconds
+ end_time_seconds = end_time.hour * 3600 + end_time.minute * 60 + end_time.second + end_time.microsecond / 1e6
+
+ # Save the frame as an image in extract_images folder
+ frame_path = os.path.join(images_dir, f"{sub.index}.jpg")
+ video.save_frame(frame_path, t=end_time_seconds)
+
+ # Save the subtitle text to a txt file
+ text_path = os.path.join(images_dir, f"{sub.index}.txt")
+ with open(text_path, 'w') as f:
+ f.write(text)
+
+ # Add pair to dictionary
+ pairs[str(sub.index)] = {
+ "image_path": f"{sub.index}.jpg",
+ "text": text,
+ "text_path": f"{sub.index}.txt",
+ "srt_index": sub_indexes,
+ }
+
+ i += 1
+
+ # Save pairs to json file
+ import json
+ json_path = os.path.join(images_dir, "pairs.json")
+ with open(json_path, 'w') as f:
+ json.dump(pairs, f, indent=4)
+
+ # Close the video file
+ video.close()
+
+def extract_trasnscript(video_path):
+ """Extract transcript from video audio using Google Speech Recognition.
+
+ Args:
+ video_path (str): Path to the video file.
+
+ Returns:
+ str: Transcribed text from the video audio.
+
+ Raises:
+ FileNotFoundError: If video file does not exist.
+ """
+ if not os.path.exists(video_path):
+ raise FileNotFoundError(f"Video file not found: {video_path}")
+
+ clip = VideoFileClip(video_path)
+
+ # write the video to a temporary audio file
+ audio_path = os.path.join(os.path.dirname(video_path), "audio.wav")
+ clip.audio.write_audiofile(audio_path)
+
+ try:
+ # extract the subtitles from the audio file
+ recognizer = sr.Recognizer()
+ with sr.AudioFile(audio_path) as source:
+ audio = recognizer.record(source)
+ return recognizer.recognize_google(audio)
+ finally:
+ # clean up the temporary audio file
+ if os.path.exists(audio_path):
+ os.remove(audio_path)
+
+if __name__ == "__main__":
+ import argparse
+
+ def process_all_topics(output_folder):
+ """Process all topic folders in the output directory.
+
+ Args:
+ output_folder (str): Directory containing the topic folders.
+ """
+ # Only get immediate subdirectories
+ topics = [d for d in os.listdir(output_folder)
+ if os.path.isdir(os.path.join(output_folder, d))]
+
+ for topic in topics:
+ print(f"\nProcessing topic: {topic}")
+ try:
+ parse_srt_to_text(output_folder, topic)
+ parse_srt_and_extract_frames(output_folder, topic)
+ except Exception as e:
+ print(f"Error processing {topic}: {str(e)}")
+ continue
+
+ # Set up argument parser
+ parser = argparse.ArgumentParser(description='Process video files and extract frames with subtitles')
+ parser.add_argument('--output_dir', type=str, default="output",
+ help='Directory containing the topic folders')
+
+ args = parser.parse_args()
+
+ # Process topics using provided output directory
+ process_all_topics(args.output_dir)
\ No newline at end of file
diff --git a/src/core/video_planner.py b/src/core/video_planner.py
new file mode 100644
index 0000000000000000000000000000000000000000..bc962cf4917dc055ee1f784ea17cd2f8b7418725
--- /dev/null
+++ b/src/core/video_planner.py
@@ -0,0 +1,417 @@
+import os
+import re
+import json
+import glob
+from typing import List, Optional
+import uuid
+import asyncio
+
+from mllm_tools.utils import _prepare_text_inputs
+from src.utils.utils import extract_xml
+from task_generator import (
+ get_prompt_scene_plan,
+ get_prompt_scene_vision_storyboard,
+ get_prompt_scene_technical_implementation,
+ get_prompt_scene_animation_narration,
+ get_prompt_context_learning_scene_plan,
+ get_prompt_context_learning_vision_storyboard,
+ get_prompt_context_learning_technical_implementation,
+ get_prompt_context_learning_animation_narration,
+ get_prompt_context_learning_code
+)
+from src.rag.rag_integration import RAGIntegration
+
+class VideoPlanner:
+ """A class for planning and generating video content.
+
+ This class handles the planning and generation of video content including scene outlines,
+ vision storyboards, technical implementations, and animation narrations.
+
+ Args:
+ planner_model: The model used for planning tasks
+ helper_model: Optional helper model, defaults to planner_model if None
+ output_dir (str): Directory for output files. Defaults to "output"
+ print_response (bool): Whether to print model responses. Defaults to False
+ use_context_learning (bool): Whether to use context learning. Defaults to False
+ context_learning_path (str): Path to context learning examples. Defaults to "data/context_learning"
+ use_rag (bool): Whether to use RAG. Defaults to False
+ session_id (str): Session identifier. Defaults to None
+ chroma_db_path (str): Path to ChromaDB. Defaults to "data/rag/chroma_db"
+ manim_docs_path (str): Path to Manim docs. Defaults to "data/rag/manim_docs"
+ embedding_model (str): Name of embedding model. Defaults to "text-embedding-ada-002"
+ use_langfuse (bool): Whether to use Langfuse logging. Defaults to True
+ """
+
+ def __init__(self, planner_model, helper_model=None, output_dir="output", print_response=False, use_context_learning=False, context_learning_path="data/context_learning", use_rag=False, session_id=None, chroma_db_path="data/rag/chroma_db", manim_docs_path="data/rag/manim_docs", embedding_model="text-embedding-ada-002", use_langfuse=True):
+ self.planner_model = planner_model
+ self.helper_model = helper_model if helper_model is not None else planner_model
+ self.output_dir = output_dir
+ self.print_response = print_response
+ self.use_context_learning = use_context_learning
+ self.context_learning_path = context_learning_path
+ # Initialize different types of context examples
+ self.scene_plan_examples = self._load_context_examples('scene_plan') if use_context_learning else None
+ self.vision_storyboard_examples = self._load_context_examples('scene_vision_storyboard') if use_context_learning else None
+ self.technical_implementation_examples = self._load_context_examples('technical_implementation') if use_context_learning else None
+ self.animation_narration_examples = self._load_context_examples('scene_animation_narration') if use_context_learning else None
+ self.code_examples = self._load_context_examples('code') if use_context_learning else None
+ self.use_rag = use_rag
+ self.rag_integration = None
+ if use_rag:
+ self.rag_integration = RAGIntegration(
+ helper_model=helper_model,
+ output_dir=output_dir,
+ chroma_db_path=chroma_db_path,
+ manim_docs_path=manim_docs_path,
+ embedding_model=embedding_model,
+ use_langfuse=use_langfuse,
+ session_id=session_id
+ )
+ self.relevant_plugins = [] # Initialize as an empty list
+
+ def _load_context_examples(self, example_type: str) -> str:
+ """Load context learning examples of a specific type from files.
+
+ Args:
+ example_type (str): Type of examples to load ('scene_plan', 'scene_vision_storyboard', etc.)
+
+ Returns:
+ str: Formatted string containing the loaded examples, or None if no examples found
+ """
+ examples = []
+
+ # Define file patterns for different types
+ file_patterns = {
+ 'scene_plan': '*_scene_plan.txt',
+ 'scene_vision_storyboard': '*_scene_vision_storyboard.txt',
+ 'technical_implementation': '*_technical_implementation.txt',
+ 'scene_animation_narration': '*_scene_animation_narration.txt',
+ 'code': '*.py'
+ }
+
+ pattern = file_patterns.get(example_type)
+ if not pattern:
+ return None
+
+ # Search in subdirectories of context_learning_path
+ for root, _, _ in os.walk(self.context_learning_path):
+ for example_file in glob.glob(os.path.join(root, pattern)):
+ with open(example_file, 'r') as f:
+ content = f.read()
+                    examples.append(f"# Example from {os.path.basename(example_file)}\n{content}\n")
+
+ # Format examples using appropriate template
+ if examples:
+ formatted_examples = self._format_examples(example_type, examples)
+ return formatted_examples
+ return None
+
+ def _format_examples(self, example_type: str, examples: List[str]) -> str:
+ """Format examples using the appropriate template based on their type.
+
+ Args:
+ example_type (str): Type of examples to format
+ examples (List[str]): List of example strings to format
+
+ Returns:
+ str: Formatted examples string, or None if no template found
+ """
+ templates = {
+ 'scene_plan': get_prompt_context_learning_scene_plan,
+ 'scene_vision_storyboard': get_prompt_context_learning_vision_storyboard,
+ 'technical_implementation': get_prompt_context_learning_technical_implementation,
+ 'scene_animation_narration': get_prompt_context_learning_animation_narration,
+ 'code': get_prompt_context_learning_code
+ }
+
+ template = templates.get(example_type)
+ if template:
+ return template(examples="\n".join(examples))
+ return None
+
+ def generate_scene_outline(self,
+ topic: str,
+ description: str,
+ session_id: str) -> str:
+ """Generate a scene outline based on the topic and description.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ session_id (str): Session identifier
+
+ Returns:
+ str: Generated scene outline
+ """
+ # Detect relevant plugins upfront if RAG is enabled
+ if self.use_rag:
+ self.relevant_plugins = self.rag_integration.detect_relevant_plugins(topic, description) or []
+ self.rag_integration.set_relevant_plugins(self.relevant_plugins)
+ print(f"Detected relevant plugins: {self.relevant_plugins}")
+
+ prompt = get_prompt_scene_plan(topic, description)
+
+ if self.use_context_learning and self.scene_plan_examples:
+ prompt += f"\n\nHere are some example scene plans for reference:\n{self.scene_plan_examples}"
+
+ # Generate plan using planner model
+ response_text = self.planner_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "scene_outline", "tags": [topic, "scene-outline"], "session_id": session_id}
+ )
+        # extract the scene outline block (the scene-plan prompt is expected to wrap it in <SCENE_OUTLINE> tags)
+        scene_outline_match = re.search(r'(<SCENE_OUTLINE>.*?</SCENE_OUTLINE>)', response_text, re.DOTALL)
+ scene_outline = scene_outline_match.group(1) if scene_outline_match else response_text
+
+ # replace all spaces and special characters with underscores for file path compatibility
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ # save plan to file
+ os.makedirs(os.path.join(self.output_dir, file_prefix), exist_ok=True) # Ensure directory exists
+ with open(os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt"), "w") as f:
+ f.write(scene_outline)
+ print(f"Plan saved to {file_prefix}_scene_outline.txt")
+
+ return scene_outline
+
+ async def _generate_scene_implementation_single(self, topic: str, description: str, scene_outline_i: str, i: int, file_prefix: str, session_id: str, scene_trace_id: str) -> str:
+ """Generate implementation plan for a single scene.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ scene_outline_i (str): Outline for this specific scene
+ i (int): Scene number
+ file_prefix (str): Prefix for output files
+ session_id (str): Session identifier
+ scene_trace_id (str): Unique trace ID for this scene
+
+ Returns:
+ str: Generated implementation plan for the scene
+ """
+ # Initialize empty implementation plan
+ implementation_plan = ""
+ scene_dir = os.path.join(self.output_dir, file_prefix, f"scene{i}")
+ subplan_dir = os.path.join(scene_dir, "subplans")
+ os.makedirs(scene_dir, exist_ok=True)
+ os.makedirs(subplan_dir, exist_ok=True)
+
+ # Save scene_trace_id to file
+ trace_id_file = os.path.join(subplan_dir, "scene_trace_id.txt")
+ with open(trace_id_file, 'w') as f:
+ f.write(scene_trace_id)
+ print(f"Scene trace ID saved to {trace_id_file}")
+
+ # ===== Step 1: Generate Scene Vision and Storyboard =====
+ # ===================================================
+ prompt_vision_storyboard = get_prompt_scene_vision_storyboard(i, topic, description, scene_outline_i, self.relevant_plugins)
+
+ # Add vision storyboard examples only for this stage if available
+ if self.use_context_learning and self.vision_storyboard_examples:
+ prompt_vision_storyboard += f"\n\nHere are some example storyboards:\n{self.vision_storyboard_examples}"
+
+ if self.rag_integration:
+ # Use the already detected plugins instead of detecting again
+ # relevant_plugins = self.relevant_plugins # Removed redundant variable
+ # print(f"Using detected plugins: {relevant_plugins}") # Removed redundant print
+
+ # Generate RAG queries
+ rag_queries = self.rag_integration._generate_rag_queries_storyboard(
+ scene_plan=scene_outline_i,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i,
+ session_id=session_id,
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
+ )
+
+ retrieved_docs = self.rag_integration.get_relevant_docs(
+ rag_queries=rag_queries,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i
+ )
+
+ # Add documentation to prompt
+ prompt_vision_storyboard += f"\n\n{retrieved_docs}"
+
+ vision_storyboard_plan = self.planner_model(
+ _prepare_text_inputs(prompt_vision_storyboard),
+ metadata={"generation_name": "scene_vision_storyboard", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
+ )
+        # extract the storyboard block (assumed to be wrapped in <SCENE_VISION_STORYBOARD_PLAN> tags by the prompt)
+        vision_match = re.search(r'(<SCENE_VISION_STORYBOARD_PLAN>.*?</SCENE_VISION_STORYBOARD_PLAN>)', vision_storyboard_plan, re.DOTALL)
+ vision_storyboard_plan = vision_match.group(1) if vision_match else vision_storyboard_plan
+ implementation_plan += vision_storyboard_plan + "\n\n"
+ file_path_vs = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_vision_storyboard_plan.txt")
+ with open(file_path_vs, "w") as f:
+ f.write(vision_storyboard_plan)
+ print(f"Scene {i} Vision and Storyboard Plan saved to {file_path_vs}")
+
+ # ===== Step 2: Generate Technical Implementation Plan =====
+ # =========================================================
+ prompt_technical_implementation = get_prompt_scene_technical_implementation(i, topic, description, scene_outline_i, vision_storyboard_plan, self.relevant_plugins)
+
+ # Add technical implementation examples only for this stage if available
+ if self.use_context_learning and self.technical_implementation_examples:
+ prompt_technical_implementation += f"\n\nHere are some example technical implementations:\n{self.technical_implementation_examples}"
+
+ if self.rag_integration:
+ # Use the already detected plugins instead of detecting again
+ # relevant_plugins = self.relevant_plugins # Removed redundant variable
+ # print(f"Using detected plugins: {relevant_plugins}") # Removed redundant print
+
+ # Generate RAG queries
+ rag_queries = self.rag_integration._generate_rag_queries_technical(
+ storyboard=vision_storyboard_plan,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i,
+ session_id=session_id,
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
+ )
+
+ retrieved_docs = self.rag_integration.get_relevant_docs(
+ rag_queries=rag_queries,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i
+ )
+
+ # Add documentation to prompt
+ prompt_technical_implementation += f"\n\n{retrieved_docs}"
+
+ technical_implementation_plan = self.planner_model(
+ _prepare_text_inputs(prompt_technical_implementation),
+ metadata={"generation_name": "scene_technical_implementation", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
+ )
+        # extract the technical implementation block (assumed to be wrapped in <SCENE_TECHNICAL_IMPLEMENTATION_PLAN> tags)
+        technical_match = re.search(r'(<SCENE_TECHNICAL_IMPLEMENTATION_PLAN>.*?</SCENE_TECHNICAL_IMPLEMENTATION_PLAN>)', technical_implementation_plan, re.DOTALL)
+ technical_implementation_plan = technical_match.group(1) if technical_match else technical_implementation_plan
+ implementation_plan += technical_implementation_plan + "\n\n"
+ file_path_ti = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_technical_implementation_plan.txt")
+ with open(file_path_ti, "w") as f:
+ f.write(technical_implementation_plan)
+ print(f"Scene {i} Technical Implementation Plan saved to {file_path_ti}")
+
+ # ===== Step 3: Generate Animation and Narration Plan =====
+ # =========================================================
+ prompt_animation_narration = get_prompt_scene_animation_narration(i, topic, description, scene_outline_i, vision_storyboard_plan, technical_implementation_plan, self.relevant_plugins)
+
+ # Add animation narration examples only for this stage if available
+ if self.use_context_learning and self.animation_narration_examples:
+ prompt_animation_narration += f"\n\nHere are some example animation and narration plans:\n{self.animation_narration_examples}"
+
+ if self.rag_integration:
+ rag_queries = self.rag_integration._generate_rag_queries_narration(
+ storyboard=vision_storyboard_plan,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i,
+ session_id=session_id,
+ relevant_plugins=self.relevant_plugins # Use self.relevant_plugins directly
+ )
+ retrieved_docs = self.rag_integration.get_relevant_docs(
+ rag_queries=rag_queries,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=i
+ )
+ prompt_animation_narration += f"\n\n{retrieved_docs}"
+
+ animation_narration_plan = self.planner_model(
+ _prepare_text_inputs(prompt_animation_narration),
+ metadata={"generation_name": "scene_animation_narration", "trace_id": scene_trace_id, "tags": [topic, f"scene{i}"], "session_id": session_id}
+ )
+        # extract the animation/narration block (assumed to be wrapped in <SCENE_ANIMATION_NARRATION_PLAN> tags)
+        animation_match = re.search(r'(<SCENE_ANIMATION_NARRATION_PLAN>.*?</SCENE_ANIMATION_NARRATION_PLAN>)', animation_narration_plan, re.DOTALL)
+ animation_narration_plan = animation_match.group(1) if animation_match else animation_narration_plan
+ implementation_plan += animation_narration_plan + "\n\n"
+ file_path_an = os.path.join(subplan_dir, f"{file_prefix}_scene{i}_animation_narration_plan.txt")
+ with open(file_path_an, "w") as f:
+ f.write(animation_narration_plan)
+ print(f"Scene {i} Animation and Narration Plan saved to {file_path_an}")
+
+ # ===== Step 4: Save Implementation Plan =====
+ # ==========================================
+        # save the overall implementation plan to file
+        file_path_ip = os.path.join(self.output_dir, file_prefix, f"scene{i}", f"{file_prefix}_scene{i}_implementation_plan.txt")
+        with open(file_path_ip, "w") as f:
+            f.write(f"# Scene {i} Implementation Plan\n\n")
+            f.write(implementation_plan)
+        print(f"Scene {i} Implementation Plan saved to {file_path_ip}")
+
+ return implementation_plan
+
+ async def generate_scene_implementation(self,
+ topic: str,
+ description: str,
+ plan: str,
+ session_id: str) -> List[str]:
+ """Generate detailed implementation plans for all scenes.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ plan (str): Overall scene plan
+ session_id (str): Session identifier
+
+ Returns:
+ List[str]: List of implementation plans for each scene
+ """
+        # extract the <SCENE_OUTLINE> block from the plan
+        scene_outline = re.search(r'(<SCENE_OUTLINE>.*?</SCENE_OUTLINE>)', plan, re.DOTALL).group(1)
+        # check the number of scenes in the outline
+        scene_number = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
+ # replace all spaces and special characters with underscores for file path compatibility
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ # generate implementation plan for each scene
+ all_scene_implementation_plans = []
+
+ tasks = []
+        for i in range(1, scene_number + 1):
+ print(f"Generating implementation plan for scene {i} in topic {topic}")
+            scene_outline_i = re.search(r'<SCENE_{i}>(.*?)</SCENE_{i}>'.format(i=i), scene_outline, re.DOTALL).group(1)
+ scene_trace_id = str(uuid.uuid4())
+ task = asyncio.create_task(self._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id))
+ tasks.append(task)
+
+ all_scene_implementation_plans = await asyncio.gather(*tasks)
+ return all_scene_implementation_plans
+
+ async def generate_scene_implementation_concurrently(self,
+ topic: str,
+ description: str,
+ plan: str,
+ session_id: str,
+ scene_semaphore) -> List[str]:
+ """Generate detailed implementation plans for all scenes concurrently with controlled concurrency.
+
+ Args:
+ topic (str): The topic of the video
+ description (str): Description of the video content
+ plan (str): Overall scene plan
+ session_id (str): Session identifier
+ scene_semaphore: Semaphore to control concurrent scene generation
+
+ Returns:
+ List[str]: List of implementation plans for each scene
+ """
+ scene_outline = extract_xml(plan)
+        scene_number = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', topic.lower())
+ all_scene_implementation_plans = []
+
+ async def generate_single_scene_implementation(i):
+ async with scene_semaphore: # controls parallelism
+ print(f"Generating implementation plan for scene {i} in topic {topic}")
+                scene_outline_i = re.search(r'<SCENE_{i}>(.*?)</SCENE_{i}>'.format(i=i), scene_outline, re.DOTALL).group(1)
+ scene_trace_id = str(uuid.uuid4()) # Generate UUID here
+ return await self._generate_scene_implementation_single(topic, description, scene_outline_i, i, file_prefix, session_id, scene_trace_id)
+
+ tasks = [generate_single_scene_implementation(i + 1) for i in range(scene_number)]
+ all_scene_implementation_plans = await asyncio.gather(*tasks)
+ return all_scene_implementation_plans
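+
+# Usage sketch (assumes an initialized planner wrapper; names are placeholders):
+#   semaphore = asyncio.Semaphore(3)  # limit concurrent scene planning
+#   plans = asyncio.run(planner.generate_scene_implementation_concurrently(
+#       topic, description, plan, session_id, semaphore))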
\ No newline at end of file
diff --git a/src/core/video_renderer.py b/src/core/video_renderer.py
new file mode 100644
index 0000000000000000000000000000000000000000..138152f026411eed4233d4df5ced4d3da11fd009
--- /dev/null
+++ b/src/core/video_renderer.py
@@ -0,0 +1,448 @@
+import os
+import re
+import subprocess
+import asyncio
+from PIL import Image
+from typing import Optional, List
+import traceback
+import sys
+
+from src.core.parse_video import (
+ get_images_from_video,
+ image_with_most_non_black_space
+)
+from mllm_tools.vertex_ai import VertexAIWrapper
+from mllm_tools.gemini import GeminiWrapper
+
+class VideoRenderer:
+ """Class for rendering and combining Manim animation videos."""
+
+ def __init__(self, output_dir="output", print_response=False, use_visual_fix_code=False):
+ """Initialize the VideoRenderer.
+
+ Args:
+ output_dir (str, optional): Directory for output files. Defaults to "output".
+ print_response (bool, optional): Whether to print responses. Defaults to False.
+ use_visual_fix_code (bool, optional): Whether to use visual fix code. Defaults to False.
+ """
+ self.output_dir = output_dir
+ self.print_response = print_response
+ self.use_visual_fix_code = use_visual_fix_code
+
+ async def render_scene(self, code: str, file_prefix: str, curr_scene: int, curr_version: int, code_dir: str, media_dir: str, max_retries: int = 3, use_visual_fix_code=False, visual_self_reflection_func=None, banned_reasonings=None, scene_trace_id=None, topic=None, session_id=None):
+ """Render a single scene and handle error retries and visual fixes.
+
+ Args:
+ code (str): The Manim code to render
+ file_prefix (str): Prefix for output files
+ curr_scene (int): Current scene number
+ curr_version (int): Current version number
+ code_dir (str): Directory for code files
+ media_dir (str): Directory for media output
+ max_retries (int, optional): Maximum retry attempts. Defaults to 3.
+ use_visual_fix_code (bool, optional): Whether to use visual fix code. Defaults to False.
+ visual_self_reflection_func (callable, optional): Function for visual self-reflection. Defaults to None.
+ banned_reasonings (list, optional): List of banned reasoning strings. Defaults to None.
+ scene_trace_id (str, optional): Scene trace identifier. Defaults to None.
+ topic (str, optional): Topic name. Defaults to None.
+ session_id (str, optional): Session identifier. Defaults to None.
+
+ Returns:
+ tuple: (code, error_message) where error_message is None on success
+ """
+ retries = 0
+ while retries < max_retries:
+ try:
+ # Execute manim in a thread to prevent blocking
+ file_path = os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py")
+ result = await asyncio.to_thread(
+ subprocess.run,
+ ["manim", "-qh", file_path, "--media_dir", media_dir, "--progress_bar", "none"],
+ capture_output=True,
+ text=True
+ )
+
+ # if result.returncode != 0, it means that the code is not rendered successfully
+ # so we need to fix the code by returning the code and the error message
+ if result.returncode != 0:
+ raise Exception(result.stderr)
+
+ if use_visual_fix_code and visual_self_reflection_func and banned_reasonings:
+ # Get the rendered video path
+ video_path = os.path.join(
+ media_dir,
+ "videos",
+ f"{file_prefix}_scene{curr_scene}_v{curr_version}.mp4"
+ )
+
+ # For Gemini/Vertex AI models, pass the video directly
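+                    # (note: assumes a scene_model attribute has been attached to this renderer by its caller)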
+ if self.scene_model.model_name.startswith(('gemini/', 'vertex_ai/')):
+ media_input = video_path
+ else:
+ # For other models, use image snapshot
+ media_input = self.create_snapshot_scene(
+ topic, curr_scene, curr_version, return_type="path"
+ )
+
+ new_code, log = visual_self_reflection_func(
+ code,
+ media_input,
+ scene_trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=curr_scene,
+ session_id=session_id
+ )
+
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_vfix_log.txt"), "w") as f:
+ f.write(log)
+
+ # Check for termination markers
+                    if "<LGTM>" in new_code or any(word in new_code for word in banned_reasonings):
+ break
+
+ code = new_code
+ curr_version += 1
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}.py"), "w") as f:
+ f.write(code)
+ print(f"Code saved to scene{curr_scene}/code/{file_prefix}_scene{curr_scene}_v{curr_version}.py")
+ retries = 0
+ continue
+
+ break # Exit retry loop on success
+
+ except Exception as e:
+ print(f"Error: {e}")
+ print(f"Retrying {retries+1} of {max_retries}...")
+
+ with open(os.path.join(code_dir, f"{file_prefix}_scene{curr_scene}_v{curr_version}_error.log"), "a") as f:
+ f.write(f"\nError in attempt {retries}:\n{str(e)}\n")
+                retries += 1
+                last_error = str(e)  # keep the message; `e` goes out of scope when the except block ends
+
+        if retries >= max_retries:
+            return code, last_error  # Indicate failure and return the last error message
+
+ print(f"Successfully rendered {file_path}")
+ with open(os.path.join(self.output_dir, file_prefix, f"scene{curr_scene}", "succ_rendered.txt"), "w") as f:
+ f.write("")
+
+ return code, None # Indicate success
+
+ def run_manim_process(self,
+ topic: str):
+ """Run manim on all generated manim code for a specific topic.
+
+ Args:
+ topic (str): Topic name to process
+
+ Returns:
+ subprocess.CompletedProcess: Result of the final manim process
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ search_path = os.path.join(self.output_dir, file_prefix)
+ # Iterate through scene folders
+ scene_folders = [f for f in os.listdir(search_path) if os.path.isdir(os.path.join(search_path, f))]
+ scene_folders.sort() # Sort to process scenes in order
+
+ for folder in scene_folders:
+ folder_path = os.path.join(search_path, folder)
+
+ # Get all Python files in version order
+ py_files = [f for f in os.listdir(folder_path) if f.endswith('.py')]
+ py_files.sort(key=lambda x: int(x.split('_v')[-1].split('.')[0])) # Sort by version number
+
+ for file in py_files:
+ file_path = os.path.join(folder_path, file)
+ try:
+ media_dir = os.path.join(self.output_dir, file_prefix, "media")
+                    result = subprocess.run(
+                        ["manim", "-qh", file_path, "--media_dir", media_dir],
+                        capture_output=True,
+                        text=True
+                    )
+ if result.returncode != 0:
+ raise Exception(result.stderr)
+ print(f"Successfully rendered {file}")
+ break # Move to next scene folder if successful
+ except Exception as e:
+ print(f"Error rendering {file}: {e}")
+                    error_log_path = os.path.join(folder_path, f"{file.split('.')[0]}_error.log")  # strip the .py extension for the log name
+ with open(error_log_path, "w") as f:
+ f.write(f"Error:\n{str(e)}\n")
+ print(f"Error log saved to {error_log_path}")
+ return result
+
+ def create_snapshot_scene(self, topic: str, scene_number: int, version_number: int, return_type: str = "image"):
+ """Create a snapshot of the video for a specific topic and scene.
+
+ Args:
+ topic (str): Topic name
+ scene_number (int): Scene number
+ version_number (int): Version number
+ return_type (str, optional): Type of return value - "path" or "image". Defaults to "image".
+
+ Returns:
+ Union[str, PIL.Image]: Path to saved image or PIL Image object
+
+ Raises:
+ FileNotFoundError: If no mp4 files found in video folder
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ search_path = os.path.join(self.output_dir, file_prefix)
+ video_folder_path = os.path.join(search_path, "media", "videos", f"{file_prefix}_scene{scene_number}_v{version_number}", "1080p60")
+ os.makedirs(video_folder_path, exist_ok=True)
+ snapshot_path = os.path.join(video_folder_path, "snapshot.png")
+ # Get the mp4 video file from the video folder path
+ video_files = [f for f in os.listdir(video_folder_path) if f.endswith('.mp4')]
+ if not video_files:
+ raise FileNotFoundError(f"No mp4 files found in {video_folder_path}")
+ video_path = os.path.join(video_folder_path, video_files[0])
+ saved_image = image_with_most_non_black_space(get_images_from_video(video_path), snapshot_path, return_type=return_type)
+ return saved_image
+
+ def combine_videos(self, topic: str):
+ """Combine all videos and subtitle files for a specific topic using ffmpeg.
+
+ Args:
+ topic (str): Topic name to combine videos for
+
+ This function will:
+ - Find all scene videos and subtitles
+ - Combine videos with or without audio
+ - Merge subtitle files with correct timing
+ - Save combined video and subtitles to output directory
+ """
+ file_prefix = topic.lower()
+ file_prefix = re.sub(r'[^a-z0-9_]+', '_', file_prefix)
+ search_path = os.path.join(self.output_dir, file_prefix, "media", "videos")
+
+ # Create output directory if it doesn't exist
+ video_output_dir = os.path.join(self.output_dir, file_prefix)
+ os.makedirs(video_output_dir, exist_ok=True)
+
+ output_video_path = os.path.join(video_output_dir, f"{file_prefix}_combined.mp4")
+ output_srt_path = os.path.join(video_output_dir, f"{file_prefix}_combined.srt")
+
+ if os.path.exists(output_video_path) and os.path.exists(output_srt_path):
+ print(f"Combined video and subtitles already exist at {output_video_path}, not combining again.")
+ return
+
+ # Get scene count from outline
+ scene_outline_path = os.path.join(self.output_dir, file_prefix, f"{file_prefix}_scene_outline.txt")
+ if not os.path.exists(scene_outline_path):
+ print(f"Warning: Scene outline file not found at {scene_outline_path}. Cannot determine scene count.")
+ return
+ with open(scene_outline_path) as f:
+ plan = f.read()
+        # The outline is expected to wrap the plan in <SCENE_OUTLINE> tags with one <SCENE_{n}> block per scene
+        scene_outline = re.search(r'<SCENE_OUTLINE>(.*?)</SCENE_OUTLINE>', plan, re.DOTALL).group(1)
+        scene_count = len(re.findall(r'<SCENE_(\d+)>[^<]', scene_outline))
+
+ # Find all scene folders and videos
+ scene_folders = []
+ for root, dirs, files in os.walk(search_path):
+ for dir in dirs:
+ if dir.startswith(file_prefix + "_scene"):
+ scene_folders.append(os.path.join(root, dir))
+
+ scene_videos = []
+ scene_subtitles = []
+
+ for scene_num in range(1, scene_count + 1):
+ folders = [f for f in scene_folders if int(f.split("scene")[-1].split("_")[0]) == scene_num]
+ if not folders:
+ print(f"Warning: Missing scene {scene_num}")
+ continue
+
+ folders.sort(key=lambda f: int(f.split("_v")[-1]))
+ folder = folders[-1]
+
+ video_found = False
+ subtitles_found = False
+ for filename in os.listdir(os.path.join(folder, "1080p60")):
+ if filename.endswith('.mp4'):
+ scene_videos.append(os.path.join(folder, "1080p60", filename))
+ video_found = True
+ elif filename.endswith('.srt'):
+ scene_subtitles.append(os.path.join(folder, "1080p60", filename))
+ subtitles_found = True
+
+ if not video_found:
+ print(f"Warning: Missing video for scene {scene_num}")
+ if not subtitles_found:
+ scene_subtitles.append(None)
+
+ if len(scene_videos) != scene_count:
+ print("Not all videos/subtitles are found, aborting video combination.")
+ return
+
+ try:
+ import ffmpeg # You might need to install ffmpeg-python package: pip install ffmpeg-python
+ from tqdm import tqdm
+
+ print("Analyzing video streams...")
+ # Check if videos have audio streams
+ has_audio = []
+ for video in tqdm(scene_videos, desc="Checking audio streams"):
+ probe = ffmpeg.probe(video)
+ audio_streams = [stream for stream in probe['streams'] if stream['codec_type'] == 'audio']
+ has_audio.append(len(audio_streams) > 0)
+
+ print("Preparing video combination...")
+ # If any video has audio, we need to ensure all videos have audio streams
+ if any(has_audio):
+ # Create list to store video and audio streams
+ streams = []
+ for video, has_aud in tqdm(list(zip(scene_videos, has_audio)), desc="Processing videos"):
+ if has_aud:
+ # Video has audio, use as is
+ input_vid = ffmpeg.input(video)
+ streams.extend([input_vid['v'], input_vid['a']])
+ else:
+ # Video lacks audio, add silent audio
+ input_vid = ffmpeg.input(video)
+ # Generate silent audio for the duration of the video
+ probe = ffmpeg.probe(video)
+ duration = float(probe['streams'][0]['duration'])
+                        silent_audio = ffmpeg.input('anullsrc=channel_layout=stereo:sample_rate=44100',
+                                                    f='lavfi', t=duration)['a']
+ streams.extend([input_vid['v'], silent_audio])
+
+ print("Combining videos with audio...")
+ try:
+ # Concatenate all streams using optimized CPU encoding settings
+ concat = ffmpeg.concat(*streams, v=1, a=1, unsafe=True)
+ process = (
+ concat
+ .output(output_video_path,
+ **{'c:v': 'libx264',
+ 'c:a': 'aac',
+ 'preset': 'veryfast', # Changed from ultrafast for better speed/quality balance
+ 'crf': '28', # Same quality setting
+ 'threads': '0', # Use all CPU threads
+ 'tune': 'fastdecode', # Optimize for decoding speed
+ 'profile:v': 'baseline', # Simpler profile for faster encoding
+ 'level': '4.0',
+ 'x264-params': 'aq-mode=0:no-deblock:no-cabac:ref=1:subme=0:trellis=0:weightp=0', # Added aggressive speed optimizations
+ 'movflags': '+faststart',
+ 'stats': None,
+ 'progress': 'pipe:1'})
+ .overwrite_output()
+ .run_async(pipe_stdout=True, pipe_stderr=True)
+ )
+
+ # Process progress output
+ while True:
+ line = process.stdout.readline().decode('utf-8')
+ if not line:
+ break
+ if 'frame=' in line:
+ sys.stdout.write('\rProcessing: ' + line.strip())
+ sys.stdout.flush()
+
+ # Wait for the process to complete and capture output
+ stdout, stderr = process.communicate()
+ print("\nEncoding complete!")
+
+ except ffmpeg.Error as e:
+ print(f"FFmpeg stdout:\n{e.stdout.decode('utf8')}")
+ print(f"FFmpeg stderr:\n{e.stderr.decode('utf8')}")
+ raise
+ else:
+ # No videos have audio, concatenate video streams only
+ streams = []
+ for video in tqdm(scene_videos, desc="Processing videos"):
+ streams.append(ffmpeg.input(video)['v'])
+
+ print("Combining videos without audio...")
+ try:
+ concat = ffmpeg.concat(*streams, v=1, unsafe=True)
+ process = (
+ concat
+ .output(output_video_path,
+ **{'c:v': 'libx264',
+ 'preset': 'medium',
+ 'crf': '23',
+ 'stats': None, # Enable progress stats
+ 'progress': 'pipe:1'}) # Output progress to pipe
+ .overwrite_output()
+ .run_async(pipe_stdout=True, pipe_stderr=True)
+ )
+
+ # Process progress output
+ while True:
+ line = process.stdout.readline().decode('utf-8')
+ if not line:
+ break
+ if 'frame=' in line:
+ sys.stdout.write('\rProcessing: ' + line.strip())
+ sys.stdout.flush()
+
+ # Wait for the process to complete and capture output
+ stdout, stderr = process.communicate()
+ print("\nEncoding complete!")
+
+ except ffmpeg.Error as e:
+ print(f"FFmpeg stdout:\n{e.stdout.decode('utf8')}")
+ print(f"FFmpeg stderr:\n{e.stderr.decode('utf8')}")
+ raise
+
+ print(f"Successfully combined videos into {output_video_path}")
+
+ # Handle subtitle combination (existing subtitle code remains the same)
+ if scene_subtitles:
+ with open(output_srt_path, 'w', encoding='utf-8') as outfile:
+ current_time_offset = 0
+ subtitle_index = 1
+
+ for srt_file, video_file in zip(scene_subtitles, scene_videos):
+ if srt_file is None:
+ continue
+
+ with open(srt_file, 'r', encoding='utf-8') as infile:
+ lines = infile.readlines()
+ i = 0
+ while i < len(lines):
+ line = lines[i].strip()
+ if line.isdigit(): # Subtitle index
+ outfile.write(f"{subtitle_index}\n")
+ subtitle_index += 1
+ i += 1
+
+ # Time codes line
+ time_line = lines[i].strip()
+ start_time, end_time = time_line.split(' --> ')
+
+ # Convert time codes and add offset
+ def adjust_time(time_str, offset):
+ h, m, s = time_str.replace(',', '.').split(':')
+ total_seconds = float(h) * 3600 + float(m) * 60 + float(s) + offset
+ h = int(total_seconds // 3600)
+ m = int((total_seconds % 3600) // 60)
+ s = total_seconds % 60
+ return f"{h:02d}:{m:02d}:{s:06.3f}".replace('.', ',')
+
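+                            # Worked example: adjust_time("00:00:02,500", 65.0) -> "00:01:07,500"
+                            # (2.5 s into this scene, shifted by the 65 s of scenes already written)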
+ new_start = adjust_time(start_time, current_time_offset)
+ new_end = adjust_time(end_time, current_time_offset)
+ outfile.write(f"{new_start} --> {new_end}\n")
+ i += 1
+
+ # Subtitle text (could be multiple lines)
+ while i < len(lines) and lines[i].strip():
+ outfile.write(lines[i])
+ i += 1
+ outfile.write('\n')
+ else:
+ i += 1
+
+ # Update time offset using ffprobe
+ probe = ffmpeg.probe(video_file)
+ duration = float(probe['streams'][0]['duration'])
+ current_time_offset += duration
+
+ print(f"Successfully combined videos into {output_video_path}")
+ if scene_subtitles:
+ print(f"Successfully combined subtitles into {output_srt_path}")
+
+ except Exception as e:
+ print(f"Error combining videos and subtitles: {e}")
+ traceback.print_exc()
\ No newline at end of file
diff --git a/src/rag/__init__.py b/src/rag/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/src/rag/rag_integration.py b/src/rag/rag_integration.py
new file mode 100644
index 0000000000000000000000000000000000000000..0a870734d2fb2cae9ece212dda69d641cc47a4c1
--- /dev/null
+++ b/src/rag/rag_integration.py
@@ -0,0 +1,390 @@
+import os
+import re
+import json
+from typing import List, Dict
+
+from mllm_tools.utils import _prepare_text_inputs
+from task_generator import (
+ get_prompt_rag_query_generation_fix_error,
+ get_prompt_detect_plugins,
+ get_prompt_rag_query_generation_technical,
+ get_prompt_rag_query_generation_vision_storyboard,
+ get_prompt_rag_query_generation_narration,
+ get_prompt_rag_query_generation_code
+)
+from src.rag.vector_store import RAGVectorStore
+
+class RAGIntegration:
+ """Class for integrating RAG (Retrieval Augmented Generation) functionality.
+
+ This class handles RAG integration including plugin detection, query generation,
+ and document retrieval.
+
+ Args:
+ helper_model: Model used for generating queries and processing text
+ output_dir (str): Directory for output files
+ chroma_db_path (str): Path to ChromaDB
+ manim_docs_path (str): Path to Manim documentation
+ embedding_model (str): Name of embedding model to use
+ use_langfuse (bool, optional): Whether to use Langfuse logging. Defaults to True
+ session_id (str, optional): Session identifier. Defaults to None
+ """
+
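+    # Typical flow (an illustrative sketch; the video pipeline normally drives these calls,
+    # and the variable names are placeholders for values produced earlier):
+    #   rag = RAGIntegration(helper_model, output_dir, chroma_db_path, manim_docs_path, embedding_model)
+    #   rag.set_relevant_plugins(rag.detect_relevant_plugins(topic, description))
+    #   queries = rag._generate_rag_queries_code(implementation_plan, topic=topic, scene_number=1)
+    #   docs = rag.get_relevant_docs(queries, scene_trace_id, topic, scene_number=1)
+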
+ def __init__(self, helper_model, output_dir, chroma_db_path, manim_docs_path, embedding_model, use_langfuse=True, session_id=None):
+ self.helper_model = helper_model
+ self.output_dir = output_dir
+ self.manim_docs_path = manim_docs_path
+ self.session_id = session_id
+ self.relevant_plugins = None
+
+ self.vector_store = RAGVectorStore(
+ chroma_db_path=chroma_db_path,
+ manim_docs_path=manim_docs_path,
+ embedding_model=embedding_model,
+ session_id=self.session_id,
+ use_langfuse=use_langfuse,
+ helper_model=helper_model
+ )
+
+ def set_relevant_plugins(self, plugins: List[str]) -> None:
+ """Set the relevant plugins for the current video.
+
+ Args:
+ plugins (List[str]): List of plugin names to set as relevant
+ """
+ self.relevant_plugins = plugins
+
+ def detect_relevant_plugins(self, topic: str, description: str) -> List[str]:
+ """Detect which plugins might be relevant based on topic and description.
+
+ Args:
+ topic (str): Topic of the video
+ description (str): Description of the video content
+
+ Returns:
+ List[str]: List of detected relevant plugin names
+ """
+ # Load plugin descriptions
+ plugins = self._load_plugin_descriptions()
+ if not plugins:
+ return []
+
+ # Get formatted prompt using the task_generator function
+ prompt = get_prompt_detect_plugins(
+ topic=topic,
+ description=description,
+ plugin_descriptions=json.dumps([{'name': p['name'], 'description': p['description']} for p in plugins], indent=2)
+ )
+
+ try:
+ response = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "detect-relevant-plugins", "tags": [topic, "plugin-detection"], "session_id": self.session_id}
+ )
+ # Clean the response to ensure it only contains the JSON array
+ response = re.search(r'```json(.*)```', response, re.DOTALL).group(1)
+ try:
+ relevant_plugins = json.loads(response)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing relevant plugins: {e}")
+ print(f"Response text was: {response}")
+ return []
+
+ print(f"LLM detected relevant plugins: {relevant_plugins}")
+ return relevant_plugins
+ except Exception as e:
+ print(f"Error detecting plugins with LLM: {e}")
+ return []
+
+ def _load_plugin_descriptions(self) -> list:
+ """Load plugin descriptions from JSON file.
+
+ Returns:
+ list: List of plugin descriptions, empty list if loading fails
+ """
+ try:
+ plugin_config_path = os.path.join(
+ self.manim_docs_path,
+ "plugin_docs",
+ "plugins.json"
+ )
+ if os.path.exists(plugin_config_path):
+ with open(plugin_config_path, "r") as f:
+ return json.load(f)
+ else:
+ print(f"Plugin descriptions file not found at {plugin_config_path}")
+ return []
+ except Exception as e:
+ print(f"Error loading plugin descriptions: {e}")
+ return []
+
+ def _generate_rag_queries_storyboard(self, scene_plan: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
+ """Generate RAG queries from the scene plan to help create storyboard.
+
+ Args:
+ scene_plan (str): Scene plan text to generate queries from
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+ topic (str, optional): Topic name. Defaults to None
+ scene_number (int, optional): Scene number. Defaults to None
+ session_id (str, optional): Session identifier. Defaults to None
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ cache_key = f"{topic}_scene{scene_number}_storyboard_rag"
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_storyboard.json")
+
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ return json.load(f)
+
+ # Format relevant plugins as a string
+ plugins_str = ", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+
+ # Generate the prompt with only the required arguments
+ prompt = get_prompt_rag_query_generation_vision_storyboard(
+ scene_plan=scene_plan,
+ relevant_plugins=plugins_str
+ )
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag_query_generation_storyboard", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+        # retrieve the JSON inside triple backticks
+
+ try: # add try-except block to handle potential json decode errors
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+ queries = json.loads(queries)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing RAG queries for storyboard: {e}")
+ print(f"Response text was: {queries}")
+ return [] # Return empty list in case of parsing error
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+
+ def _generate_rag_queries_technical(self, storyboard: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
+ """Generate RAG queries from the storyboard to help create technical implementation.
+
+ Args:
+ storyboard (str): Storyboard text to generate queries from
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+ topic (str, optional): Topic name. Defaults to None
+ scene_number (int, optional): Scene number. Defaults to None
+ session_id (str, optional): Session identifier. Defaults to None
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ cache_key = f"{topic}_scene{scene_number}_technical_rag"
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_technical.json")
+
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ return json.load(f)
+
+ prompt = get_prompt_rag_query_generation_technical(
+ storyboard=storyboard,
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+ )
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag_query_generation_technical", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ try: # add try-except block to handle potential json decode errors
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+ queries = json.loads(queries)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing RAG queries for technical implementation: {e}")
+ print(f"Response text was: {queries}")
+ return [] # Return empty list in case of parsing error
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+
+ def _generate_rag_queries_narration(self, storyboard: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None, relevant_plugins: List[str] = []) -> List[str]:
+ """Generate RAG queries from the storyboard to help create narration plan.
+
+ Args:
+ storyboard (str): Storyboard text to generate queries from
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+ topic (str, optional): Topic name. Defaults to None
+ scene_number (int, optional): Scene number. Defaults to None
+ session_id (str, optional): Session identifier. Defaults to None
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to empty list
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ cache_key = f"{topic}_scene{scene_number}_narration_rag"
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_narration.json")
+
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ return json.load(f)
+
+ prompt = get_prompt_rag_query_generation_narration(
+ storyboard=storyboard,
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+ )
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag_query_generation_narration", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+ try: # add try-except block to handle potential json decode errors
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+ queries = json.loads(queries)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing narration RAG queries: {e}")
+ print(f"Response text was: {queries}")
+ return [] # Return empty list in case of parsing error
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+
+    def get_relevant_docs(self, rag_queries: List[Dict], scene_trace_id: str, topic: str, scene_number: int) -> str:
+ """Get relevant documentation using the vector store.
+
+ Args:
+ rag_queries (List[Dict]): List of RAG queries to search for
+ scene_trace_id (str): Trace identifier for the scene
+ topic (str): Topic name
+ scene_number (int): Scene number
+
+ Returns:
+            str: Formatted documentation snippets relevant to the queries
+ """
+ return self.vector_store.find_relevant_docs(
+ queries=rag_queries,
+ k=2,
+ trace_id=scene_trace_id,
+ topic=topic,
+ scene_number=scene_number
+ )
+
+ def _generate_rag_queries_code(self, implementation_plan: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, relevant_plugins: List[str] = None) -> List[str]:
+ """Generate RAG queries from implementation plan.
+
+ Args:
+ implementation_plan (str): Implementation plan text to generate queries from
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+ topic (str, optional): Topic name. Defaults to None
+ scene_number (int, optional): Scene number. Defaults to None
+ relevant_plugins (List[str], optional): List of relevant plugins. Defaults to None
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ cache_key = f"{topic}_scene{scene_number}"
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_code.json")
+
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ return json.load(f)
+
+ prompt = get_prompt_rag_query_generation_code(
+ implementation_plan=implementation_plan,
+ relevant_plugins=", ".join(relevant_plugins) if relevant_plugins else "No plugins are relevant."
+ )
+
+ try:
+ response = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag_query_generation_code", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": self.session_id}
+ )
+
+ # Clean and parse response
+ response = re.search(r'```json(.*)```', response, re.DOTALL).group(1)
+ queries = json.loads(response)
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
+ except Exception as e:
+ print(f"Error generating RAG queries: {e}")
+ return []
+
+ def _generate_rag_queries_error_fix(self, error: str, code: str, scene_trace_id: str = None, topic: str = None, scene_number: int = None, session_id: str = None) -> List[str]:
+ """Generate RAG queries for fixing code errors.
+
+ Args:
+ error (str): Error message to generate queries from
+ code (str): Code containing the error
+ scene_trace_id (str, optional): Trace identifier for the scene. Defaults to None
+ topic (str, optional): Topic name. Defaults to None
+ scene_number (int, optional): Scene number. Defaults to None
+ session_id (str, optional): Session identifier. Defaults to None
+
+ Returns:
+ List[str]: List of generated RAG queries
+ """
+ if self.relevant_plugins is None:
+ print("Warning: No plugins have been detected yet")
+ plugins_str = "No plugins are relevant."
+ else:
+ plugins_str = ", ".join(self.relevant_plugins) if self.relevant_plugins else "No plugins are relevant."
+
+ cache_key = f"{topic}_scene{scene_number}_error_fix"
+ cache_dir = os.path.join(self.output_dir, re.sub(r'[^a-z0-9_]+', '_', topic.lower()), f"scene{scene_number}", "rag_cache")
+ os.makedirs(cache_dir, exist_ok=True)
+ cache_file = os.path.join(cache_dir, "rag_queries_error_fix.json")
+
+ if os.path.exists(cache_file):
+ with open(cache_file, 'r') as f:
+ cached_queries = json.load(f)
+ print(f"Using cached RAG queries for error fix in {cache_key}")
+ return cached_queries
+
+ prompt = get_prompt_rag_query_generation_fix_error(
+ error=error,
+ code=code,
+ relevant_plugins=plugins_str
+ )
+
+ queries = self.helper_model(
+ _prepare_text_inputs(prompt),
+ metadata={"generation_name": "rag-query-generation-fix-error", "trace_id": scene_trace_id, "tags": [topic, f"scene{scene_number}"], "session_id": session_id}
+ )
+
+
+ try:
+ # retrieve json triple backticks
+ queries = re.search(r'```json(.*)```', queries, re.DOTALL).group(1)
+ queries = json.loads(queries)
+ except json.JSONDecodeError as e:
+ print(f"JSONDecodeError when parsing RAG queries for error fix: {e}")
+ print(f"Response text was: {queries}")
+ return []
+
+ # Cache the queries
+ with open(cache_file, 'w') as f:
+ json.dump(queries, f)
+
+ return queries
\ No newline at end of file
diff --git a/src/rag/vector_store.py b/src/rag/vector_store.py
new file mode 100644
index 0000000000000000000000000000000000000000..850517e4d2ac640f2d3dc7842453555db4832cce
--- /dev/null
+++ b/src/rag/vector_store.py
@@ -0,0 +1,356 @@
+import json
+import os
+from typing import List, Dict
+import uuid
+from langchain.schema import Document
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.document_loaders import TextLoader
+from langchain_community.vectorstores import Chroma
+from langchain_text_splitters import Language
+from langchain_core.embeddings import Embeddings
+import statistics
+from litellm import embedding
+import litellm
+import tiktoken
+from tqdm import tqdm
+from langfuse import Langfuse
+
+from mllm_tools.utils import _prepare_text_inputs
+from task_generator import get_prompt_detect_plugins
+
+class RAGVectorStore:
+ """A class for managing vector stores for RAG (Retrieval Augmented Generation).
+
+ This class handles creation, loading and querying of vector stores for both Manim core
+ and plugin documentation.
+
+ Args:
+ chroma_db_path (str): Path to ChromaDB storage directory
+ manim_docs_path (str): Path to Manim documentation files
+ embedding_model (str): Name of the embedding model to use
+ trace_id (str, optional): Trace identifier for logging. Defaults to None
+ session_id (str, optional): Session identifier. Defaults to None
+ use_langfuse (bool, optional): Whether to use Langfuse logging. Defaults to True
+ helper_model: Helper model for processing. Defaults to None
+ """
+
+ def __init__(self,
+ chroma_db_path: str = "chroma_db",
+ manim_docs_path: str = "rag/manim_docs",
+ embedding_model: str = "text-embedding-ada-002",
+ trace_id: str = None,
+ session_id: str = None,
+ use_langfuse: bool = True,
+ helper_model = None):
+ self.chroma_db_path = chroma_db_path
+ self.manim_docs_path = manim_docs_path
+ self.embedding_model = embedding_model
+ self.trace_id = trace_id
+ self.session_id = session_id
+ self.use_langfuse = use_langfuse
+ self.helper_model = helper_model
+ self.enc = tiktoken.encoding_for_model("gpt-4")
+ self.plugin_stores = {}
+ self.vector_store = self._load_or_create_vector_store()
+
+ def _load_or_create_vector_store(self):
+ """Loads existing or creates new ChromaDB vector stores.
+
+ Creates/loads vector stores for both Manim core documentation and any available plugins.
+ Stores are persisted to disk for future reuse.
+
+ Returns:
+ Chroma: The core Manim vector store instance
+ """
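+        # Resulting on-disk layout (illustrative):
+        #   <chroma_db_path>/manim_core/            - store built from the core Manim docs
+        #   <chroma_db_path>/manim_plugin_<name>/   - one store per folder under plugin_docs/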
+ print("Entering _load_or_create_vector_store with trace_id:", self.trace_id)
+ core_path = os.path.join(self.chroma_db_path, "manim_core")
+
+ # Load or create core vector store
+ if os.path.exists(core_path):
+ print("Loading existing core ChromaDB...")
+ self.core_vector_store = Chroma(
+ collection_name="manim_core",
+ persist_directory=core_path,
+ embedding_function=self._get_embedding_function()
+ )
+ else:
+ print("Creating new core ChromaDB...")
+ self.core_vector_store = self._create_core_store()
+
+ # Fix: Use correct path construction for plugin_docs
+ plugin_docs_path = os.path.join(self.manim_docs_path, "plugin_docs")
+ print(f"Plugin docs path: {plugin_docs_path}")
+ if os.path.exists(plugin_docs_path):
+ for plugin_name in os.listdir(plugin_docs_path):
+ plugin_store_path = os.path.join(self.chroma_db_path, f"manim_plugin_{plugin_name}")
+ if os.path.exists(plugin_store_path):
+ print(f"Loading existing plugin store: {plugin_name}")
+ self.plugin_stores[plugin_name] = Chroma(
+ collection_name=f"manim_plugin_{plugin_name}",
+ persist_directory=plugin_store_path,
+ embedding_function=self._get_embedding_function()
+ )
+ else:
+ print(f"Creating new plugin store: {plugin_name}")
+ plugin_path = os.path.join(plugin_docs_path, plugin_name)
+ if os.path.isdir(plugin_path):
+ plugin_store = Chroma(
+ collection_name=f"manim_plugin_{plugin_name}",
+ embedding_function=self._get_embedding_function(),
+ persist_directory=plugin_store_path
+ )
+ plugin_docs = self._process_documentation_folder(plugin_path)
+ if plugin_docs:
+ self._add_documents_to_store(plugin_store, plugin_docs, plugin_name)
+ self.plugin_stores[plugin_name] = plugin_store
+
+ return self.core_vector_store # Return core store for backward compatibility
+
+ def _get_embedding_function(self) -> Embeddings:
+ """Creates an embedding function using litellm.
+
+ Returns:
+ Embeddings: A LangChain Embeddings instance that wraps litellm functionality
+ """
+ class LiteLLMEmbeddings(Embeddings):
+ def __init__(self, embedding_model):
+ self.embedding_model = embedding_model
+
+ def embed_documents(self, texts: list[str]) -> list[list[float]]:
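+                # Temporarily disable the Langfuse callbacks so each embedding request is not
+                # logged as its own generation, then restore them afterwards (embed_query does the same)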
+ litellm.success_callback = []
+ litellm.failure_callback = []
+ response = embedding(
+ model=self.embedding_model,
+ input=texts,
+ task_type="CODE_RETRIEVAL_QUERY" if self.embedding_model == "vertex_ai/text-embedding-005" else None
+ )
+ litellm.success_callback = ["langfuse"]
+ litellm.failure_callback = ["langfuse"]
+ return [r["embedding"] for r in response["data"]]
+
+ def embed_query(self, text: str) -> list[float]:
+ litellm.success_callback = []
+ litellm.failure_callback = []
+ response = embedding(
+ model=self.embedding_model,
+ input=[text],
+ task_type="CODE_RETRIEVAL_QUERY" if self.embedding_model == "vertex_ai/text-embedding-005" else None
+ )
+ litellm.success_callback = ["langfuse"]
+ litellm.failure_callback = ["langfuse"]
+ return response["data"][0]["embedding"]
+
+ return LiteLLMEmbeddings(self.embedding_model)
+
+ def _create_core_store(self):
+ """Creates the main ChromaDB vector store for Manim core documentation.
+
+ Returns:
+ Chroma: The initialized and populated core vector store
+ """
+ core_vector_store = Chroma(
+ collection_name="manim_core",
+ embedding_function=self._get_embedding_function(),
+ persist_directory=os.path.join(self.chroma_db_path, "manim_core")
+ )
+
+ # Process manim core docs
+ core_docs = self._process_documentation_folder(os.path.join(self.manim_docs_path, "manim_core"))
+ if core_docs:
+ self._add_documents_to_store(core_vector_store, core_docs, "manim_core")
+
+ return core_vector_store
+
+ def _process_documentation_folder(self, folder_path: str) -> List[Document]:
+ """Processes documentation files from a folder into LangChain documents.
+
+ Args:
+ folder_path (str): Path to the folder containing documentation files
+
+ Returns:
+ List[Document]: List of processed LangChain documents
+ """
+ all_docs = []
+
+ for root, _, files in os.walk(folder_path):
+ for file in files:
+ if file.endswith(('.md', '.py')):
+ file_path = os.path.join(root, file)
+ try:
+ loader = TextLoader(file_path)
+ documents = loader.load()
+ for doc in documents:
+ doc.metadata['source'] = file_path
+ all_docs.extend(documents)
+ except Exception as e:
+ print(f"Error loading file {file_path}: {e}")
+
+ if not all_docs:
+ print(f"No markdown or python files found in {folder_path}")
+ return []
+
+ # Split documents using appropriate splitters
+ split_docs = []
+ markdown_splitter = RecursiveCharacterTextSplitter.from_language(
+ language=Language.MARKDOWN
+ )
+ python_splitter = RecursiveCharacterTextSplitter.from_language(
+ language=Language.PYTHON
+ )
+
+ for doc in all_docs:
+ if doc.metadata['source'].endswith('.md'):
+ temp_docs = markdown_splitter.split_documents([doc])
+ for temp_doc in temp_docs:
+ temp_doc.page_content = f"Source: {doc.metadata['source']}\n\n{temp_doc.page_content}"
+ split_docs.extend(temp_docs)
+ elif doc.metadata['source'].endswith('.py'):
+ temp_docs = python_splitter.split_documents([doc])
+ for temp_doc in temp_docs:
+ temp_doc.page_content = f"Source: {doc.metadata['source']}\n\n{temp_doc.page_content}"
+ split_docs.extend(temp_docs)
+
+ return split_docs
+
+ def _add_documents_to_store(self, vector_store: Chroma, documents: List[Document], store_name: str):
+ """Adds documents to a vector store in batches with rate limiting.
+
+ Args:
+ vector_store (Chroma): The vector store to add documents to
+ documents (List[Document]): List of documents to add
+ store_name (str): Name of the store for logging purposes
+ """
+ print(f"Adding documents to {store_name} store")
+
+ # Calculate token statistics
+ token_lengths = [len(self.enc.encode(doc.page_content)) for doc in documents]
+ print(f"Token length statistics for {store_name}: "
+ f"Min: {min(token_lengths)}, Max: {max(token_lengths)}, "
+ f"Mean: {sum(token_lengths) / len(token_lengths):.1f}, "
+ f"Median: {statistics.median(token_lengths)}, "
+ f"Std: {statistics.stdev(token_lengths):.1f}")
+
+ import time
+
+ batch_size = 10
+ request_count = 0
+ for i in tqdm(range(0, len(documents), batch_size), desc=f"Processing {store_name} batches"):
+ batch_docs = documents[i:i + batch_size]
+ batch_ids = [str(uuid.uuid4()) for _ in batch_docs]
+ vector_store.add_documents(documents=batch_docs, ids=batch_ids)
+ request_count += 1
+ if request_count % 100 == 0:
+                time.sleep(60)  # Pause for 60 seconds every 100 batches to stay under embedding rate limits
+
+ vector_store.persist()
+
+    def find_relevant_docs(self, queries: List[Dict], k: int = 5, trace_id: str = None, topic: str = None, scene_number: int = None) -> str:
+ """Finds relevant documentation based on the provided queries.
+
+ Args:
+ queries (List[Dict]): List of query dictionaries with 'type' and 'query' keys
+ k (int, optional): Number of results to return per query. Defaults to 5
+ trace_id (str, optional): Trace identifier for logging. Defaults to None
+ topic (str, optional): Topic name for logging. Defaults to None
+ scene_number (int, optional): Scene number for logging. Defaults to None
+
+ Returns:
+            str: Formatted string containing the relevant documentation snippets
+ """
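+        # Expected query shape (illustrative example; a plugin type must match a loaded plugin
+        # store name, otherwise that query is skipped below):
+        #   [{"type": "manim-core", "query": "How do I fade in a Text mobject?"},
+        #    {"type": "manim-physics", "query": "Attach gravity to a falling square"}]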
+ manim_core_formatted_results = []
+ manim_plugin_formatted_results = []
+
+ # Create a Langfuse span if enabled
+ if self.use_langfuse:
+ langfuse = Langfuse()
+ span = langfuse.span(
+ trace_id=trace_id, # Use the passed trace_id
+ name=f"RAG search for {topic} - scene {scene_number}",
+ metadata={
+ "topic": topic,
+ "scene_number": scene_number,
+ "session_id": self.session_id
+ }
+ )
+
+ # Separate queries by type
+ manim_core_queries = [query for query in queries if query["type"] == "manim-core"]
+ manim_plugin_queries = [query for query in queries if query["type"] != "manim-core" and query["type"] in self.plugin_stores]
+
+ if len([q for q in queries if q["type"] != "manim-core"]) != len(manim_plugin_queries):
+ print("Warning: Some plugin queries were skipped because their types weren't found in available plugin stores")
+
+ # Search in core manim docs
+ for query in manim_core_queries:
+ query_text = query["query"]
+            if self.use_langfuse:
+                self.core_vector_store._embedding_function.parent_observation_id = span.id
+ manim_core_results = self.core_vector_store.similarity_search_with_relevance_scores(
+ query=query_text,
+ k=k,
+ score_threshold=0.5
+ )
+ for result in manim_core_results:
+ manim_core_formatted_results.append({
+ "query": query_text,
+ "source": result[0].metadata['source'],
+ "content": result[0].page_content,
+ "score": result[1]
+ })
+
+ # Search in relevant plugin docs
+ for query in manim_plugin_queries:
+ plugin_name = query["type"]
+ query_text = query["query"]
+            if plugin_name in self.plugin_stores:
+                if self.use_langfuse:
+                    self.plugin_stores[plugin_name]._embedding_function.parent_observation_id = span.id
+ plugin_results = self.plugin_stores[plugin_name].similarity_search_with_relevance_scores(
+ query=query_text,
+ k=k,
+ score_threshold=0.5
+ )
+ for result in plugin_results:
+ manim_plugin_formatted_results.append({
+ "query": query_text,
+ "source": result[0].metadata['source'],
+ "content": result[0].page_content,
+ "score": result[1]
+ })
+
+ print(f"Number of results before removing duplicates: {len(manim_core_formatted_results) + len(manim_plugin_formatted_results)}")
+
+ # Remove duplicates based on content
+ manim_core_unique_results = []
+ manim_plugin_unique_results = []
+ seen = set()
+ for item in manim_core_formatted_results:
+ key = item['content']
+ if key not in seen:
+ manim_core_unique_results.append(item)
+ seen.add(key)
+ for item in manim_plugin_formatted_results:
+ key = item['content']
+ if key not in seen:
+ manim_plugin_unique_results.append(item)
+ seen.add(key)
+
+ print(f"Number of results after removing duplicates: {len(manim_core_unique_results) + len(manim_plugin_unique_results)}")
+
+ total_tokens = sum(len(self.enc.encode(res['content'])) for res in manim_core_unique_results + manim_plugin_unique_results)
+ print(f"Total tokens for the RAG search: {total_tokens}")
+
+ # Update Langfuse with the deduplicated results
+ if self.use_langfuse:
+ filtered_results_markdown = json.dumps(manim_core_unique_results + manim_plugin_unique_results, indent=2)
+ span.update( # Use span.update, not span.end
+ output=filtered_results_markdown,
+ metadata={
+ "total_tokens": total_tokens,
+ "initial_results_count": len(manim_core_formatted_results) + len(manim_plugin_formatted_results),
+ "filtered_results_count": len(manim_core_unique_results) + len(manim_plugin_unique_results)
+ }
+ )
+
+ manim_core_results = "Please refer to the following Manim core documentation that may be helpful for the code generation:\n\n" + "\n\n".join([f"Content:\n````text\n{res['content']}\n````\nScore: {res['score']}" for res in manim_core_unique_results])
+ manim_plugin_results = "Please refer to the following Manim plugin documentation that may be helpful for the code generation:\n\n" + "\n\n".join([f"Content:\n````text\n{res['content']}\n````\nScore: {res['score']}" for res in manim_plugin_unique_results])
+
+ return manim_core_results + "\n\n" + manim_plugin_results
\ No newline at end of file
diff --git a/src/utils/__init__.py b/src/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/src/utils/allowed_models.json b/src/utils/allowed_models.json
new file mode 100644
index 0000000000000000000000000000000000000000..2ca44ee7620dd5de7e1246e9548e986eecdee65d
--- /dev/null
+++ b/src/utils/allowed_models.json
@@ -0,0 +1,18 @@
+{
+ "allowed_models": [
+ "gemini/gemini-1.5-pro-002",
+ "gemini/gemini-1.5-flash-002",
+ "gemini/gemini-2.0-flash-001",
+ "vertex_ai/gemini-1.5-flash-002",
+ "vertex_ai/gemini-1.5-pro-002",
+ "vertex_ai/gemini-2.0-flash-001",
+ "openai/o3-mini",
+ "gpt-4o",
+ "azure/gpt-4o",
+ "azure/gpt-4o-mini",
+ "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
+ "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
+ "bedrock/anthropic.claude-3-5-haiku-20241022-v1:0",
+ "bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
+ ]
+}
\ No newline at end of file
diff --git a/src/utils/kokoro_voiceover.py b/src/utils/kokoro_voiceover.py
new file mode 100644
index 0000000000000000000000000000000000000000..4b7ee8675a4eeb2ee840d27e490e4922e3cdf730
--- /dev/null
+++ b/src/utils/kokoro_voiceover.py
@@ -0,0 +1,117 @@
+"""
+Copyright (c) 2025 Xposed73
+All rights reserved.
+This file is part of the Manim Voiceover project.
+"""
+
+import hashlib
+import json
+import numpy as np
+from pathlib import Path
+from manim_voiceover.services.base import SpeechService
+from kokoro_onnx import Kokoro
+from manim_voiceover.helper import remove_bookmarks, wav2mp3
+from scipy.io.wavfile import write as write_wav
+from src.config.config import Config
+
+
+class KokoroService(SpeechService):
+ """Speech service class for kokoro_self (using text_to_speech via Kokoro ONNX)."""
+
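+    # Illustrative usage inside a manim-voiceover scene (a sketch; model, voice and speed defaults
+    # come from Config):
+    #   from manim_voiceover import VoiceoverScene
+    #   class MyScene(VoiceoverScene):
+    #       def construct(self):
+    #           self.set_speech_service(KokoroService())
+    #           with self.voiceover(text="Hello from Kokoro") as tracker:
+    #               self.wait(tracker.duration)
+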
+ def __init__(self, engine=None,
+ model_path: str = Config.KOKORO_MODEL_PATH,
+ voices_path: str = Config.KOKORO_VOICES_PATH,
+ voice: str = Config.KOKORO_DEFAULT_VOICE,
+ speed: float = Config.KOKORO_DEFAULT_SPEED,
+ lang: str = Config.KOKORO_DEFAULT_LANG,
+ **kwargs):
+ self.kokoro = Kokoro(model_path, voices_path)
+ self.voice = voice
+ self.speed = speed
+ self.lang = lang
+
+ if engine is None:
+ engine = self.text_to_speech # Default to local function
+
+ self.engine = engine
+ super().__init__(**kwargs)
+
+ def get_data_hash(self, input_data: dict) -> str:
+ """
+ Generates a hash based on the input data dictionary.
+ The hash is used to create a unique identifier for the input data.
+
+ Parameters:
+ input_data (dict): A dictionary of input data (e.g., text, voice, etc.).
+
+ Returns:
+ str: The generated hash as a string.
+ """
+ # Convert the input data dictionary to a JSON string (sorted for consistency)
+ data_str = json.dumps(input_data, sort_keys=True)
+ # Generate a SHA-256 hash of the JSON string
+ return hashlib.sha256(data_str.encode('utf-8')).hexdigest()
+
+ def text_to_speech(self, text, output_file, voice_name, speed, lang):
+ """
+ Generates speech from text using Kokoro ONNX and saves the audio file.
+ Normalizes the audio to make it audible.
+ """
+ # Generate audio samples using Kokoro
+ samples, sample_rate = self.kokoro.create(
+ text, voice=voice_name, speed=speed, lang=lang
+ )
+
+ # Normalize audio to the range [-1, 1]
+ max_val = np.max(np.abs(samples))
+ if max_val > 0:
+ samples = samples / max_val
+
+ # Convert to 16-bit integer PCM format
+ samples = (samples * 32767).astype("int16")
+
+ # Save the normalized audio as a .wav file
+ write_wav(output_file, sample_rate, samples)
+ print(f"Saved at {output_file}")
+
+ return output_file
+
+
+ def generate_from_text(self, text: str, cache_dir: str = None, path: str = None) -> dict:
+ if cache_dir is None:
+ cache_dir = self.cache_dir
+
+ input_data = {"input_text": text, "service": "kokoro_self", "voice": self.voice, "lang": self.lang}
+ cached_result = self.get_cached_result(input_data, cache_dir)
+ if cached_result is not None:
+ return cached_result
+
+ if path is None:
+ audio_path = self.get_data_hash(input_data) + ".mp3"
+ else:
+ audio_path = path
+
+ # Generate .wav file using the text_to_speech function
+ audio_path_wav = str(Path(cache_dir) / audio_path.replace(".mp3", ".wav"))
+ self.engine(
+ text=text,
+ output_file=audio_path_wav,
+ voice_name=self.voice,
+ speed=self.speed,
+ lang=self.lang,
+ )
+
+ # Convert .wav to .mp3
+ mp3_audio_path = str(Path(cache_dir) / audio_path)
+ wav2mp3(audio_path_wav, mp3_audio_path)
+
+        # Remove the intermediate .wav file now that the .mp3 has been written
+        Path(audio_path_wav).unlink(missing_ok=True)
+
+ json_dict = {
+ "input_text": text,
+ "input_data": input_data,
+ "original_audio": audio_path,
+ }
+
+ return json_dict
\ No newline at end of file
diff --git a/src/utils/utils.py b/src/utils/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..53a47c764daa1136604338a0aff83b53760d664c
--- /dev/null
+++ b/src/utils/utils.py
@@ -0,0 +1,128 @@
+import json
+import re
+try:
+ from pylatexenc.latexencode import utf8tolatex, UnicodeToLatexEncoder
+except ImportError:
+    print("Warning: pylatexenc is not installed; run `pip install pylatexenc` to enable Unicode-to-LaTeX conversion")
+
+def _print_response(response_type: str, theorem_name: str, content: str, separator: str = "=" * 50) -> None:
+ """Print formatted responses from the video generation process.
+
+ Prints a formatted response with separators and headers for readability.
+
+ Args:
+ response_type (str): Type of response (e.g., 'Scene Plan', 'Implementation Plan')
+ theorem_name (str): Name of the theorem being processed
+ content (str): The content to print
+ separator (str, optional): Separator string for visual distinction. Defaults to 50 equals signs.
+
+ Returns:
+ None
+ """
+ print(f"\n{separator}")
+ print(f"{response_type} for {theorem_name}:")
+ print(f"{separator}\n")
+ print(content)
+ print(f"\n{separator}")
+
+def _extract_code(response_text: str) -> str:
+ """Extract code blocks from a text response.
+
+ Extracts Python code blocks delimited by ```python markers. If no code blocks are found,
+ returns the entire response text.
+
+ Args:
+ response_text (str): The text response containing code blocks
+
+ Returns:
+ str: The extracted code blocks joined by newlines, or the full response if no blocks found
+ """
+ code = ""
+ code_blocks = re.findall(r'```python\n(.*?)\n```', response_text, re.DOTALL)
+ if code_blocks:
+ code = "\n\n".join(code_blocks)
+ elif "```" not in response_text: # if no code block, return the whole response
+ code = response_text
+ return code
+
+def extract_json(response: str) -> dict:
+ """Extract and parse JSON content from a text response.
+
+ Attempts to parse the response as JSON directly, then tries to extract JSON from code blocks
+ if direct parsing fails.
+
+ Args:
+ response (str): The text response containing JSON content
+
+ Returns:
+ dict: The parsed JSON content as a dictionary, or empty list if parsing fails
+
+ Note:
+ Will attempt to parse content between ```json markers first, then between generic ``` markers
+ """
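+    # e.g. extract_json('```json\n{"queries": ["fade in a circle"]}\n```') -> {"queries": ["fade in a circle"]}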
+ try:
+ evaluation_json = json.loads(response)
+ except json.JSONDecodeError:
+ # If JSON parsing fails, try to extract the content between ```json and ```
+ match = re.search(r'```json\n(.*?)\n```', response, re.DOTALL)
+ if not match:
+ # If no match for ```json, try to extract content between ``` and ```
+ match = re.search(r'```\n(.*?)\n```', response, re.DOTALL)
+
+ if match:
+ evaluation_content = match.group(1)
+ evaluation_json = json.loads(evaluation_content)
+ else:
+ # return empty list
+ evaluation_json = []
+ print(f"Warning: Failed to extract valid JSON content from {response}")
+ return evaluation_json
+
+def _fix_unicode_to_latex(text: str, parse_unicode: bool = True) -> str:
+ """Convert Unicode symbols to LaTeX source code.
+
+ Converts Unicode subscripts and superscripts to LaTeX format, with optional full Unicode parsing.
+
+ Args:
+ text (str): The text containing Unicode symbols to convert
+ parse_unicode (bool, optional): Whether to perform full Unicode to LaTeX conversion. Defaults to True.
+
+ Returns:
+ str: The text with Unicode symbols converted to LaTeX format
+ """
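+    # e.g. _fix_unicode_to_latex("x₁² + y₂", parse_unicode=False) -> "x_1^2 + y_2"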
+ # Map of unicode subscripts to latex format
+ subscripts = {
+ "₀": "_0", "₁": "_1", "₂": "_2", "₃": "_3", "₄": "_4",
+ "₅": "_5", "₆": "_6", "₇": "_7", "₈": "_8", "₉": "_9",
+ "₊": "_+", "₋": "_-"
+ }
+ # Map of unicode superscripts to latex format
+ superscripts = {
+ "⁰": "^0", "¹": "^1", "²": "^2", "³": "^3", "⁴": "^4",
+ "⁵": "^5", "⁶": "^6", "⁷": "^7", "⁸": "^8", "⁹": "^9",
+ "⁺": "^+", "⁻": "^-"
+ }
+
+ for unicode_char, latex_format in {**subscripts, **superscripts}.items():
+ text = text.replace(unicode_char, latex_format)
+
+ if parse_unicode:
+ text = utf8tolatex(text)
+
+ return text
+
+def extract_xml(response: str) -> str:
+ """Extract XML content from a text response.
+
+ Extracts XML content between ```xml markers. Returns the full response if no XML blocks found.
+
+ Args:
+ response (str): The text response containing XML content
+
+ Returns:
+ str: The extracted XML content, or the full response if no XML blocks found
+ """
+ try:
+ return re.search(r'```xml\n(.*?)\n```', response, re.DOTALL).group(1)
+    except AttributeError:  # re.search returned None: no ```xml block was found
+        return response
diff --git a/task_generator/__init__.py b/task_generator/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..949046b4903d61bb48137be257f57aa6f2b1546c
--- /dev/null
+++ b/task_generator/__init__.py
@@ -0,0 +1,297 @@
+from .prompts_raw import (
+ _prompt_code_generation,
+ _prompt_fix_error,
+ _prompt_visual_fix_error,
+ _prompt_scene_plan,
+ _prompt_scene_vision_storyboard,
+ _prompt_scene_technical_implementation,
+ _prompt_scene_animation_narration,
+ _prompt_animation_simple,
+ _prompt_animation_fix_error,
+ _prompt_animation_rag_query_generation,
+ _prompt_animation_rag_query_generation_fix_error,
+ _banned_reasonings,
+ _prompt_context_learning_scene_plan,
+ _prompt_context_learning_vision_storyboard,
+ _prompt_context_learning_technical_implementation,
+ _prompt_context_learning_animation_narration,
+ _prompt_context_learning_code,
+ _prompt_detect_plugins,
+ _prompt_rag_query_generation_code,
+ _prompt_rag_query_generation_vision_storyboard,
+ _prompt_rag_query_generation_technical,
+ _prompt_rag_query_generation_narration,
+ _prompt_rag_query_generation_fix_error
+)
+from typing import Union, List
+
+def get_prompt_scene_plan(topic: str, description: str) -> str:
+ """
+ Generate a prompt for scene planning based on the given parameters.
+
+ Args:
+ topic (str): The topic of the video.
+ description (str): A brief description of the video content.
+
+ Returns:
+ str: The formatted prompt for scene planning.
+ """
+ prompt = _prompt_scene_plan.format(topic=topic, description=description)
+ return prompt
+
+def get_prompt_scene_vision_storyboard(scene_number: int, topic: str, description: str, scene_outline: str, relevant_plugins: List[str]) -> str:
+ prompt = _prompt_scene_vision_storyboard.format(
+ scene_number=scene_number,
+ topic=topic,
+ description=description,
+ scene_outline=scene_outline,
+ relevant_plugins=", ".join(relevant_plugins)
+ )
+ return prompt
+
+def get_prompt_scene_technical_implementation(scene_number: int, topic: str, description: str, scene_outline: str, scene_vision_storyboard: str, relevant_plugins: List[str], additional_context: Union[str, List[str]] = None) -> str:
+ prompt = _prompt_scene_technical_implementation.format(
+ scene_number=scene_number,
+ topic=topic,
+ description=description,
+ scene_outline=scene_outline,
+ scene_vision_storyboard=scene_vision_storyboard,
+ relevant_plugins=", ".join(relevant_plugins)
+ )
+ if additional_context is not None:
+ if isinstance(additional_context, str):
+ prompt += f"\nAdditional context: {additional_context}"
+ elif isinstance(additional_context, list):
+ prompt += f"\nAdditional context: {additional_context[0]}"
+ if len(additional_context) > 1:
+ prompt += f"\n" + "\n".join(additional_context[1:])
+ return prompt
+
+def get_prompt_scene_animation_narration(scene_number: int, topic: str, description: str, scene_outline: str, scene_vision_storyboard: str, technical_implementation_plan: str, relevant_plugins: List[str]) -> str:
+ prompt = _prompt_scene_animation_narration.format(
+ scene_number=scene_number,
+ topic=topic,
+ description=description,
+ scene_outline=scene_outline,
+ scene_vision_storyboard=scene_vision_storyboard,
+ technical_implementation_plan=technical_implementation_plan,
+ relevant_plugins=", ".join(relevant_plugins)
+ )
+ return prompt
+
+def get_prompt_code_generation(topic: str,
+ description: str,
+ scene_outline: str,
+ scene_implementation: str,
+ scene_number: int,
+ additional_context: Union[str, List[str]] = None) -> str:
+ """
+ Generate a prompt for code generation based on the given video plan and implementation details.
+
+ Args:
+ topic (str): The topic of the video.
+ description (str): A brief description of the video content.
+ scene_outline (str): The scene outline.
+ scene_implementation (str): The detailed scene implementation.
+ scene_number (int): The scene number
+ additional_context (Union[str, List[str]]): Additional context to include in the prompt
+ Returns:
+ str: The formatted prompt for code generation.
+ """
+ prompt = _prompt_code_generation.format(
+ topic=topic,
+ description=description,
+ scene_outline=scene_outline,
+ scene_implementation=scene_implementation,
+ scene_number=scene_number
+ )
+ if additional_context is not None:
+ if isinstance(additional_context, str):
+ prompt += f"\nAdditional context: {additional_context}"
+ elif isinstance(additional_context, list):
+ prompt += f"\nAdditional context: {additional_context[0]}"
+ if len(additional_context) > 1:
+ prompt += f"\n" + "\n".join(additional_context[1:])
+ return prompt
+
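+# Note on additional_context handling (shared by several prompt builders in this module): a plain
+# string is appended as one "Additional context:" block, while a list appends its first element on
+# the "Additional context:" line and the remaining elements on their own lines. For example
+# (illustrative), additional_context=["RAG docs", "prior error log"] appends
+# "\nAdditional context: RAG docs\nprior error log" to the prompt.
+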
+def get_prompt_fix_error(implementation_plan: str, manim_code: str, error: str, additional_context: Union[str, List[str]] = None) -> str:
+ """
+ Generate a prompt to fix errors in the given manim code.
+
+ Args:
+ implementation_plan (str): The implementation plan of the scene.
+        manim_code (str): The manim code with errors.
+        error (str): The error message encountered.
+        additional_context (Union[str, List[str]], optional): Additional context to append to the prompt. Defaults to None.
+
+ Returns:
+ str: The formatted prompt to fix the code errors.
+ """
+ prompt = _prompt_fix_error.format(
+ implementation_plan=implementation_plan,
+ manim_code=manim_code,
+ error_message=error
+ )
+ if additional_context is not None:
+ if isinstance(additional_context, str):
+ prompt += f"\nAdditional context: {additional_context}"
+ elif isinstance(additional_context, list) and additional_context:
+ prompt += f"\nAdditional context: {additional_context[0]}"
+ if len(additional_context) > 1:
+ prompt += f"\n" + "\n".join(additional_context[1:])
+ return prompt
+
+def get_prompt_visual_fix_error(implementation: str, generated_code: str) -> str:
+ prompt = _prompt_visual_fix_error.format(
+ implementation=implementation,
+ generated_code=generated_code
+ )
+ return prompt
+
+def get_banned_reasonings() -> List[str]:
+ return _banned_reasonings.split("\n")
+
+def get_prompt_rag_query_generation_vision_storyboard(scene_plan: str, relevant_plugins: str) -> str:
+ prompt = _prompt_rag_query_generation_vision_storyboard.format(
+ scene_plan=scene_plan,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_rag_query_generation_technical(storyboard: str, relevant_plugins: str) -> str:
+ """For generating RAG queries during storyboard to technical implementation stage"""
+ prompt = _prompt_rag_query_generation_technical.format(
+ storyboard=storyboard,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_rag_query_generation_narration(storyboard: str, relevant_plugins: str) -> str:
+ """For generating RAG queries during storyboard to narration stage"""
+ prompt = _prompt_rag_query_generation_narration.format(
+ storyboard=storyboard,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_rag_query_generation_code(implementation_plan: str, relevant_plugins: str) -> str:
+ """For generating RAG queries during technical implementation to code generation stage"""
+ prompt = _prompt_rag_query_generation_code.format(
+ implementation_plan=implementation_plan,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_rag_query_generation_fix_error(error: str, code: str, relevant_plugins: str) -> str:
+ prompt = _prompt_rag_query_generation_fix_error.format(
+ error=error,
+ code=code,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_context_learning_scene_plan(examples: str) -> str:
+ prompt = _prompt_context_learning_scene_plan.format(
+ examples=examples
+ )
+ return prompt
+
+def get_prompt_context_learning_vision_storyboard(examples: str) -> str:
+ prompt = _prompt_context_learning_vision_storyboard.format(
+ examples=examples
+ )
+ return prompt
+
+def get_prompt_context_learning_technical_implementation(examples: str) -> str:
+ prompt = _prompt_context_learning_technical_implementation.format(
+ examples=examples
+ )
+ return prompt
+
+def get_prompt_context_learning_animation_narration(examples: str) -> str:
+ prompt = _prompt_context_learning_animation_narration.format(
+ examples=examples
+ )
+ return prompt
+
+def get_prompt_context_learning_code(examples: str) -> str:
+ prompt = _prompt_context_learning_code.format(
+ examples=examples
+ )
+ return prompt
+
+def get_prompt_detect_plugins(topic: str, description: str, plugin_descriptions: str) -> str:
+ """
+ Generate a prompt for detecting relevant plugins based on topic and description.
+
+ Args:
+ topic (str): The video topic
+ description (str): The video description
+ plugin_descriptions (str): JSON string of available plugin descriptions
+
+ Returns:
+ str: The formatted prompt for plugin detection
+ """
+ prompt = _prompt_detect_plugins.format(
+ topic=topic,
+ description=description,
+ plugin_descriptions=plugin_descriptions
+ )
+ return prompt
+
+def get_prompt_animation(topic: str, description: str, additional_context: Union[str, List[str]] = None) -> str:
+ prompt = _prompt_animation_simple.format(
+ topic=topic,
+ description=description
+ )
+ if additional_context is not None:
+ if isinstance(additional_context, str):
+ prompt += f"\nAdditional context: {additional_context}"
+ elif isinstance(additional_context, list) and additional_context:
+ prompt += f"\nAdditional context: {additional_context[0]}"
+ if len(additional_context) > 1:
+ prompt += f"\n" + "\n".join(additional_context[1:])
+ return prompt
+
+def get_prompt_animation_fix_error(text_explanation: str, manim_code: str, error: str, additional_context: Union[str, List[str]] = None) -> str:
+ """
+ Generate a prompt to fix errors in the given manim code.
+
+ Args:
+ text_explanation (str): The text explanation of the animation.
+ manim_code (str): The manim code with errors.
+ error (str): The error message encountered.
+ additional_context (Union[str, List[str]], optional): Additional context to include in the prompt.
+
+ Returns:
+ str: The formatted prompt to fix the code errors.
+ """
+ prompt = _prompt_animation_fix_error.format(
+ text_explanation=text_explanation,
+ manim_code=manim_code,
+ error_message=error
+ )
+ if additional_context is not None:
+ if isinstance(additional_context, str):
+ prompt += f"\nAdditional context: {additional_context}"
+ elif isinstance(additional_context, list) and additional_context:
+ prompt += f"\nAdditional context: {additional_context[0]}"
+ if len(additional_context) > 1:
+ prompt += f"\n" + "\n".join(additional_context[1:])
+ return prompt
+
+def get_prompt_animation_rag_query_generation(topic: str, context: str, relevant_plugins: str) -> str:
+ if context is None:
+ context = ""
+ prompt = _prompt_animation_rag_query_generation.format(
+ topic=topic,
+ context=context,
+ relevant_plugins=relevant_plugins
+ )
+ return prompt
+
+def get_prompt_animation_rag_query_generation_fix_error(text_explanation: str, error: str, code: str) -> str:
+ prompt = _prompt_animation_rag_query_generation_fix_error.format(
+ text_explanation=text_explanation,
+ error=error,
+ code=code
+ )
+ return prompt
\ No newline at end of file
diff --git a/task_generator/parse_prompt.py b/task_generator/parse_prompt.py
new file mode 100644
index 0000000000000000000000000000000000000000..075ce754894bb73bc60d47c100c951b98c2a7e9b
--- /dev/null
+++ b/task_generator/parse_prompt.py
@@ -0,0 +1,54 @@
+import os
+from tqdm import tqdm
+
+
+def call_parse_prompt():
+ """
+ Find the prompts_raw directory and generate an __init__.py file containing prompt texts.
+
+ Searches for prompts_raw directory in current and parent directories. Once found,
+ calls create_python_file_with_texts() to generate the __init__.py file.
+ """
+ current_file_path = os.path.abspath(__file__)
+ current_folder_path = os.path.dirname(current_file_path)
+ folder_path = os.path.join(current_folder_path, "prompts_raw")
+
+ # If prompts_raw not found in current directory, search parent directories
+ if not os.path.exists(folder_path):
+ parent_dir = current_folder_path
+ while parent_dir != os.path.dirname(parent_dir): # Stop at root directory
+ parent_dir = os.path.dirname(parent_dir)
+ test_path = os.path.join(parent_dir, "prompts_raw")
+ if os.path.exists(test_path):
+ folder_path = test_path
+ break
+
+ output_file = os.path.join(folder_path, "__init__.py")
+ create_python_file_with_texts(folder_path, output_file)
+
+
+def create_python_file_with_texts(folder_path: str, output_file: str) -> None:
+ """
+ Generate a Python file containing prompt texts from .txt files.
+
+ Args:
+ folder_path (str): Path to directory containing prompt .txt files
+ output_file (str): Path where the generated Python file will be saved
+
+ The function reads all .txt files in the given folder, converts their contents
+ into Python variables, and writes them to the output file. Variable names are
+ derived from file paths with special characters replaced.
+ """
+ with open(output_file, 'w', encoding='utf-8') as out_file:
+ out_file.write("# This file is generated automatically through parse_prompt.py\n\n")
+ txt_files = [os.path.join(root, file) for root, dirs, files in os.walk(folder_path) for file in files if file.endswith(".txt")]
+ for file_path in tqdm(txt_files, desc="Processing files"):
+ var_name = "_" + file_path.replace(folder_path, "").replace(os.sep, "_").replace(".txt", "").strip("_")
+ with open(file_path, 'r', encoding='utf-8') as f:
+ content = f.read().replace('"""', '\\"\\"\\"')
+ out_file.write(f'{var_name} = """{content}"""\n\n')
+
+
+if __name__ == "__main__":
+ call_parse_prompt()
\ No newline at end of file
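
For illustration, the variable-name derivation in `create_python_file_with_texts` maps a prompt file's path to a leading-underscore identifier. Assuming the `_prompt_scene_plan` constant seen in the generated `__init__.py` below is backed by a file named `prompt_scene_plan.txt` (the file name is an assumption), the derivation works like this:

```python
import os

folder_path = "task_generator/prompts_raw"
file_path = os.path.join(folder_path, "prompt_scene_plan.txt")  # hypothetical file name

# Same derivation as in create_python_file_with_texts():
var_name = "_" + file_path.replace(folder_path, "").replace(os.sep, "_").replace(".txt", "").strip("_")
print(var_name)  # -> "_prompt_scene_plan"
```
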
diff --git a/task_generator/prompts_raw/__init__.py b/task_generator/prompts_raw/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..eb0c2c4d06b4086f4307f168d7092a0dd780f026
--- /dev/null
+++ b/task_generator/prompts_raw/__init__.py
@@ -0,0 +1,1877 @@
+# This file is generated automatically through parse_prompt.py
+
+_prompt_context_learning_scene_plan = """Here are some example scene plans to help guide your scene planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current topic."""
+
+_prompt_scene_vision_storyboard = """You are an expert in educational video production and Manim animation.
+**Reminder:** Each scene's vision and storyboard plan is entirely self-contained. There is no dependency on any implementation from previous or subsequent scenes. However, the narration will treat all scenes as part of a single, continuous video.
+
+Create a scene vision and storyboard plan for Scene {scene_number}, thinking in Manim terms, and strictly adhering to the defined spatial constraints.
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced):**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. *All objects must be positioned within these margins.*
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). *Ensure a minimum spacing of 0.3 units to prevent overlaps and maintain visual clarity. This spacing must be maintained between all objects in the scene, including text, shapes, and graphs.*
+
+**Positioning Requirements:**
+1. Safe area margins (0.5 units).
+2. Minimum spacing between objects (0.3 units).
+3. Relative positioning (`next_to`, `align_to`, `shift`) from `ORIGIN`, margins, or object references. **No absolute coordinates are allowed.** All positioning MUST be relative and clearly specified using reference points and relative positioning methods.
+4. Transition buffers (`Wait` times) between sub-scenes and animation steps for visual clarity and pacing.
+
+**Diagrams/Sketches (Optional but Recommended for Complex Scenes):**
+* For complex scenes, consider including a simple diagram or sketch (even text-based) of the intended layout to visually clarify spatial relationships and ensure adherence to spacing and margin constraints.
+
+**Focus:**
+* Focus on clear visual communication of the scene's learning objective through effective use of Manim objects and animations, while strictly adhering to the defined spatial constraints.
+* Provide detailed visual descriptions in Manim terms to guide human implementation.
+* Prioritize explanation and visualization of the theorem. Do not include any promotional elements or quiz sessions.
+* Minimize text usage - rely primarily on visual elements, mathematical notation, and animations to convey concepts. Use text sparingly and only when necessary for clarity.
+
+**Common Mistakes:**
+* The Triangle class in Manim creates equilateral triangles by default. To create a right-angled triangle, use the Polygon class instead.
+
+**Manim Plugins:**
+* Consider using established Manim plugins if they significantly simplify the implementation or offer visual elements not readily available in core Manim. If a plugin is used, clearly indicate this in the storyboard with a note like "**Plugin Suggestion:** Consider using the `manim-plugin-name` plugin for [brief explanation of benefit]."
+
+You MUST generate the scene vision and storyboard plan for the scene in the following format (from ```xml to ```):
+
+```xml
+
+[SCENE_VISION]
+1. **Scene Overview**:
+ - Scene story, key takeaway, video role. *Consider how this scene fits within the overall video narrative.*
+ - **Visual learning objectives for viewers:** Think about *specific Manim object types* that best represent the learning objective. Example: "Visualize roots as `Dot` objects on an `Axes` graph." Be specific about Manim object classes (e.g., `MathTex`, `Shapes`, `Graphs`, `Axes`, `VGroup`). If a plugin provides a relevant object type, mention it (e.g., "Visualize X using `PluginObject` from `manim-plugin-name`").
+ - How Manim visuals & animations support learning? Consider `MathTex`, `Shapes`, `Graphs`, `Axes`, `VGroup`. Focus on spatial arrangement and clarity, ensuring adherence to safe area margins and minimum spacing (0.3 units). Consider using `VGroup` to group related formula components for easier animation and spatial control. Example: "Use `VGroup` to group related formula components for easier animation and spatial control, ensuring a minimum spacing of 0.3 units between VGroup and other scene elements." If a plugin offers a more efficient way to achieve a visual effect, mention it.
+ - Key concepts to emphasize visually using visual hierarchy and spatial arrangement in Manim, while respecting safe area margins and minimum spacing (0.3 units). **Use `MathTex` for mathematical expressions and equations. Use `Tex` for general text, titles, labels, and any non-mathematical text. When mixing text with mathematical symbols in `MathTex`, use the `\\text{{}}` command (e.g., `MathTex(r"\\text{{Area}} = \\pi r^2")`)**
+
+[STORYBOARD]
+1. **Visual Flow & Pacing (Manim Animation Sequence)**:
+ - Describe the sequence of Manim visuals and animations (`Text`, `Circle`, `Arrow`, `Create`, `FadeIn`, `Transform`, etc.). Be specific about animation types and their parameters (e.g., `run_time`). If a plugin provides a specific animation type, mention it (e.g., "Use `PluginAnimation` from `manim-plugin-name`").
+ - Key visual moments: composition and arrangement of Manim elements, ensuring all elements are within safe area margins and maintain a minimum 0.3 unit spacing. Example: "`MathTex` formula center (`.move_to(ORIGIN)`) with `Write` animation, ensuring 0.3 unit spacing from scene edges and other elements."
+ - Visual transitions between ideas using Manim animations (`Transform`, `Shift`, `FadeOutAndShift`, etc.). Specify transition animations and their timings.
+ - Scene pacing (pauses, action) and Manim animation timing's role. Use `Wait()` for transition buffers and visual clarity.
+ - **Sub-scene Breakdown**: Divide the scene into logical sub-scenes, each focusing on a specific step in the explanation or visualization.
+ - For each sub-scene, start with a **Visual Element**: The primary visual component that drives the explanation (e.g., mathematical notation, diagram, graph). If this element comes from a plugin, clearly state this (e.g., "Visual Element: `PluginObject` from `manim-plugin-name`").
+ - Detail the **Animation Sequence**: Describe step-by-step the Manim animations and visual elements for each sub-scene. Be specific about:
+ - **Text Usage Guidelines:**
+ - **Use `MathTex` *only* for mathematical expressions and equations.**
+ - **Use `Tex` for all other text, including labels, explanations, and titles.**
+ - **When mixing text with mathematical symbols in `MathTex`, wrap the text portions in `\\text{{}}`. Example: `MathTex(r"\\text{{Area of circle}} = \\pi r^2")`.**
+ - Manim object classes (`MathTex`, `Circle`, `Arrow`, `Axes`, `Plot`, `Line`, `VGroup`, etc.), prioritizing mathematical notation and visual elements over text. Include plugin object classes where appropriate.
+ - Animation types (`Create`, `Write`, `FadeIn`, `Transform`, `FadeOut`, `Circumscribe`, `FocusOn`, etc.) and their parameters (e.g., `run_time`). Include plugin animation types where appropriate.
+ - Positioning of objects using relative positioning methods (`.next_to()`, `.align_to()`, `.shift()`, `.to_corner()`, `.move_to(ORIGIN)`, etc.) and references to other objects or scene elements. **No absolute coordinates allowed.**
+ - Color and style specifications (e.g., `color=BLUE`, `stroke_width=2`, `dashed=True`).
+ - Explicitly mention safe area margins and minimum spacing (0.3 units) for all objects within each sub-scene.
+
+
+```"""
+
+_prompt_rag_query_generation_storyboard = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a storyboard plan for a Manim video scene into effective queries that will retrieve relevant information from Manim documentation. The storyboard plan describes the scene's visual elements and narrative flow.
+
+Here is the storyboard plan:
+
+{storyboard}
+
+Based on the storyboard plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim core functionalities**, like general visual elements or animations. Frame these queries using Manim terminology (classes, methods, concepts).
+2. If the storyboard suggests using specific visual effects or complex animations that might be plugin-related, include at least 1 query specifically targeting **plugin documentation**. Make sure to mention the plugin name if known or suspected.
+3. Queries should be general enough to explore different possibilities within Manim and its plugins based on the storyboard's visual and narrative descriptions, but also specific enough to target Manim documentation effectively.
+
+The above storyboard might be relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+Output the queries in the following format:
+```json
+[
+ {{"query": "content of query 1", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 2", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 3", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 4", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 5", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 6", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 7", "type": "manim_core/{relevant_plugins}"}},
+]
+``` """
+
+_code_background = """PLEASE DO NOT create another color background Rectangles. Default background (Black) is enough.
+PLEASE DO NOT use BLACK color for any text.
+"""
+
+_prompt_context_learning_vision_storyboard = """Here are some example vision and storyboard plans to help guide your planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene."""
+
+_prompt_context_learning_code = """Here are some example Manim code implementations to help guide your code generation:
+
+{examples}
+
+Please follow similar patterns and best practices while implementing the current scene."""
+
+_code_limit = """Note that the frame width and height are 14.222222222222221 and 8.0 respectively. And the center of the frame is (0, 0, 0).
+It means to avoid putting any object out of the frame, you should limit the x and y coordinates of the objects.
+limit x to be within -7.0 and 7.0 for objects, and limit y to be within -4.0 and 4.0 for objects.
+Place the objects near the center of the frame, without overlapping with each other."""
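
As a rough worked check of these numbers: half the frame width is about 7.11 units and half the height is 4.0 units, so keeping x within [-7, 7] and y within [-4, 4] (tighter once a 0.5-unit safe margin is applied) keeps objects on screen. A small helper along these lines could verify an object's bounding box against those limits; the function name and default limits below are illustrative, not from the repository:

```python
from manim import Mobject

def is_within_frame(mobj: Mobject, x_limit: float = 7.0, y_limit: float = 4.0) -> bool:
    """Return True if the mobject's bounding box stays inside the given x/y limits."""
    left, right = mobj.get_left()[0], mobj.get_right()[0]
    bottom, top = mobj.get_bottom()[1], mobj.get_top()[1]
    return -x_limit <= left and right <= x_limit and -y_limit <= bottom and top <= y_limit
```
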
+
+_prompt_animation_rag_query_generation = """You are an expert in Manim (Community Edition) and its plugins. Your task is to transform a topic for a Manim animation scene into queries that can be used to retrieve relevant documentation from both Manim core and any relevant plugins.
+
+Your queries should include keywords related to the specific Manim classes, methods, functions, and *concepts* that are likely to be used to implement the scene, including any plugin-specific functionality. Focus on extracting the core concepts, actions, and vocabulary from the *entire* scene plan. Generate queries that are concise and target different aspects of the documentation (class reference, method usage, animation examples, conceptual explanations) across both Manim core and relevant plugins.
+
+Here is the Topic (and the context):
+
+{topic}. {context}
+
+Based on the topic and the context, generate multiple human-like queries (maximum 5-7) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least 1-2 queries are focused on retrieving information about Manim *function usage* in Manim scenes
+2. If the topic and the context can be linked to the use of plugin functionality, include at least 1 query specifically targeting plugin documentation
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant
+
+The above text explanations are relevant to these plugins: {relevant_plugins}
+
+Output the queries in the following format:
+```json
+[
+ {{"query": "content of query 1", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 2", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 3", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 4", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 5", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 6", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 7", "type": "manim_core/name_of_the_plugin"}},
+]
+```"""
+
+_code_font_size = """If there is title text, font size is highly recommended to be 28.
+If there are side labels, font size is highly recommended to be 24.
+If there are formulas, font size is highly recommended to be 24.
+
+However, if the text has more than 10 words, font size should be reduced further and multiple lines should be used."""
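
For reference, the recommendations above map onto `font_size` arguments roughly like this; an illustrative sketch assuming Manim CE, not part of the prompt text:

```python
from manim import Tex, MathTex

title = Tex("Derivative of a Polynomial", font_size=28)          # title text
side_label = Tex("slope at a point", font_size=24)               # side label
formula = MathTex(r"\frac{d}{dx} x^n = n x^{n-1}", font_size=24)  # formula
```
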
+
+_prompt_best_practices = """# Best practices for generating educational videos with manim
+
+1. Specify positions as relative to other objects whenever it makes sense.
+ * For example, if you want to place a label for a geometric object.
+2. Objects should be of different color from the black background.
+3. Keep the text on screen concise.
+ * On-screen elements should focus on showcasing the concept, examples and visuals. Labels and illustrative text are still encouraged.
+ * For explanations and observations, prefer narrations over on-screen text.
+ * You should still show calculations and algorithms in full on screen.
+ * For examples and practice problems, it is reasonable to show more text, especially key statements.
+ * Longer text should appear smaller to fit on screen.
+4. To control the timing of objects appearing:
+ * `add` has instantaneous effect, best used for the initial setup of the scene.
+ * Animations are best used during narration.
+ * Make sure the animations make sense. If an object is already on screen, it makes no sense to fade it in or create it again.
+5. Use Tex or MathTex whenever you want to display math, including symbols and formulas.
+"""
+
+_prompt_scene_plan = """You are an expert in educational video production, instructional design, and {topic}. Please design a high-quality video to provide in-depth explanation on {topic}.
+
+**Video Overview:**
+
+Topic: {topic}
+Description: {description}
+
+**Scene Breakdown:**
+
+Plan individual scenes. For each scene please provide the following:
+
+* **Scene Title:** Short, descriptive title (2-5 words).
+* **Scene Purpose:** Learning objective of this scene. How does it connect to previous scenes?
+* **Scene Description:** Detailed description of scene content.
+* **Scene Layout:** Describe the spatial layout concept in detail. Consider safe area margins and minimum spacing between objects.
+
+Please generate the scene plan for the video in the following format:
+
+```xml
+
+
+ Scene Title: [Title]
+ Scene Purpose: [Learning objective, connection to previous scene]
+ Scene Description: [Brief content description]
+ Scene Layout: [Spatial layout concept, consider safe area and spacing]
+
+
+
+ ...
+
+...
+
+```
+
+**Spatial Constraints:**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. *All objects must be positioned within these margins.*
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). *Ensure adequate spacing to prevent overlaps and maintain visual clarity.*
+
+Requirements:
+1. Scenes must build progressively, starting from foundational concepts and advancing to more complex ideas to ensure a logical flow of understanding for the viewer. Each scene should naturally follow from the previous one, creating a cohesive learning narrative. Start with simpler scene layouts and progressively increase complexity in later scenes.
+2. The total number of scenes should be between 3 and 7.
+3. Learning objectives should be distributed evenly across the scenes.
+4. The total video duration must be under 15 minutes.
+5. It is essential to use the exact output format, tags, and headers as specified in the prompt.
+6. Maintain consistent formatting throughout the entire scene plan.
+7. **No External Assets:** Do not import any external files (images, audio, video). *Use only Manim built-in elements and procedural generation.*
+8. **Focus on in-depth explanation of the theorem. Do not include any promotional elements (like YouTube channel promotion, subscribe messages, or external resources) or quiz sessions. Detailed example questions are acceptable and encouraged.**
+
+Note: High-level plan. Detailed scene specifications will be generated later, ensuring adherence to safe area margins and minimum spacing. The spatial constraints defined above will be strictly enforced in subsequent planning stages."""
+
+_prompt_rag_query_generation_technical = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a storyboard plan and generate effective queries that will retrieve relevant technical documentation about implementation details.
+
+Here is the storyboard plan:
+
+{storyboard}
+
+Based on this storyboard plan, generate multiple human-like queries (maximum 10) for retrieving relevant technical documentation.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **core Manim functionality** and implementation details
+2. Include queries about **complex animations and effects** described in the storyboard
+3. If the storyboard suggests using plugin functionality, include specific queries targeting those plugin's technical documentation
+
+The above storyboard plan is relevant to these plugins: {relevant_plugins}
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of core functionality query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of animation technique query"}}
+ ...
+]
+``` """
+
+_prompt_rag_query_generation_fix_error = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a Manim error and its associated code into effective queries that will retrieve relevant information from Manim documentation.
+
+Here is the error message:
+{error}
+
+Here is the Manim code that caused the error:
+{code}
+
+Based on the error and code, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim function usage** in scenes. Frame these queries to target function definitions, usage examples, and parameter details within Manim documentation.
+2. If the error suggests using plugin functionality, include at least 1 query specifically targeting **plugin documentation**. Clearly mention the plugin name in these queries to focus the search.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant, and to target the most helpful sections of the documentation (API reference, tutorials, examples).
+
+The above error and code are relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of function usage query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of API reference query"}}
+ ...
+]
+``` """
+
+_code_disable = """"""
+
+_prompt_manim_cheatsheet = """The followings are the inheritance diagram of the Manim library. You can take as reference to select which class to use for the animation.
+
+```
+digraph Animation {
+ "AddTextLetterByLetter"
+ "ShowIncreasingSubsets"
+ "ShowIncreasingSubsets" -> "AddTextLetterByLetter"
+ "AddTextWordByWord";
+ "Succession";
+ "Succession" -> "AddTextWordByWord";
+ "AnimatedBoundary";
+ "VGroup";
+ "VGroup" -> "AnimatedBoundary";
+ "Animation";
+ "AnimationGroup";
+ "Animation" -> "AnimationGroup";
+ "ApplyComplexFunction";
+ "ApplyMethod";
+ "ApplyMethod" -> "ApplyComplexFunction";
+ "ApplyFunction";
+ "Transform";
+ "Transform" -> "ApplyFunction";
+ "ApplyMatrix";
+ "ApplyPointwiseFunction";
+ "ApplyPointwiseFunction" -> "ApplyMatrix";
+ "ApplyMethod";
+ "Transform" -> "ApplyMethod";
+ "ApplyPointwiseFunction";
+ "ApplyMethod" -> "ApplyPointwiseFunction";
+ "ApplyPointwiseFunctionToCenter";
+ "ApplyPointwiseFunction" -> "ApplyPointwiseFunctionToCenter";
+ "ApplyWave";
+ "Homotopy";
+ "Homotopy" -> "ApplyWave";
+ "Broadcast";
+ "LaggedStart";
+ "LaggedStart" -> "Broadcast";
+ "ChangeDecimalToValue";
+ "ChangingDecimal";
+ "ChangingDecimal" -> "ChangeDecimalToValue";
+ "ChangeSpeed";
+ "Animation" -> "ChangeSpeed";
+ "ChangingDecimal";
+ "Animation" -> "ChangingDecimal";
+ "Circumscribe";
+ "Succession" -> "Circumscribe";
+ "ClockwiseTransform";
+ "Transform" -> "ClockwiseTransform";
+ "ComplexHomotopy";
+ "Homotopy" -> "ComplexHomotopy";
+ "CounterclockwiseTransform";
+ "Transform" -> "CounterclockwiseTransform";
+ "Create";
+ "ShowPartial";
+ "ShowPartial" -> "Create";
+ "CyclicReplace";
+ "Transform" -> "CyclicReplace";
+ "DrawBorderThenFill";
+ "Animation" -> "DrawBorderThenFill";
+ "FadeIn";
+ "FadeOut";
+ "FadeToColor";
+ "ApplyMethod" -> "FadeToColor";
+ "FadeTransform";
+ "Transform" -> "FadeTransform";
+ "FadeTransformPieces";
+ "FadeTransform" -> "FadeTransformPieces";
+ "Flash";
+ "AnimationGroup" -> "Flash";
+ "FocusOn";
+ "Transform" -> "FocusOn";
+ "GrowArrow";
+ "GrowFromPoint";
+ "GrowFromPoint" -> "GrowArrow";
+ "GrowFromCenter";
+ "GrowFromPoint" -> "GrowFromCenter";
+ "GrowFromEdge";
+ "GrowFromPoint" -> "GrowFromEdge";
+ "GrowFromPoint";
+ "Transform" -> "GrowFromPoint";
+ "Homotopy";
+ "Animation" -> "Homotopy";
+ "Indicate";
+ "Transform" -> "Indicate";
+ "LaggedStart";
+ "AnimationGroup" -> "LaggedStart";
+ "LaggedStartMap";
+ "LaggedStart" -> "LaggedStartMap";
+ "MaintainPositionRelativeTo";
+ "Animation" -> "MaintainPositionRelativeTo";
+ "Mobject";
+ "MoveAlongPath";
+ "Animation" -> "MoveAlongPath";
+ "MoveToTarget";
+ "Transform" -> "MoveToTarget";
+ "PhaseFlow";
+ "Animation" -> "PhaseFlow";
+ "RemoveTextLetterByLetter";
+ "AddTextLetterByLetter" -> "RemoveTextLetterByLetter";
+ "ReplacementTransform";
+ "Transform" -> "ReplacementTransform";
+ "Restore";
+ "ApplyMethod" -> "Restore";
+ "Rotate";
+ "Transform" -> "Rotate";
+ "Rotating";
+ "Animation" -> "Rotating";
+ "ScaleInPlace";
+ "ApplyMethod" -> "ScaleInPlace";
+ "ShowIncreasingSubsets";
+ "Animation" -> "ShowIncreasingSubsets";
+ "ShowPartial";
+ "Animation" -> "ShowPartial";
+ "ShowPassingFlash";
+ "ShowPartial" -> "ShowPassingFlash";
+ "ShowPassingFlashWithThinningStrokeWidth";
+ "AnimationGroup" -> "ShowPassingFlashWithThinningStrokeWidth";
+ "ShowSubmobjectsOneByOne";
+ "ShowIncreasingSubsets" -> "ShowSubmobjectsOneByOne";
+ "ShrinkToCenter";
+ "ScaleInPlace" -> "ShrinkToCenter";
+ "SmoothedVectorizedHomotopy";
+ "Homotopy" -> "SmoothedVectorizedHomotopy";
+ "SpinInFromNothing";
+ "GrowFromCenter" -> "SpinInFromNothing";
+ "SpiralIn";
+ "Animation" -> "SpiralIn";
+ "Succession";
+ "AnimationGroup" -> "Succession";
+ "Swap";
+ "CyclicReplace" -> "Swap";
+ "TracedPath";
+ "VMobject";
+ "VMobject" -> "TracedPath";
+ "Transform";
+ "Animation" -> "Transform";
+ "TransformAnimations";
+ "Transform" -> "TransformAnimations";
+ "TransformFromCopy";
+ "Transform" -> "TransformFromCopy";
+ "TransformMatchingAbstractBase";
+ "AnimationGroup" -> "TransformMatchingAbstractBase";
+ "TransformMatchingShapes";
+ "TransformMatchingAbstractBase" -> "TransformMatchingShapes";
+ "TransformMatchingTex";
+ "TransformMatchingAbstractBase" -> "TransformMatchingTex";
+ "Uncreate";
+ "Create" -> "Uncreate";
+ "Unwrite";
+ "Write";
+ "Write" -> "Unwrite";
+ "UpdateFromAlphaFunc";
+ "UpdateFromFunc";
+ "UpdateFromFunc" -> "UpdateFromAlphaFunc";
+ "UpdateFromFunc";
+ "Animation" -> "UpdateFromFunc";
+ "VGroup";
+ "VMobject" -> "VGroup";
+ "VMobject";
+ "Mobject" -> "VMobject";
+
+ "Wait";
+ "Animation" -> "Wait";
+ "Wiggle";
+ "Animation" -> "Wiggle";
+ "Write";
+ "DrawBorderThenFill" -> "Write";
+}
+```
+
+
+```
+digraph Camera {
+ "BackgroundColoredVMobjectDisplayer"
+ "Camera"
+ "MappingCamera"
+ "Camera" -> "MappingCamera"
+ "MovingCamera"
+ "Camera" -> "MovingCamera"
+ "MultiCamera"
+ "MovingCamera" -> "MultiCamera"
+ "OldMultiCamera"
+ "Camera" -> "OldMultiCamera"
+ "SplitScreenCamera"
+ "OldMultiCamera" -> "SplitScreenCamera"
+ "ThreeDCamera"
+ "Camera" -> "ThreeDCamera"
+}
+```
+
+```
+digraph MObject {
+ "AbstractImageMobject"
+ "Mobject" -> "AbstractImageMobject"
+ "Angle"
+ "VMobject" -> "Angle"
+ "AnnotationDot"
+ "Dot" -> "AnnotationDot"
+ "AnnularSector"
+ "Arc" -> "AnnularSector"
+ "Annulus"
+ "Circle" -> "Annulus"
+ "Arc"
+ "TipableVMobject" -> "Arc"
+ "ArcBetweenPoints"
+ "Arc" -> "ArcBetweenPoints"
+ "ArcBrace"
+ "Brace" -> "ArcBrace"
+ "ArcPolygon"
+ "VMobject" -> "ArcPolygon"
+ "ArcPolygonFromArcs"
+ "VMobject" -> "ArcPolygonFromArcs"
+ "Arrow"
+ "Line" -> "Arrow"
+ "Arrow3D"
+ "Line3D" -> "Arrow3D"
+ "ArrowCircleFilledTip"
+ "ArrowCircleTip" -> "ArrowCircleFilledTip"
+ "ArrowCircleTip"
+ "ArrowTip" -> "ArrowCircleTip"
+ "Circle" -> "ArrowCircleTip"
+ "ArrowSquareFilledTip"
+ "ArrowSquareTip" -> "ArrowSquareFilledTip"
+ "ArrowSquareTip"
+ "ArrowTip" -> "ArrowSquareTip"
+ "Square" -> "ArrowSquareTip"
+ "ArrowTip"
+ "VMobject" -> "ArrowTip"
+ "ArrowTriangleFilledTip"
+ "ArrowTriangleTip" -> "ArrowTriangleFilledTip"
+ "ArrowTriangleTip"
+ "ArrowTip" -> "ArrowTriangleTip"
+ "Triangle" -> "ArrowTriangleTip"
+ "ArrowVectorField"
+ "VectorField" -> "ArrowVectorField"
+ "Axes"
+ "VGroup" -> "Axes"
+ "CoordinateSystem" -> "Axes"
+ "BackgroundRectangle"
+ "SurroundingRectangle" -> "BackgroundRectangle"
+ "BarChart"
+ "Axes" -> "BarChart"
+ "Brace"
+ "svg_mobject.VMobjectFromSVGPath" -> "Brace"
+ "BraceBetweenPoints"
+ "Brace" -> "BraceBetweenPoints"
+ "BraceLabel"
+ "VMobject" -> "BraceLabel"
+ "BraceText"
+ "BraceLabel" -> "BraceText"
+ "BulletedList"
+ "Tex" -> "BulletedList"
+ "Circle"
+ "Arc" -> "Circle"
+ "Code"
+ "VGroup" -> "Code"
+ "ComplexPlane"
+ "NumberPlane" -> "ComplexPlane"
+ "ComplexValueTracker"
+ "ValueTracker" -> "ComplexValueTracker"
+ "Cone"
+ "Surface" -> "Cone"
+ "CoordinateSystem"
+ "Cross"
+ "VGroup" -> "Cross"
+ "Cube"
+ "VGroup" -> "Cube"
+ "CubicBezier"
+ "VMobject" -> "CubicBezier"
+ "CurvedArrow"
+ "ArcBetweenPoints" -> "CurvedArrow"
+ "CurvedDoubleArrow"
+ "CurvedArrow" -> "CurvedDoubleArrow"
+ "CurvesAsSubmobjects"
+ "VGroup" -> "CurvesAsSubmobjects"
+ "Cutout"
+ "VMobject" -> "Cutout"
+ "Cylinder"
+ "Surface" -> "Cylinder"
+ "DashedLine"
+ "Line" -> "DashedLine"
+ "DashedVMobject"
+ "VMobject" -> "DashedVMobject"
+ "DecimalMatrix"
+ "Matrix" -> "DecimalMatrix"
+ "DecimalNumber"
+ "VMobject" -> "DecimalNumber"
+ "DecimalTable"
+ "Table" -> "DecimalTable"
+ "DiGraph"
+ "GenericGraph" -> "DiGraph"
+ "Difference"
+ "Dodecahedron"
+ "Polyhedron" -> "Dodecahedron"
+ "Dot"
+ "Circle" -> "Dot"
+ "Dot3D"
+ "Sphere" -> "Dot3D"
+ "DoubleArrow"
+ "Arrow" -> "DoubleArrow"
+ "Elbow"
+ "VMobject" -> "Elbow"
+ "Ellipse"
+ "Circle" -> "Ellipse"
+ "Exclusion"
+ "FullScreenRectangle"
+ "ScreenRectangle" -> "FullScreenRectangle"
+ "FunctionGraph"
+ "ParametricFunction" -> "FunctionGraph"
+ "Generic"
+ "GenericGraph"
+ "Generic" -> "GenericGraph"
+ "Graph"
+ "GenericGraph" -> "Graph"
+ "Group"
+ "Mobject" -> "Group"
+ "Icosahedron"
+ "Polyhedron" -> "Icosahedron"
+ "ImageMobject"
+ "AbstractImageMobject" -> "ImageMobject"
+ "ImageMobjectFromCamera"
+ "AbstractImageMobject" -> "ImageMobjectFromCamera"
+ "ImplicitFunction"
+ "VMobject" -> "ImplicitFunction"
+ "Integer"
+ "DecimalNumber" -> "Integer"
+ "IntegerMatrix"
+ "Matrix" -> "IntegerMatrix"
+ "IntegerTable"
+ "Table" -> "IntegerTable"
+ "Intersection"
+ "LabeledDot"
+ "Dot" -> "LabeledDot"
+ "LayoutFunction"
+ "Protocol" -> "LayoutFunction"
+ "Line"
+ "TipableVMobject" -> "Line"
+ "Line3D"
+ "Cylinder" -> "Line3D"
+ "LinearBase"
+ "LogBase"
+ "ManimBanner"
+ "VGroup" -> "ManimBanner"
+ "MarkupText"
+ "svg_mobject.SVGMobject" -> "MarkupText"
+ "MathTable"
+ "Table" -> "MathTable"
+ "MathTex"
+ "SingleStringMathTex" -> "MathTex"
+ "Matrix"
+ "VMobject" -> "Matrix"
+ "Mobject"
+ "Mobject1D"
+ "PMobject" -> "Mobject1D"
+ "Mobject2D"
+ "PMobject" -> "Mobject2D"
+ "MobjectMatrix"
+ "Matrix" -> "MobjectMatrix"
+ "MobjectTable"
+ "Table" -> "MobjectTable"
+ "NumberLine"
+ "Line" -> "NumberLine"
+ "NumberPlane"
+ "Axes" -> "NumberPlane"
+ "Octahedron"
+ "Polyhedron" -> "Octahedron"
+ "PGroup"
+ "PMobject" -> "PGroup"
+ "PMobject"
+ "Mobject" -> "PMobject"
+ "Paragraph"
+ "VGroup" -> "Paragraph"
+ "ParametricFunction"
+ "VMobject" -> "ParametricFunction"
+ "Point"
+ "PMobject" -> "Point"
+ "PointCloudDot"
+ "Mobject1D" -> "PointCloudDot"
+ "PolarPlane"
+ "Axes" -> "PolarPlane"
+ "Polygon"
+ "Polygram" -> "Polygon"
+ "Polygram"
+ "VMobject" -> "Polygram"
+ "Polyhedron"
+ "VGroup" -> "Polyhedron"
+ "Prism"
+ "Cube" -> "Prism"
+ "Protocol"
+ "Generic" -> "Protocol"
+ "Rectangle"
+ "Polygon" -> "Rectangle"
+ "RegularPolygon"
+ "RegularPolygram" -> "RegularPolygon"
+ "RegularPolygram"
+ "Polygram" -> "RegularPolygram"
+ "RightAngle"
+ "Angle" -> "RightAngle"
+ "RoundedRectangle"
+ "Rectangle" -> "RoundedRectangle"
+ "SVGMobject"
+ "VMobject" -> "SVGMobject"
+ "SampleSpace"
+ "Rectangle" -> "SampleSpace"
+ "ScreenRectangle"
+ "Rectangle" -> "ScreenRectangle"
+ "Sector"
+ "AnnularSector" -> "Sector"
+ "SingleStringMathTex"
+ "svg_mobject.SVGMobject" -> "SingleStringMathTex"
+ "Sphere"
+ "Surface" -> "Sphere"
+ "Square"
+ "Rectangle" -> "Square"
+ "Star"
+ "Polygon" -> "Star"
+ "StealthTip"
+ "ArrowTip" -> "StealthTip"
+ "StreamLines"
+ "VectorField" -> "StreamLines"
+ "Surface"
+ "VGroup" -> "Surface"
+ "SurroundingRectangle"
+ "RoundedRectangle" -> "SurroundingRectangle"
+ "Table"
+ "VGroup" -> "Table"
+ "TangentLine"
+ "Line" -> "TangentLine"
+ "Tetrahedron"
+ "Polyhedron" -> "Tetrahedron"
+ "Tex"
+ "MathTex" -> "Tex"
+ "Text"
+ "svg_mobject.SVGMobject" -> "Text"
+ "ThreeDAxes"
+ "Axes" -> "ThreeDAxes"
+ "ThreeDVMobject"
+ "VMobject" -> "ThreeDVMobject"
+ "TipableVMobject"
+ "VMobject" -> "TipableVMobject"
+ "Title"
+ "Tex" -> "Title"
+ "Torus"
+ "Surface" -> "Torus"
+ "Triangle"
+ "RegularPolygon" -> "Triangle"
+ "Underline"
+ "Line" -> "Underline"
+ "Union"
+ "UnitInterval"
+ "NumberLine" -> "UnitInterval"
+ "VDict"
+ "VMobject" -> "VDict"
+ "VGroup"
+ "VMobject" -> "VGroup"
+ "VMobject"
+ "Mobject" -> "VMobject"
+ "VMobjectFromSVGPath"
+ "VMobject" -> "VMobjectFromSVGPath"
+ "ValueTracker"
+ "Mobject" -> "ValueTracker"
+ "Variable"
+ "VMobject" -> "Variable"
+ "Vector"
+ "Arrow" -> "Vector"
+ "VectorField"
+ "VGroup" -> "VectorField"
+ "VectorizedPoint"
+ "VMobject" -> "VectorizedPoint"
+}
+```
+
+```
+digraph Scene {
+ "LinearTransformationScene"
+ "VectorScene"
+ "VectorScene" -> "LinearTransformationScene"
+ "MovingCameraScene"
+ "Scene"
+ "Scene" -> "MovingCameraScene"
+ "RerunSceneHandler"
+ "Scene"
+ "SceneFileWriter"
+ "SpecialThreeDScene"
+ "ThreeDScene"
+ "ThreeDScene" -> "SpecialThreeDScene"
+ "ThreeDScene"
+ "Scene" -> "ThreeDScene"
+ "VectorScene"
+ "Scene" -> "VectorScene"
+ "ZoomedScene"
+ "MovingCameraScene" -> "ZoomedScene"
+}
+```"""
+
+_prompt_rag_query_generation_vision_storyboard = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a scene plan for a Manim animation and generate effective queries that will retrieve relevant documentation about visual elements and scene composition.
+
+Here is the scene plan:
+
+{scene_plan}
+
+Based on this scene plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation about visual elements and scene composition techniques.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **visual elements** like shapes, objects, and their properties
+2. Include queries about **scene composition techniques** like layout, positioning, and grouping
+3. If the scene plan suggests using plugin functionality, include specific queries targeting those plugin's visual capabilities
+4. Queries should be high-level, aiming to discover what Manim features can be used, rather than focusing on low-level implementation details.
+ - For example, instead of "how to set the color of a circle", ask "what visual properties of shapes can I control in Manim?".
+
+The above scene plan is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of visual element query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of composition technique query"}}
+ ...
+]
+```"""
+
+_prompt_context_learning_technical_implementation = """Here are some example technical implementation plans to help guide your implementation:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene."""
+
+_prompt_detect_plugins = """You are a Manim plugin detection system. Your task is to analyze a video topic and description to determine which Manim plugins would be most relevant for the actual animation implementation needs.
+
+Topic:
+{topic}
+
+Description:
+{description}
+
+Available Plugins:
+{plugin_descriptions}
+
+Instructions:
+1. Analyze the topic and description, focusing specifically on what needs to be animated
+2. Review each plugin's capabilities and determine if they provide specific tools needed for the animations described
+3. Only select plugins that provide functionality directly needed for the core animations
+4. Consider these criteria for each plugin:
+ - Does the plugin provide specific tools or components needed for the main visual elements?
+ - Are the plugin's features necessary for implementing the core animations?
+ - Would the animation be significantly more difficult to create without this plugin?
+5. Exclude plugins that:
+ - Only relate to the general topic area but don't provide needed animation tools
+ - Might be "nice to have" but aren't essential for the core visualization
+ - Could be replaced easily with basic Manim shapes and animations
+
+Your response must follow the output format below:
+
+[brief description of your thinking process]
+
+
+```json
+["plugin_name1", "plugin_name2"]
+```
+"""
+
+_prompt_scene_animation_narration = """You are an expert in educational video production and Manim animation, skilled in creating engaging and pedagogically effective learning experiences.
+**Reminder:** This animation and narration plan is entirely self-contained; there is no dependency on any previous or subsequent scene implementations. However, the narration should flow smoothly as part of a larger, single video.
+
+Your task is to create a **detailed animation and narration plan for Scene {scene_number}**, ensuring it is not just visually appealing but also serves a clear educational purpose within the overall video topic.
+
+Remember, the narration should not simply describe what's happening visually, but rather **teach a concept step-by-step**, guiding the viewer to a deeper understanding. Animations should be spatially coherent, contribute to a clear visual flow, and strictly respect safe area margins (0.5 units) and minimum spacing (0.3 units). **Consider the scene number {scene_number} and the overall scene context to ensure smooth transitions and a logical flow within the larger video narrative.**
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+Scene Vision and Storyboard:
+{scene_vision_storyboard}
+
+Technical Implementation Plan:
+{technical_implementation_plan}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced Throughout Animations):**
+* **Safe area margins:** 0.5 units. *Maintain objects and VGroups within margins.*
+* **Minimum spacing:** 0.3 units. *Ensure minimum spacing between all objects and VGroups.*
+
+**Animation Timing and Pacing Requirements:**
+* Specify `run_time` for all animations.
+* Use `Wait()` for transition buffers, specifying durations and **pedagogical purpose**.
+* Coordinate animation timings with narration cues for synchronized pedagogical presentation.
+
+**Visual Flow and Pedagogical Clarity:**
+* Ensure animations create a clear and logical visual flow, **optimized for learning and concept understanding.**
+* Use animation pacing and transition buffers to visually separate ideas and **enhance pedagogical clarity.**
+* Maintain spatial coherence for predictable and understandable animations, strictly adhering to spatial constraints.
+
+**Diagrams/Sketches (Optional but Highly Recommended for Complex Scenes):**
+* For complex animations, include diagrams/sketches to visualize animation flow and object movements. This aids clarity and reduces errors.
+
+Your plan must demonstrate a strong understanding of pedagogical narration and how animations can be used to effectively teach concepts, while strictly adhering to spatial constraints and timing requirements.
+
+You MUST generate a **detailed and comprehensive** animation and narration plan for **Scene {scene_number}**, in the following format, similar to the example provided (from ```xml to ```):
+
+```xml
+
+
+[ANIMATION_STRATEGY]
+1. **Pedagogical Animation Plan:** Provide a detailed plan for all animations in the scene, explicitly focusing on how each animation contributes to **teaching the core concepts** of this scene.
+ - **Parent VGroup transitions (if applicable):**
+ - If VGroups are used, specify transitions (`Shift`, `Transform`, `FadeIn`, `FadeOut`) with `Animation` type, direction, magnitude, target VGroup, and `run_time`.
+ - **Explain the pedagogical rationale** for each VGroup transition. How does it guide the viewer's attention or contribute to understanding the scene's learning objectives? Ensure spatial coherence and respect for constraints.
+ - **Element animations within VGroups and for individual Mobjects:**
+ - Specify animation types (`Create`, `Write`, `FadeIn`, `Transform`, `Circumscribe`, `AnimationGroup`, `Succession`) for elements.
+ - For each element animation, specify `Animation` type, target object(s), and `run_time`. Detail sequences and timing for `AnimationGroup` or `Succession`.
+ - **Explain the pedagogical purpose** of each element animation. How does it break down complex information, highlight key details, or improve visual clarity for learning? Ensure spatial coherence and minimum spacing.
+ - **Coordinate element animations with VGroup transitions:**
+ - Clearly describe the synchronization between element animations and VGroup transitions (if any).
+ - Specify relative timing and `run_time` to illustrate coordination.
+ - **Explain how this animation sequence and coordination creates a pedagogical flow**, guiding the viewer's eye and attention logically through the learning material.
+
+2. **Scene Flow - Pedagogical Pacing and Clarity:** Detail the overall flow of the scene, emphasizing pedagogical effectiveness.
+ - **Overall animation sequence, spatial progression for learning:**
+ - Describe the complete animation sequence, broken down into pedagogical sub-sections (e.g., "Introducing the Problem", "Step-by-step Solution", "Concept Reinforcement").
+ - Outline the spatial progression of objects and VGroups, focusing on how it supports the **pedagogical narrative** and concept development.
+ - Ensure a clear and logical visual flow optimized for learning, respecting spatial constraints.
+ - **Transition buffers for pedagogical pauses:**
+ - Specify `Wait()` times between animation sections for visual separation and **learner processing time**.
+ - For each `Wait()`, specify duration and **explain the pedagogical reason** for this buffer (e.g., "Allow viewers time to process the formula", "Create a pause for reflection before moving to the next concept").
+ - **Coordinate animation timing with narration for engagement and comprehension:**
+ - Describe how animation timings are coordinated with the narration script to **maximize viewer engagement and comprehension**.
+ - Specify animation cues within the narration script and explain how these cues are synchronized with animations to **reinforce learning points** at the optimal moment.
+
+[NARRATION]
+- **Pedagogical Narration Script:**
+ - Provide the full narration script for Scene {scene_number}.
+ - **Embed precise animation timing cues** within the narration script (as described before).
+ - **The script should be written as if delivered by a knowledgeable and engaging lecturer.** It should:
+ - **Clearly explain concepts step-by-step.**
+ - **Use analogies and real-world examples to enhance understanding.**
+ - **Pose questions to encourage active thinking.**
+ - **Summarize key points and transitions.**
+ - **Be detailed and knowledge-rich, not just visually descriptive.**
+ - **Connect smoothly with the previous and subsequent scenes, acting as a segment within a single, cohesive video.**
+ - **Avoid repetitive introductions or conclusions.**
+ - Consider using phrases like "Building on what we saw in the previous part..." or "Let's now move on to..." to create a sense of continuity.
+ - Reference the scene number when appropriate (e.g., "Now, let's explore...").
+ - **Crucially, the narration should seamlessly integrate with the animations to create a cohesive and effective learning experience.**
+- **Narration Sync - Pedagogical Alignment:**
+ - Detail the synchronization strategy between narration and animations, emphasizing **pedagogical alignment**.
+ - Explain how narration timing is aligned with animation start/end times to **guide viewer attention to key learning elements precisely when they animate.**
+ - Emphasize how narration cues and animation timings work together to **create a synchronized audiovisual presentation that maximizes learning and retention.**
+
+
+```
+"""
+
+_code_color_cheatsheet = """MUST include the following color definitions if you use the colors in your code. ONLY USE THE COLORS BELOW.
+
+WHITE = '#FFFFFF'
+RED = '#FF0000'
+GREEN = '#00FF00'
+BLUE = '#0000FF'
+YELLOW = '#FFFF00'
+CYAN = '#00FFFF'
+MAGENTA = '#FF00FF'
+ORANGE = '#FFA500'
+PURPLE = '#800080'
+PINK = '#FFC0CB'
+BROWN = '#A52A2A'
+GRAY = '#808080'
+TEAL = '#008080'
+NAVY = '#000080'
+OLIVE = '#808000'
+MAROON = '#800000'
+LIME = '#00FF00'
+AQUA = '#00FFFF'
+FUCHSIA = '#FF00FF'
+SILVER = '#C0C0C0'
+GOLD = '#FFD700'"""
+
+_prompt_visual_self_reflection = """You are an expert in Manim animations and educational video quality assessment. Your task is to analyze a rendered Manim video and its corresponding audio narration to identify areas for visual and auditory improvement, ensuring alignment with the provided implementation plan and enhancing the video's teaching effectiveness.
+
+Please analyze the provided Manim video and listen to the accompanying audio narration. Conduct a thorough self-reflection focusing on the following aspects:
+
+**1. Visual Presentation and Clarity (Automated VLM Analysis & Expert Human-like Judgment):**
+
+* **Object Overlap:** Does the video exhibit any visual elements (text, shapes, equations, etc.) overlapping in a way that obscures information or makes the animation difficult to understand? If possible, detect regions of significant overlap and highlight them in your reflection.
+* **Out-of-Bounds Objects:** Are any objects positioned partially or entirely outside of the visible frame of the video? Identify and report objects that appear to be clipped or outside the frame boundaries.
+* **Incorrect Object Positioning:** Based on your understanding of good visual design and the scene's educational purpose, are objects placed in positions that are illogical, distracting, or misaligned with their intended locations or relationships to other elements as described in the implementation plan? Consider:
+ * **Logical Flow:** Does the spatial arrangement support the intended visual flow and narrative progression of the scene?
+ * **Alignment and Balance:** Is the scene visually balanced? Are elements aligned in a way that is aesthetically pleasing and contributes to clarity, or does the layout appear haphazard or unbalanced?
+ * **Proximity and Grouping:** Are related elements positioned close enough to be visually grouped, and are unrelated elements sufficiently separated to avoid visual clutter?
+* **General Visual Clarity & Effectiveness:** Consider broader aspects of visual communication. Are there any other issues that detract from the video's clarity, impact, or overall effectiveness? This could include:
+ * **Visual Clutter:** Is the scene too busy or visually overwhelming at any point? Are there too many elements on screen simultaneously?
+ * **Poor Spacing/Layout:** Is the spacing between elements inconsistent or inefficient, making the scene feel cramped or unbalanced? Are margins and padding used effectively?
+ * **Ineffective Use of Color:** Are color choices distracting, clashing, or not contributing to the animation's message? Are colors used consistently and purposefully to highlight key information?
+ * **Pacing Issues (Visual):** Is the visual animation too fast or too slow in certain sections, hindering comprehension? Are visual transitions smooth and well-timed?
+ * **Animation Clarity:** Are the animations themselves clear and helpful in conveying the intended information? Do animations effectively guide the viewer's eye and focus attention?
+
+**2. Narration Quality:**
+
+* **Narration Clarity and Pacing:** Is the narration clear, concise, and easy to understand? Is the pacing of the narration appropriate for the visual content and the target audience? Does the narration effectively support the visual explanations?
+* **Narration Sync with Visuals:** Does the narration effectively synchronize with the on-screen visuals? Use VLM to analyze the video and identify instances where the narration is misaligned with the animations or visual elements it is describing. Report specific timings of misalignment.
+
+**3. Alignment with Implementation Plan:**
+
+* **Visual Fidelity:** Does the rendered video accurately reflect the visual elements and spatial arrangements described in the provided Manim Implementation Plan? Identify any deviations.
+* **Animation Fidelity:** Do the animations in the video match the animation methods and sequences outlined in the Implementation Plan? Report any discrepancies.
+
+Manim Implementation Plan:
+{implementation}
+
+Generated Code:
+{generated_code}
+
+Output Format 1:
+If any issues are identified in visual presentation, audio quality, narration, or plan alignment, please provide a detailed reflection on the issues and how to improve the video's visual and auditory quality, narration effectiveness, and code correctness. Then, you must return the updated Python code that directly addresses these issues. The code must be complete and executable.
+
+
+[Detailed reflection on visual, auditory, narration, and plan alignment issues and improvement suggestions. Include specific timings for narration/visual sync issues and descriptions of object overlap/out-of-bounds problems if detected by VLM. Be specific about code changes needed for improvement.]
+
+
+[Improved Python Code - Complete and Executable - Directly Addressing Reflection Points]
+
+
+Output Format 2:
+If no issues are found and the video and audio are deemed high quality, visually clear, narratively effective, and fully aligned with the implementation plan, please explicitly only return "" as output."""
+
+_prompt_teaching_framework = """# Comprehensive Educational Video Content Framework
+
+## 1. Pre-Production Planning
+
+### A. Learning Objectives
+- **Knowledge Level (Remember & Understand)**
+ Define specific, measurable learning outcomes that can be clearly assessed and evaluated. These outcomes should be concrete and observable, allowing instructors to verify that learning has occurred. Each outcome should be written using precise language that leaves no ambiguity about what constitutes success. For example, \"After watching this video, learners will be able to define and explain the concept of variables in programming\" provides a clear benchmark for assessment.
+
+ Action verbs are essential tools for crafting effective learning objectives. Choose verbs like define, list, describe, explain, and identify that clearly indicate the expected cognitive processes. These verbs should align with Bloom's Taxonomy to ensure appropriate cognitive engagement. When applicable, ensure all objectives align with relevant curriculum standards to maintain educational consistency and meet institutional requirements.
+
+- **Comprehension Level (Analyze & Evaluate)**
+ Develop objectives that emphasize deeper understanding and connections between concepts. These objectives should go beyond simple recall to require analysis and evaluation of the material. Students should be able to make meaningful connections between different aspects of the content and explain their relationships. For example, \"Learners will be able to compare different data types and explain when to use each\" demonstrates this deeper level of understanding.
+
+ Critical thinking elements should be deliberately incorporated into each objective. Create scenarios that challenge students to apply their knowledge in new contexts. These scenarios should require careful analysis and reasoned decision-making to solve problems effectively. Design learning experiences that encourage students to question assumptions and develop analytical skills.
+
+- **Application Level (Apply & Create)**
+ Develop practical skills that directly translate to real-world applications and scenarios. These objectives should focus on hands-on experience and tangible outcomes that demonstrate mastery. For example, \"Learners will be able to write a basic program using variables and proper naming conventions\" provides a clear, actionable goal that can be demonstrated through practical work.
+
+ Include hands-on exercises that allow students to practice and refine their skills in a supported environment. These exercises should gradually increase in complexity to build confidence and competence. Provide real-world context by incorporating authentic scenarios and problems that students might encounter in their future careers or daily lives. This connection to reality helps maintain engagement and demonstrates the immediate value of the learning.
+
+- **Target Audience Analysis**
+ Conduct thorough demographic research to understand your learners' backgrounds, ages, and educational levels. This analysis should include assessment of prior knowledge and experience with the subject matter. Consider the technical capabilities of your audience, including their access to necessary tools and technologies.
+
+ Evaluate different learning preferences and styles within your target audience. This understanding helps in designing varied content that appeals to visual, auditory, and kinesthetic learners. Consider cultural and linguistic factors that might impact learning effectiveness. Create content that is inclusive and accessible to learners from diverse backgrounds. Account for varying levels of technical proficiency and ensure your content can be accessed across different devices and platforms.
+
+### B. Content Structure
+
+- **Hook (5-10% of duration)**
+ Begin each video with a compelling problem or scenario that immediately captures attention and creates interest. This hook should be relevant to the content while being unexpected or intriguing enough to maintain viewer engagement. Use surprising facts or statistics that challenge common assumptions or demonstrate the importance of the topic.
+
+ Share relevant real-world applications that demonstrate immediate value to the learner. For example, \"What if you could automate your daily tasks with just a few lines of code?\" creates immediate interest by connecting to practical benefits. The hook should create an emotional connection and generate curiosity about the upcoming content. Consider using storytelling elements or real-world problems that your audience can relate to.
+
+- **Context (10-15%)**
+ Provide clear explanations of how the content relates to real-world situations and problems. This context should help learners understand why the material is relevant to their lives or career goals. Make explicit connections to previous knowledge and experiences that learners can build upon.
+
+ Address the fundamental question of \"Why should I learn this?\" by demonstrating practical applications and benefits. This explanation should be concrete and specific to your audience's needs and interests. Set clear expectations for learning outcomes so students understand what they will gain from the content. Provide a roadmap for the learning journey ahead, including how this content connects to future topics and skills.
+
+- **Core Content (60-70%)**
+ Organize material in a logical progression that builds from fundamental concepts to more complex applications. This progression should be carefully planned to avoid overwhelming learners while maintaining engagement. Include multiple examples that demonstrate concepts from different angles and perspectives.
+
+ Use varied teaching methods to accommodate different learning styles and maintain interest. These methods might include demonstrations, animations, code examples, and interactive elements. Implement frequent knowledge checks throughout the content to ensure understanding and maintain engagement. Break complex topics into manageable chunks that can be easily processed and remembered.
+
+- **Practice/Application (10-15%)**
+ Create guided practice opportunities that allow learners to apply new knowledge in a supported environment. These practice sessions should include clear instructions and immediate feedback mechanisms. Design interactive elements that engage learners and require active participation rather than passive viewing.
+
+ Develop problem-solving scenarios that challenge learners to apply concepts in realistic situations. These scenarios should gradually increase in complexity as learners gain confidence. Include opportunities for peer learning and collaboration when possible. Provide scaffolded support that can be gradually removed as learners become more proficient.
+
+- **Summary (5-10%)**
+ Conclude each video with a comprehensive recap of key points and main takeaways. This summary should reinforce the most important concepts and their practical applications. Preview upcoming topics to create anticipation and show how current learning connects to future content.
+
+ Provide specific action items that learners can implement immediately to reinforce their learning. These should be concrete, achievable tasks that build confidence and competence. Share additional resources for further learning, including reference materials, practice exercises, and advanced topics. Create clear connections between the current content and future learning objectives.
+
+## 2. Instructional Design Elements
+
+### A. Cognitive Load Management
+
+- **Chunking Strategies**
+ Break complex content into manageable segments of 3-5 minutes each. These chunks should focus on single concepts or closely related ideas that form a coherent unit. Use clear transitions between segments to maintain flow while allowing for cognitive processing.
+
+ Implement progressive complexity by building from basic concepts to more advanced applications. This progression should be carefully planned to avoid overwhelming learners. Include strategic pauses and processing time between segments to allow for reflection and integration of new information. Use visual and verbal cues to signal transitions between different concepts or levels of complexity.
+
+- **Visual Organization**
+ Develop a consistent visual hierarchy that guides learners through the content effectively. This hierarchy should use size, color, and placement to indicate the relative importance of different elements. Implement clean, uncluttered designs that minimize distractions and focus attention on key concepts.
+
+ Apply color coding consistently to help learners identify and remember related concepts. This coding should be intentional and meaningful, not merely decorative. Use white space effectively to create visual breathing room and help separate different concepts. Ensure that visual elements support rather than compete with the learning objectives.
+
+- **Information Processing**
+ Carefully limit the introduction of new concepts to 5-7 per video to prevent cognitive overload. This limitation helps ensure that learners can effectively process and retain the information presented. Develop and use mnemonics and memory aids that help learners organize and remember key concepts.
+
+ Provide visual anchors that learners can reference throughout the content. These anchors should help maintain context and show relationships between concepts. Include strategic review points that reinforce previous learning before introducing new material. Create clear connections between new information and existing knowledge to facilitate better retention.
+
+### B. Engagement Techniques
+
+- **Storytelling Elements**
+ Develop a clear narrative flow that carries learners through the content naturally. This narrative should have a beginning, middle, and end that maintains interest and supports learning objectives. Use character-driven examples that learners can relate to and remember.
+
+ Include elements of conflict and resolution to create tension and maintain engagement. These elements should be relevant to the learning objectives and help illustrate key concepts. Maintain an emotional connection through relatable scenarios and authentic problems. Create story arcs that span multiple videos or modules to maintain long-term engagement.
+
+- **Visual Support**
+ Create relevant graphics and animations that enhance understanding of key concepts. These visual elements should be purposeful and directly support learning objectives, not merely decorative. Implement a consistent visual style across all content to maintain professionalism and reduce cognitive load.
+
+ Develop clear infographics that break down complex concepts into understandable components. These should use visual hierarchy and design principles effectively. Use motion and animation thoughtfully to direct attention to important elements and demonstrate processes. Ensure all visual elements are accessible and effectively communicate their intended message.
+
+- **Interactive Components**
+ Design and embed quiz questions that check understanding at key points in the content. These questions should be strategically placed to maintain engagement and reinforce learning. Include deliberate pause points that encourage reflection and active processing of information.
+
+ Create coding challenges or practical exercises that allow immediate application of concepts. These should be scaffolded appropriately for the learner's skill level. Provide multiple opportunities for feedback, both automated and instructor-guided when possible. Design interactive elements that encourage experimentation and learning from mistakes.
+
+## 3. Content Delivery Framework
+
+### A. Teaching Sequence
+
+1. **Activate**
+ Begin each learning session by connecting to familiar concepts that students already understand. This activation of prior knowledge creates a foundation for new learning and helps students feel confident. Use carefully chosen analogies and metaphors that bridge the gap between known and new concepts. These comparisons should be relevant to your audience's experience and background.
+
+ Create explicit connections to previous learning modules or related concepts. These connections help students build a coherent mental model of the subject matter. Assess prior knowledge through quick activities or questions that reveal students' current understanding. Use this assessment to adjust your teaching approach and address any misconceptions early in the lesson.
+
+2. **Present**
+ Deliver clear, structured explanations of new concepts that build upon activated knowledge. These explanations should use precise language while remaining accessible to your target audience. Employ multiple representation methods, including verbal explanations, visual diagrams, and interactive demonstrations. This variety helps accommodate different learning styles and reinforces understanding.
+
+ Provide step-by-step demonstrations that break complex processes into manageable parts. Each step should be clearly explained and connected to the overall objective. Include real-world examples that illustrate practical applications of the concepts. These examples should be relevant to your audience's interests and career goals.
+
+3. **Guide**
+ Develop worked examples that demonstrate expert problem-solving processes and thinking strategies. These examples should include explicit explanations of decision-making and common pitfalls to avoid. Share expert thinking processes by \"thinking aloud\" through problem-solving steps. This transparency helps students understand the metacognitive aspects of learning.
+
+ Create scaffolded learning experiences that gradually reduce support as students gain confidence. Begin with highly structured guidance and progressively move toward independent work. Address common misconceptions and errors proactively, explaining why they occur and how to avoid them. Provide clear strategies for troubleshooting and problem-solving.
+
+4. **Practice**
+ Design guided exercises that allow students to apply new knowledge with appropriate support. These exercises should be carefully sequenced to build confidence and competence gradually. Include opportunities for independent practice that reinforce learning and build autonomy. Ensure these practice sessions are aligned with learning objectives and provide clear success criteria.
+
+ Create peer learning opportunities that allow students to learn from and teach others. These interactions can reinforce understanding and develop communication skills. Implement immediate feedback mechanisms that help students understand their progress and areas for improvement. This feedback should be specific, constructive, and actionable.
+
+5. **Apply**
+ Develop real-world projects that require students to integrate and apply their learning in authentic contexts. These projects should be challenging but achievable, with clear connections to practical applications. Create case studies that illustrate complex scenarios and require critical thinking and problem-solving skills. These studies should reflect realistic situations students might encounter in their careers.
+
+ Design problem-solving scenarios that encourage creative application of knowledge and skills. These scenarios should have multiple possible solutions to encourage innovative thinking. Provide opportunities for creative applications that allow students to extend their learning in personally meaningful ways. Support experimentation and risk-taking in a safe learning environment.
+
+### B. Presentation Techniques
+
+- **Transitions**
+ Implement clear verbal cues that signal shifts between concepts or activities. These cues help students maintain orientation and prepare for new information. Design visual transition elements that support cognitive processing and maintain engagement. These elements should be consistent throughout your content to establish familiar patterns.
+
+ Create concept maps that show relationships between different topics and ideas. These maps help students understand how current learning connects to broader concepts. Use progress indicators that help students track their advancement through the material. These indicators should provide a sense of accomplishment and motivation.
+
+- **Multiple Representations**
+ Combine text and graphics effectively to convey information through multiple channels. This combination should be purposeful and coordinated to enhance understanding. Integrate audio and visual elements that complement each other and reinforce key concepts. Ensure these elements work together without creating cognitive overload.
+
+ Develop interactive elements that encourage active engagement with the content. These elements should provide immediate feedback and support learning objectives. Include physical demonstrations when appropriate to illustrate concepts in tangible ways. These demonstrations should be clear, visible, and directly relevant to learning goals.
+
+## 4. Assessment Integration
+
+### A. Knowledge Verification
+- **Formative Assessment**
+ Implement regular quick checks for understanding throughout the learning process. These checks should be low-stakes and provide immediate feedback to both learner and instructor. Design self-assessment prompts that encourage students to reflect on their own learning progress. These prompts should help students develop metacognitive skills and self-awareness.
+
+ Create opportunities for peer discussion and feedback that deepen understanding through explanation and debate. These discussions should be structured to ensure productive exchanges and learning outcomes. Develop reflection questions that help students connect new learning to existing knowledge and future applications. These questions should promote deep thinking and personal connection to the material.
+
+- **Summative Assessment**
+ Design project-based assessments that evaluate comprehensive understanding and practical application. These projects should integrate multiple concepts and skills learned throughout the course. Guide students in developing portfolios that demonstrate their learning journey and achievements. These portfolios should include examples of both process and product.
+
+ Create opportunities for skill demonstration that allow students to show mastery in authentic contexts. These demonstrations should reflect real-world applications and standards. Develop knowledge application assessments that require students to transfer learning to new situations. These assessments should evaluate both understanding and adaptability.
+
+### B. Learning Reinforcement
+- **Review Strategies**
+ Implement spaced repetition techniques that optimize long-term retention of information. This approach should strategically revisit concepts at increasing intervals. Create concept mapping exercises that help students visualize and understand relationships between ideas. These maps should become increasingly complex as understanding develops.
+
+ Guide students in knowledge synthesis activities that combine multiple concepts into coherent understanding. These activities should help students see the bigger picture and make meaningful connections. Design application scenarios that require students to apply knowledge in new and challenging contexts. These scenarios should build confidence and demonstrate practical relevance.
+
+## 5. Technical Considerations
+
+### A. Video Production Elements
+- **Duration Guidelines**
+ Optimize video length to maintain engagement while effectively covering necessary content. The ideal duration of 6-12 minutes balances attention span with comprehensive coverage. Implement concept-based segmentation that breaks longer topics into digestible chunks. This segmentation should follow natural breaking points in the material.
+
+  Consider attention span patterns when planning content structure and pacing. Include variety and interaction to maintain engagement throughout longer sessions. Adapt content length to platform-specific requirements, accounting for mobile viewing habits and platform limitations in your planning.
+
+- **Quality Standards**
+ Ensure professional audio quality through proper equipment and recording techniques. This includes clear voice recording, minimal background noise, and appropriate volume levels. Maintain consistent lighting that enhances visibility and reduces viewer fatigue. Pay attention to both subject lighting and screen content visibility.
+
+ Create clear visual presentations that effectively communicate key concepts. This includes appropriate font sizes, color contrast, and visual hierarchy. Maintain appropriate pacing that allows for processing time while maintaining engagement. Consider your audience's needs and learning objectives when determining pace.
+
+### B. Accessibility Features
+- **Universal Design**
+ Create content that accommodates multiple learning modalities and preferences. This includes providing information through visual, auditory, and interactive channels. Ensure screen reader compatibility by following accessibility best practices and standards. This includes proper heading structure and alt text for images.
+
+ Implement appropriate color contrast considerations for all visual elements. This ensures content is accessible to viewers with various visual abilities. Provide alternative text descriptions for all important images and graphics. These descriptions should convey the same information as the visual elements.
+
+## 6. Follow-up Resources
+
+### A. Supporting Materials
+- **Resource Types**
+ Develop comprehensive practice exercises that reinforce learning and build confidence. These exercises should range from basic to advanced, accommodating different skill levels. Create well-documented code samples that demonstrate best practices and common patterns. These samples should include comments explaining key concepts and decisions.
+
+ Compile detailed reference guides that support independent learning and problem-solving. These guides should be easily searchable and regularly updated. Design cheat sheets that provide quick access to essential information and common procedures. These should be concise while including all crucial information.
+
+### B. Implementation Guide
+- **Learning Pathways**
+ Create clear prerequisite maps that show relationships between different topics and skills. This mapping helps students understand learning dependencies and plan their progress. Provide advanced topic suggestions that help motivated learners extend their knowledge. These suggestions should include resources and guidance for self-directed learning.
+
+ Develop skill progression guides that show clear paths from beginner to advanced levels. These guides should include milestones and checkpoints for measuring progress. Suggest project ideas that allow practical application of learned skills. These projects should be scalable to different skill levels and interests."""
+
+_prompt_fix_error = """You are an expert Manim developer specializing in debugging and error resolution. Based on the provided implementation plan and Manim code, analyze the error message to provide a comprehensive fix and explanation.
+
+Implementation Plan of the Scene:
+{implementation_plan}
+
+Manim Code:
+```python
+{manim_code}
+```
+
+Error Message:
+{error_message}
+
+Requirements:
+1. Provide complete error analysis with specific line numbers where possible.
+2. Include exact instructions for every code change.
+3. Explain why the error occurred in plain language.
+4. If external assets (e.g., images, audio, video) are referenced, remove them.
+5. **If voiceover is present in the original code, ensure it remains preserved in the corrected code.**
+6. Preserve all original code that is not causing the reported error. Do not remove or alter any intentional elements unnecessarily.
+7. Follow best practices for code clarity and the current Manim version.
+
+You MUST only output the following format. You MUST NOT come up with any other format like JSON.
+
+
+Error Type: [Syntax/Runtime/Logic/Other]
+Error Location: [File/Line number/Component]
+Root Cause: [Brief explanation of what caused the error]
+Impact: [What functionality is affected]
+Solution:
+[FIXES_REQUIRED]
+- Fix 1: [Description]
+ - Location: [Where to apply]
+ - Change: [What to modify]
+- Fix 2: [If applicable]
+...
+
+
+```python
+# Complete corrected and fully implemented Python code
+# Include all necessary imports, definitions, and any additional code for the script to run successfully
+```
+"""
+
+_prompt_animation_simple = """Given a topic and its context, you need to explain the topic in text.
+
+Also generate a Manim script that visually illustrates a key aspect of {topic} without including explanatory text in the animation itself.
+Your text can mention the animation, but it should not be the main focus.
+Context about the topic {topic}: {description}.
+
+The animation should focus on:
+* Illustrating the significant part of the theorem or concept – Use geometric figures, graphs, number lines, or any relevant visualization.
+* Providing an intuitive example – Instead of proving the theorem, show a concrete example or transformation that visually supports understanding.
+* Separately, provide a written explanation of the theorem as text that can be displayed outside the animation.
+
+Ensure that:
+
+* The animation is concise.
+* The Manim code is compatible with the latest version of Manim Community Edition.
+* The visual elements are clear and enhance understanding.
+
+Please provide only the following as output:
+
+1. A text explanation of the theorem.
+2. A complete Manim script that generates the animation. Only give the code.
+
+Output format:
+
+(Text Explanation Output)
+--- (split by ---)
+(Manim Code Output)
+
+Please do not include any other text or headers in your output.
+Only use one --- to split the text explanation and the Manim code."""
+
+_prompt_animation_rag_query_generation_fix_error = """You are an expert in Manim (Community Edition) and its plugins. Your task is to transform a Manim error message, together with the code that produced it and the accompanying text explanation, into queries that can be used to retrieve relevant documentation from both Manim core and any relevant plugins. The text explanation (implementation plan) describes the intended animation and provides context for the error.
+
+Here is the Text Explanation (Implementation Plan) as the context:
+
+{text_explanation}
+
+The error message will describe a problem encountered while running Manim code. Your queries should include keywords related to the specific Manim classes, methods, functions, and *concepts* that are likely related to the error, including any plugin-specific functionality. Focus on extracting the core concepts, actions, and vocabulary from the error message itself and the code snippet that produced the error. Generate queries that are concise and target different aspects of the documentation (class reference, method usage, animation examples, conceptual explanations) across both Manim core and relevant plugins.
+
+Here is the error message and the code snippet:
+
+**Error Message:**
+{error}
+
+**Code Snippet:**
+{code}
+
+Based on the error message and the code snippet, generate multiple human-like queries (maximum 5-7) for retrieving relevant documentation to fix this error. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the error and its potential solutions.
+
+**Specifically, ensure that:**
+1. At least 1-2 queries are focused on retrieving information about Manim *function or class usage* that might be causing the error.
+2. If the error message or code suggests the use of plugin functionality, include at least 1 query specifically targeting plugin documentation related to the error.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant.
+
+Output the queries in the following format:
+[
+ {{"query": "content of query 1", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 2", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 3", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 4", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 5", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 6", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 7", "type": "manim_core/name_of_the_plugin"}},
+] """
+
+_prompt_animation_fix_error = """You are an expert Manim developer specializing in debugging and error resolution. Analyze the provided code and error message to provide a comprehensive fix and explanation.
+
+
+Text Explanation:
+{text_explanation}
+
+Manim Code Animation to complement the Text Explanation:
+```python
+{manim_code}
+```
+
+Error Message on code running:
+{error_message}
+
+
+You MUST only output the following format (make sure to include the ```python and ``` in the code):
+
+
+Error Type: [Syntax/Runtime/Logic/Other]
+Error Location: [File/Line number/Component]
+Root Cause: [Brief explanation of what caused the error]
+Impact: [What functionality is affected]
+
+
+
+[FIXES_REQUIRED]
+- Fix 1: [Description]
+ - Location: [Where to apply]
+ - Change: [What to modify]
+- Fix 2: [If applicable]
+ ...
+
+[CORRECTED_CODE]
+```python
+# Complete corrected and fully implemented code, don't be lazy
+# Include all necessary imports, definitions, and any additional code for the script to run successfully
+```
+
+
+
+Requirements:
+1. Provide complete error analysis with specific line numbers where possible.
+2. Include exact instructions for every code change.
+3. Ensure that the [CORRECTED_CODE] section contains complete, executable Python code (not just code snippets). Do not assume context from the prompt.
+4. Explain why the error occurred in plain language.
+5. Include verification steps to confirm the error is resolved.
+6. Suggest preventive measures for avoiding similar errors in the future.
+7. If external assets (e.g., images, audio, video) are referenced, remove them.
+8. Preserve all original code that is not causing the reported error. Do not remove or alter any intentional elements unnecessarily.
+9. Follow best practices for code clarity and the current Manim version."""
+
+_prompt_scene_technical_implementation = """You are an expert in educational video production and Manim (Community Edition), adept at translating pedagogical narration plans into robust and spatially accurate Manim code.
+**Reminder:** This technical implementation plan is fully self-contained. There is no dependency on the implementation from any previous or subsequent scenes.
+
+Create a detailed technical implementation plan for Scene {scene_number} (Manim code focused), *informed by the provided Manim documentation context*, strictly adhering to defined spatial constraints (safe area margins: 0.5 units, minimum spacing: 0.3 units), and **addressing potential text bounding box overflow issues**.
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+Scene Vision and Storyboard:
+{scene_vision_storyboard}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced):**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. All objects must be positioned within these margins.
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). This prevents overlaps and maintains visual clarity.
+
+**Positioning Requirements:**
+1. All positioning MUST be relative (`next_to`, `align_to`, `shift`) from ORIGIN, safe margins, or other objects. **No absolute coordinates are allowed.**
+2. Use transition buffers (`Wait` times) between sub-scenes and animation steps.
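+
+For illustration only, here is a minimal sketch of how these spatial and positioning requirements might look in code; the class name `PositioningSketch` and the specific objects are hypothetical and not part of the required plan:
+
+```python
+from manim import *
+
+class PositioningSketch(Scene):  # hypothetical example scene
+    def construct(self):
+        title = Tex("Scene Title", font_size=28)
+        title.to_edge(UP, buff=0.5)             # stay within the 0.5-unit safe margin
+        diagram = Circle(radius=1)
+        diagram.next_to(title, DOWN, buff=0.3)  # at least 0.3 units below the title
+        label = Tex("A unit circle", font_size=24)
+        label.next_to(diagram, RIGHT, buff=0.3)
+        self.play(FadeIn(title))
+        self.wait(0.5)                          # transition buffer between steps
+        self.play(Create(diagram), Write(label))
+        self.wait(0.5)
+```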
+
+**Diagrams/Sketches (Highly Recommended):**
+* Include diagrams/sketches (even text-based) for complex layouts to visualize spatial relationships, improve clarity, and reduce spatial errors.
+
+**Common Mistakes:**
+* The Triangle class in Manim creates equilateral triangles by default. To create a right-angled triangle, use the Polygon class instead.
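+
+For example (illustrative only; the vertex coordinates below are arbitrary):
+
+```python
+from manim import *
+
+equilateral = Triangle()                                         # Triangle() is always equilateral by default
+right_triangle = Polygon(ORIGIN, RIGHT * 3, RIGHT * 3 + UP * 2)  # explicit vertices form a right angle
+```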
+
+**Manim Plugins:**
+* You may use established, well-documented Manim plugins if they offer significant advantages in terms of code clarity, efficiency, or functionality not readily available in core Manim.
+* **If a plugin is used:**
+ * Clearly state the plugin name and version (if applicable).
+ * Provide a brief justification for using the plugin (e.g., "Using `manim-plugin-name` for its advanced graph layout capabilities").
+ * Ensure all plugin usage adheres to the plugin's documentation.
+    * Include a comment in the plan: `### Plugin: [plugin name] - [brief justification]`.
+
+**Focus:**
+* Creating *pedagogically sound and spatially correct Manim code*.
+* Detailed technical descriptions, referencing Manim documentation.
+* Strict adherence to spatial constraints and relative positioning.
+
+You MUST generate the technical implementation plan for the scene in the following format (from ```xml to ```):
+
+```xml
+
+0. **Dependencies**:
+ - **Manim API Version**: Target the latest stable Manim release, using only documented API elements.
+ - **Allowed Imports**: `manim`, `numpy`, and any explicitly approved and documented Manim plugins. No external assets (e.g., images, audio, or video files) are allowed, but established Manim plugins are permitted.
+
+1. **Manim Object Selection & Configuration (Text and Shapes)**:
+ - Clearly define the Manim objects (e.g., `Tex`, `MathTex`, `Circle`, `Line`, etc.) used to construct the scene. Also include any objects provided by used plugins.
+ - Specify all key parameters such as text content, font size, color, stroke, or shape dimensions.
+ - **Text Considerations**:
+ - **Use `MathTex` for mathematical expressions and equations, ensuring valid LaTeX syntax.** For example: `MathTex("x^2 + y^2 = r^2")`.
+ - **Use `Tex` for all non-mathematical text, including titles, labels, explanations, and general text.** For example: `Tex("This is a circle")`.
+ - **If you need to include regular text *within* a `MathTex` environment (e.g., for explanations alongside a formula), use the `\\text{{}}` command.** For example: `MathTex(r"\\text{{Area of circle}} = \\pi r^2")`.
+ - **Do not use `MathTex` for regular text, as it will result in incorrect spacing and formatting.**
+ - **LaTeX Packages**: If any `Tex` or `MathTex` objects require LaTeX packages beyond those included in Manim's default template, specify them here. For example: "Requires: `\\usepackage{{amssymb}}`". Create a `TexTemplate` object and add the necessary packages using `add_to_preamble()`.
+ - **Font Size Recommendations**:
+        - For title text, a font size of 28 is highly recommended.
+        - For side labels or formulas, a font size of 24 is highly recommended.
+        - However, if the text has more than 10 words, reduce the font size further and break the text into multiple lines.
+ - Confirm all objects begin within the safe area (0.5 units from all edges) and maintain at least 0.3 units spacing to avoid overlaps.
+
+2. **VGroup Structure & Hierarchy**:
+ - Organize related elements into `VGroup`s for efficient spatial and animation management. If a plugin provides a specialized group-like object, consider using it.
+ - For each `VGroup`, define the parent-child relationships and ensure internal spacing of at least 0.3 units.
+ - Clearly document the purpose for each grouping (e.g., "formula_group" for mathematical expressions).
+
+3. **Spatial Positioning Strategy**:
+ - Mandate the exclusive use of relative positioning methods (`next_to`, `align_to`, `shift`), based on ORIGIN, safe margins, or other objects.
+ - For every object, specify:
+ - The reference object (or safe edge) used for positioning.
+ - The specific method (and direction/aligned edge) along with a `buff` value (minimum 0.3 units).
+ - Outline the layout in sequential stages, inserting visual checkpoints to verify that every element continues to respect safe margins and spacing.
+ - Highlight measures to safeguard text bounding boxes, especially for multi-line text.
+ - Reference the font size recommendations under "Text Considerations" to ensure appropriate sizing and prevent overflow.
+
+4. **Animation Methods & Object Lifecycle Management**:
+ - Define clear animation sequences using documented methods such as `Create`, `Write`, `FadeIn`, `Transform`, and corresponding removal animations (`FadeOut`, `Uncreate`). Include animation methods from plugins if they are used.
+ - For each animation, specify parameters like `run_time`, `lag_ratio`, and the use of `Wait()` for transition buffers.
+ - Ensure every object's appearance and removal is managed to prevent clutter and maintain scene clarity.
+
+5. **Code Structure & Reusability**:
+ - Propose modular functions for creating and animating common objects to promote code reusability.
+ - Organize the overall code structure into logical sections: dependencies, object definitions, individual layout stages, and the main `construct` method.
+ - Include inline comments to document the rationale for configuration choices, referencing the Manim Documentation *and the plugin documentation where applicable*.
+
+***Mandatory Safety Checks***:
+ - **Safe Area Enforcement**: All objects, including text bounding boxes, must remain within 0.5 unit margins.
+ - **Minimum Spacing Validation**: Confirm a minimum of 0.3 units spacing between every pair of objects.
+ - **Transition Buffers**: Use explicit `Wait()` calls to separate animation steps and sub-scenes.
+
+```
+"""
+
+_prompt_rag_query_generation_narration = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a storyboard and generate effective queries that will retrieve relevant documentation about narration, text animations, and audio-visual synchronization.
+
+Here is the storyboard:
+
+{storyboard}
+
+Based on this storyboard, generate multiple human-like queries (maximum 10) for retrieving relevant documentation about narration and text animation techniques.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **text animations** and their properties
+2. Include queries about **timing and synchronization** techniques
+3. If the storyboard suggests using plugin functionality, include specific queries targeting those plugin's narration capabilities
+
+The above storyboard is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of text animation query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of timing synchronization query"}}
+ ...
+]
+```"""
+
+_prompt_context_learning_animation_narration = """Here are some example animation and narration plans to help guide your planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene."""
+
+_prompt_scene_implementation = """You are an expert in educational video production and Manim (Community Edition) animation development. Your task is to create a detailed implementation plan for Scene {scene_number}.
+
+
+Topic: {topic}
+Description: {description}
+
+
+
+Scene Overview:
+{scene_outline}
+
+
+
+
+[SCENE_VISION]
+1. **Overall Narrative**:
+ - Describe the overall story or message of the scene. What is the key takeaway for the viewer?
+ - How does this scene fit into the larger narrative of the video?
+ - What is the desired emotional impact on the viewer?
+
+2. **Learning Objectives**:
+ - What specific knowledge or skills should the viewer gain from this scene?
+ - How will the visual elements and animations support these learning objectives?
+ - What are the key concepts that need to be emphasized?
+
+[STORYBOARD]
+1. **Visual Flow**:
+ - Describe the sequence of visual elements and animations in the scene.
+ - Provide a rough sketch or description of the key visual moments.
+ - How will the scene transition between different ideas or concepts?
+ - What is the pacing of the scene? Are there moments of pause or rapid action?
+
+[TECHNICAL_IMPLEMENTATION]
+1. **High-Level Components (VGroups)**:
+ - **Identify the main conceptual sections of the scene.** Think of this like outlining chapters in a story or sections in a presentation.
+ - **Define the purpose of each high-level component.** What should the viewer learn or understand from each section?
+ - **Describe how these components relate to each other and the overall scene flow.** How will you transition between these sections to create a cohesive narrative?
+ - **Provide a brief rationale for your choice of high-level components.** Why did you choose these specific sections?
+
+2. **VGroup Hierarchy**:
+ - **For each high-level component, define a parent VGroup.** This VGroup will act as a container for all elements within that section.
+ - **Break down each parent VGroup into nested VGroups for sub-components as needed.** Think about logical groupings of elements.
+ - **Specify the relative positioning of these VGroups within the scene using `next_to()`, `align_to()`, and `shift()` where possible.** How will the parent VGroups be arranged on the screen relative to each other? (e.g., stacked vertically, side-by-side, etc.) Prioritize relative positioning using the following references:
+ - `ORIGIN`: the center of the scene
+ - scene margins (e.g., corners, edges)
+ - other VGroups as references.
+ - **MUST NOT use absolute coordinates.**
+ - **Define the scale relationships between different levels of the VGroup hierarchy.** Will sub-VGroups inherit scale from parent VGroups? How will scaling be managed to maintain visual consistency?
+ - **Provide a brief rationale for your VGroup hierarchy.** Why did you choose this specific structure?
+
+ For each VGroup level (from high-level down to sub-components):
+ - Name: [Descriptive name for the VGroup, e.g., "TitleSection", "ProblemStatementGroup", "Explanation1Group"]
+ - Purpose: [What is the purpose of this VGroup? What should the viewer learn or understand from this VGroup?]
+ - Contents: [List all child VGroups and individual elements (Text, MathTex, Shapes, etc.) that belong to this VGroup.]
+ - Positioning:
+ * Reference: [Specify what this VGroup is positioned relative to. Do not use absolute coordinates.]
+ * Alignment: [How is it aligned relative to the reference? Use `align_to()` with options like `UP`, `DOWN`, `LEFT`, `RIGHT`, `ORIGIN`, etc.]
+ * Spacing: [Describe any spacing considerations relative to sibling VGroups or elements within the parent. Use `buff` argument in `next_to()` or `arrange()`. Refer to the defined minimum spacing value.]
+ - Scale: [Specify the scale of this VGroup relative to its parent VGroup. Use relative scaling factors (e.g., 1.0 for same scale, 0.8 for smaller).]
+ - Rationale: [Explain the reasoning behind the structure and organization of this VGroup. Why did you group these elements together?]
+
+3. **Element Specification**:
+ For each individual element (Text, MathTex, Shapes, etc.) within a VGroup:
+ - Name: [Descriptive name for the element, e.g., "ProblemTitleText", "Equation1", "HighlightCircle"]
+ - Type: [Manim object type. Examples: Text, MathTex, Circle, Rectangle, Arrow, Line, etc.]
+ - Parent VGroup: [Specify the VGroup this element belongs to. This establishes the hierarchical relationship.]
+ - Positioning:
+ * Reference: [Specify what this element is positioned relative to. Use its parent VGroup, other elements, `ORIGIN`, or scene margins as references. Do not use absolute coordinates.]
+ * Alignment: [How is it aligned within its parent VGroup? Use `align_to()` or `next_to()` with appropriate directions, e.g. `UP`, `DOWN`, `LEFT`, `RIGHT`, `ORIGIN`, `UL`, `UR`, `DL`, `DR`]
+ * Spacing: [If applicable, describe spacing relative to other elements using `buff` in `next_to()`. Refer to the defined minimum spacing value.]
+ - Style Properties:
+ * Color: [Hex code or named color (e.g., "RED", "BLUE"). Use hex codes for specific colors. e.g., #FF0000 for red]
+ * Opacity: [Value between 0 and 1. 1 for fully opaque, 0 for fully transparent.]
+ * Stroke Width: [Specify stroke width using levels: `thin`, `medium`, or `thick`.]
+ * Font: [Font family name, if applicable.]
+ * Font Size: [Specify font size using levels: `heading1`, `heading2`, `heading3`, `heading4`, `heading5`, `heading6`, or `body`. Refer to the defined font size levels.]
+ * Fill Color: [Hex code for fill color, if applicable.]
+ * ... [Include any other relevant style properties]
+ - Z-Index: [Integer value for layering order within the VGroup. Higher values are on top.]
+ - Required Imports: [List specific Manim classes that need to be imported to create this element. e.g., `from manim import Text, Circle`]
+
+[ANIMATION_STRATEGY]
+1. **VGroup Transitions**:
+ - **Define how parent VGroups will transition onto and off of the scene, and between different sections.** Describe the movement patterns for these high-level groups. Examples: 'Slide in from left', 'Fade in and scale up', 'Move to top of screen'.
+ - **Specify the timing and coordination of VGroup transitions.** How long will each transition take? Will transitions overlap or be sequential?
+ - **Describe any transformation sequences applied to VGroups during transitions.** Will VGroups rotate, scale, or change shape during transitions?
+
+2. **Element Animations**:
+ - **Define the animations for individual elements within each VGroup.** What animations will bring each element to life? Examples: 'Write in text', 'Draw a circle', 'Highlight an equation', 'Fade in an image'.
+ - **Group related element animations using Manim's animation grouping features (e.g., `AnimationGroup`, `Succession`).** Explain how these groups will be used to create cohesive animation sequences.
+ - **Coordinate element animations with parent VGroup movements and transitions.** Ensure element animations are synchronized with the overall scene flow.
+ - **Specify the timing of element animations relative to VGroup transitions and other element animations.** Create a timeline or sequence of animations.
+
+3. **Scene Flow**:
+ - **Describe the overall animation sequence for the entire scene.** Outline the order in which VGroups and elements will be animated.
+ - **Specify transition buffers or pauses between major sections of the scene.** How much time will be left between animations for the viewer to process information?
+ - **Consider how the animation timing will coordinate with the narration (if narration is planned).** Animations should complement and reinforce the spoken content.
+
+[NARRATION]
+- **Narration Script:** [Provide the full script for the narration, including timing cues or markers for when specific animations should occur. The script should be clear, detailed, and engaging, and should align with the visual elements and animations.]
+- **Narration Sync:** [Describe how the narration should be synchronized with the animations. Specify how timing cues in the narration script will be used to trigger animations. Are there specific points where the narration and animations should be perfectly synchronized? Explain how you will achieve this synchronization.]
+
+[VIEWER_EXPERIENCE]
+1. **Cognitive Load**:
+ - How will you manage the amount of information presented at any given time?
+ - Are there any complex concepts that need to be broken down into smaller steps?
+ - How will you use visual cues to guide the viewer's attention?
+
+2. **Pacing**:
+ - Is the pacing of the scene appropriate for the content?
+ - Are there moments where the viewer needs time to pause and reflect?
+ - How will you use animation timing to control the pace of the scene?
+
+3. **Accessibility**:
+ - How will you ensure that the scene is accessible to viewers with different needs?
+ - Are there any specific considerations for color contrast or text readability?
+
+[TECHNICAL_CHECKS]
+- **VGroup boundary validation:** Ensure all elements are contained within their intended VGroup boundaries and are not overflowing unexpectedly.
+- **Hierarchy scale consistency:** Verify that scaling is applied consistently throughout the VGroup hierarchy and that text and elements remain readable at all scales.
+- **Animation coordination between levels:** Check that animations at different VGroup levels are coordinated and do not clash or look disjointed.
+- **Performance optimization for nested groups:** Consider the performance implications of deeply nested VGroups and optimize structure and animations for smooth playback.
+- **Text readability:** Ensure all text elements are legible in terms of size, color contrast, and positioning.
+- **Color contrast:** Verify sufficient color contrast between text and background, and between different visual elements for accessibility.
+- **Animation smoothness:** Check for any jerky or abrupt animations and refine timing and easing for smoother transitions.
+
+
+
+Requirements:
+1. All elements must stay within safe area margins
+2. Maintain minimum spacing between objects: [value] (This value is defined in the project settings)
+3. Use relative positioning when possible, leveraging `next_to()`, `align_to()`, and `shift()`. Only reference positions relative to `ORIGIN`, scene margins, or other object reference points. Do not use absolute coordinates.
+4. Include transition buffers between animations
+5. Specify z-index for overlapping elements
+6. All colors must use hex codes or named colors
+7. Define scale relative to base unit
+8. No external dependencies
+9. Currently, there are no images or other assets available locally or remotely for you to use in the scene. Only include elements that can be generated through manim.
+10. **Do not generate any code in this plan, except for illustrative examples where necessary. This plan is for outlining the scene and should not include any python code.**
+11. **The purpose of this plan is to be a detailed guide for a human to implement the scene in manim.**"""
+
+_prompt_visual_fix_error = """You are an expert in Manim animations. Your task is to ensure that the rendered animation frame (image) aligns with the intended teaching content based on the provided implementation plan.
+
+Instructions:
+Evaluate whether the object coordinates and positions in the image match the described plan and educational purpose.
+The implementation plan serves as a reference, but your primary goal is to verify that the rendered animation frame supports effective teaching.
+For example:
+* If the object is supposed to be at the top of the screen, but it is at the bottom, you need to adjust the position.
+* If the object is supposed to be at the left side but it is too far to the left, you need to adjust the position.
+* If two objects are not supposed to overlap but they do, you need to adjust their positions.
+
+If adjustments are needed, provide the complete code of the adjusted version.
+If the current code is correct, return it as is.
+
+Manim Implementation Plan:
+{implementation}
+
+Generated Code:
+{generated_code}
+
+Return the complete code of the adjusted version if the code needs to be updated. If the code is correct, only return "" as output.
+"""
+
+_banned_reasonings = """evaluation cannot
+can't assist
+cannot assist
+can't provide
+cannot provide
+can't evaluate
+cannot evaluate
+cannot be evaluated
+cannot be rated
+cannot be completed
+cannot be assessed
+cannot be scored
+cannot be conducted
+unable to evaluate
+do not have the capability
+do not have the ability
+are photographs and not AI-generated
+unable to provide the evaluation"""
+
+_prompt_code_generation = """You are an expert Manim (Community Edition) developer for educational content. Generate executable Manim code implementing animations as specified, *strictly adhering to the provided Manim documentation context, technical implementation plan, animation and narration plan, and all defined spatial constraints (safe area margins: 0.5 units, minimum spacing: 0.3 units)*.
+
+Think of reusable animation components for a clean, modular, and maintainable library, *prioritizing code structure and best practices as demonstrated in the Manim documentation context*. *Throughout code generation, rigorously validate all spatial positioning and animations against the defined safe area margins and minimum spacing constraints. If any potential constraint violation is detected, generate a comment in the code highlighting the issue for manual review and correction.*
+
+Input Context:
+
+Topic: {topic}
+Description: {description}
+
+Scene Outline:
+{scene_outline}
+
+Scene Technical Implementation:
+{scene_implementation}
+
+**Code Generation Guidelines:**
+
+1. **Scene Class:** Class name `Scene{scene_number}`, where `{scene_number}` is replaced by the scene number (e.g., `Scene1`, `Scene2`). The scene class should at least inherit from `VoiceoverScene`. However, you can add more Manim Scene classes on top of VoiceoverScene for multiple inheritance if needed.
+2. **Imports:** Include ALL necessary imports explicitly at the top of the file, based on used Manim classes, functions, colors, and constants. Do not rely on implicit imports. Double-check for required modules, classes, functions, colors, and constants, *ensuring all imports are valid and consistent with the Manim Documentation*. **Include imports for any used Manim plugins.**
+3. **Speech Service:** Initialize `KokoroService()`. You MUST import like this: `from src.utils.kokoro_voiceover import KokoroService` as this is our custom voiceover service.
+4. **Reusable Animations:** Implement functions for each animation sequence to create modular and reusable code. Structure code into well-defined functions, following function definition patterns from Manim Documentation.
+5. **Voiceover:** Use `with self.voiceover(text="...")` for speech synchronization, precisely matching the narration script and animation timings from the Animation and Narration Plan.
+6. **Comments:** Add clear and concise comments for complex animations, spatial logic (positioning, arrangements), and object lifecycle management. *Use comments extensively to explain code logic, especially for spatial positioning, animation sequences, and constraint enforcement, mirroring commenting style in Manim Documentation*. **Add comments to explain the purpose and usage of any Manim plugins.**
+7. **Error Handling & Constraint Validation:** Implement basic error handling if error handling strategies are suggested or exemplified in the Manim Documentation. **Critically, during code generation, implement explicit checks to validate that each object's position and animation adhere to the safe area margins (0.5 units) and minimum spacing (0.3 units).** An illustrative sketch of such checks appears after this guideline list.
+8. **Performance:** Follow Manim best practices for efficient code and rendering performance, as recommended in the Manim Documentation.
+9. **Manim Plugins:** You are allowed and encouraged to use established, well-documented Manim plugins if they simplify the code, improve efficiency, or provide functionality not readily available in core Manim.
+ * **If a plugin is used:**
+ * Include the necessary import statement at the top of the file.
+    * Add a comment indicating the plugin used and its purpose: `### Plugin: [plugin name] - [purpose]`.
+ * Ensure all plugin usage adheres to the plugin's documentation.
+10. **No External Assets:** No external files (images, audio, video). *Use only Manim built-in elements and procedural generation, or elements provided by approved Manim plugins. No external assets are allowed*.
+11. **No Main Function:** Only scene class. No `if __name__ == "__main__":`.
+12. **Spatial Accuracy (Paramount):** Achieve accurate spatial positioning as described in the technical implementation plan, *strictly using relative positioning methods (`next_to`, `align_to`, `shift`, VGroups) and enforcing safe area margins and minimum 0.3 unit spacing, as documented in Manim Documentation Context*. *Spatial accuracy and constraint adherence are the highest priorities in code generation.*
+13. **VGroup Structure:** Implement VGroup hierarchy precisely as defined in the Technical Implementation Plan, using documented VGroup methods for object grouping and manipulation.
+14. **Spacing & Margins (Strict Enforcement):** Adhere strictly to safe area margins (0.5 units) and minimum spacing (0.3 units) requirements for *all* objects and VGroups throughout the scene and all animations. Prevent overlaps and ensure all objects stay within the safe area. *Rigorously enforce spacing and margin requirements using `buff` parameters, relative positioning, and explicit constraint validation checks during code generation, and validate against safe area guidelines from Manim Documentation Context*.
+15. **Background:** Default background (Black) is sufficient. Do not create custom color background Rectangles.
+16. **Text Color:** Do not use BLACK color for any text. Use predefined colors (BLUE_C, BLUE_D, GREEN_C, GREEN_D, GREY_A, GREY_B, GREY_C, LIGHTER_GRAY, LIGHT_GRAY, GOLD_C, GOLD_D, PURPLE_C, TEAL_C, TEAL_D, WHITE).
+17. **Default Colors:** You MUST use the provided color definitions if you use colors in your code. ONLY USE THE COLORS PREVIOUSLY DEFINED.
+18. **Animation Timings and Narration Sync:** Implement animations with precise `run_time` values and synchronize them with the narration script according to the Animation and Narration Plan. Use `Wait()` commands with specified durations for transition buffers.
+19. **Don't be lazy on code generation:** Generate full, complete code including all helper functions. Ensure that the output is comprehensive and the code is fully functional, incorporating all necessary helper methods and complete scene implementation details.
+20. **LaTeX Package Handling:** If the technical implementation plan specifies the need for additional LaTeX packages:
+ * Create a `TexTemplate` object.
+ * Use `myTemplate = TexTemplate()`
+ * Use `myTemplate.add_to_preamble(r"\\usepackage{{package_name}}")` to add the required package.
+ * Pass this template to the `Tex` or `MathTex` object: `tex = Tex(..., tex_template=myTemplate)`.
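+
+**Illustrative Constraint-Check Sketch (guidelines 7 and 14):** The helper below is only a minimal sketch of what an explicit safe-area and spacing check could look like; the function name, the print-based reporting, and the bounding-box comparison are illustrative assumptions, not a required API.
+
+```python
+from manim import config
+
+def validate_spatial_constraints(mobject, others, margin=0.5, min_spacing=0.3):
+    # Safe-area check: keep the object's bounding box inside the frame minus the margin.
+    x_lim = config.frame_width / 2 - margin
+    y_lim = config.frame_height / 2 - margin
+    left, right = mobject.get_left()[0], mobject.get_right()[0]
+    bottom, top = mobject.get_bottom()[1], mobject.get_top()[1]
+    if left < -x_lim or right > x_lim or bottom < -y_lim or top > y_lim:
+        print("WARNING: object leaves the 0.5 unit safe area, review placement")
+    # Minimum-spacing check: compare axis-aligned bounding-box gaps against 0.3 units.
+    for other in others:
+        gap_x = max(other.get_left()[0] - right, left - other.get_right()[0])
+        gap_y = max(other.get_bottom()[1] - top, bottom - other.get_top()[1])
+        if max(gap_x, gap_y) < min_spacing:
+            print("WARNING: objects closer than 0.3 units, review spacing")
+```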
+
+**Example Code Style and Structure to Emulate:**
+
+* **Helper Classes:** Utilize helper classes (like `Scene2_Helper`) to encapsulate object creation and scene logic, promoting modularity and reusability.
+* **Stage-Based `construct` Method:** Structure the `construct` method into logical stages (e.g., Stage 1, Stage 2, Stage 3) with comments to organize the scene flow.
+* **Reusable Object Creation Functions:** Define reusable functions within helper classes for creating specific Manim objects (e.g., `create_axes`, `create_formula_tex`, `create_explanation_text`).
+* **Clear Comments and Variable Names:** Use clear, concise comments to explain code sections and logic. Employ descriptive variable names (e.g., `linear_function_formula`, `logistic_plot`) for better readability.
+* **Text Elements:** Create text elements using `Tex` or `MathTex` for formulas and explanations, styling them with `color` and `font_size` as needed.
+* **Manim Best Practices:** Follow Manim best practices, including using `VoiceoverScene`, `KokoroService`, common Manim objects, animations, relative positioning, and predefined colors.
+
+You MUST generate the Python code in the following format:
+
+```python
+from manim import *
+from manim import config as global_config
+from manim_voiceover import VoiceoverScene
+from src.utils.kokoro_voiceover import KokoroService # You MUST import like this as this is our custom voiceover service.
+
+# plugins imports, don't change the import statements
+from manim_circuit import *
+from manim_physics import *
+from manim_chemistry import *
+from manim_dsa import *
+from manim_ml import *
+
+# Helper Functions/Classes (Implement and use helper classes and functions for improved code reusability and organization)
+class Scene{scene_number}_Helper: # Example: class Scene1_Helper:
+ # Helper class containing utility functions for scene {scene_number}.
+ def __init__(self, scene):
+ self.scene = scene
+ # ... (add any necessary initializations)
+
+ # Reusable object creation functions (Implement object creation functions for modularity and reusability as per plan)
+ def get_center_of_edges(self, polygon, buff=SMALL_BUFF*3):
+ # Calculate the center points of each edge in a polygon (Triangle, Square, etc.) with an optional buffer.
+ # Get the vertices of the polygon
+ vertices = polygon.get_vertices()
+ n_vertices = len(vertices)
+ # Initialize list to store edge centers
+ coords_vertices = []
+ # Calculate center point and normal for each edge
+ for i in range(n_vertices):
+ # Get current and next vertex (wrapping around to first vertex)
+ v1 = vertices[i]
+ v2 = vertices[(i + 1) % n_vertices]
+ # Calculate edge center
+ edge_center = (v1 + v2) / 2
+ # Calculate edge vector and normalize
+ edge_vector = v2 - v1
+ edge_length = np.linalg.norm(edge_vector)
+ normal = np.array([-edge_vector[1], edge_vector[0], 0]) / edge_length
+ # Add buffer in the normal direction
+ coords_vertices.append(edge_center + normal * buff)
+
+ return coords_vertices
+
+ def create_formula_tex(self, formula_str, color):
+ # Example function to create a MathTex formula with a specified color.
+ # Check if a custom TexTemplate is needed (from the technical plan).
+ if hasattr(self.scene, 'tex_template'):
+ formula = MathTex(formula_str, color=color, tex_template=self.scene.tex_template)
+ else:
+ formula = MathTex(formula_str, color=color)
+ return formula
+
+ # ... (add more helper functions as needed for object creation and scene logic)
+
+
+class Scene{scene_number}(VoiceoverScene, MovingCameraScene): # Note: You can add more Manim Scene classes on top of current templates for multiple inheritance if needed.
+ # Reminder: This scene class is fully self-contained. There is no dependency on the implementation from previous or subsequent scenes.
+ def construct(self):
+ # Initialize speech service
+ self.set_speech_service(KokoroService())
+
+ # Instantiate helper class (as per plan)
+ helper = Scene{scene_number}_Helper(self) # Example: helper = Scene1_Helper(self)
+
+ # Check for LaTeX packages and create TexTemplate if needed.
+ # This section should be generated based on the technical implementation plan.
+ # For example, if the plan includes: "Requires: \\usepackage{{amsmath}}"
+ # Then generate:
+ #
+ # my_template = TexTemplate()
+ # my_template.add_to_preamble(r"\\usepackage{{amsmath}}")
+ # self.tex_template = my_template
+
+ # --- Stage 1: Scene Setup (adapt stage numbers and descriptions to your scene, following plan) ---
+ with self.voiceover(text="[Narration for Stage 1 - from Animation and Narration Plan]") as tracker: # Voiceover for Stage 1
+ # Object Creation using helper functions (as per plan)
+ axes = helper.create_axes() # Example: axes = helper.create_axes()
+ formula = helper.create_formula_tex("...", BLUE_C) # Example: formula = helper.create_formula_tex("...", BLUE_C)
+ explanation = helper.create_explanation_text("...") # Example: explanation = helper.create_explanation_text("...")
+
+ # Positioning objects (relative positioning, constraint validation - as per plan)
+ formula.to_corner(UL) # Example positioning
+ axes.move_to(ORIGIN) # Example positioning
+ explanation.next_to(axes, RIGHT) # Example positioning
+
+ # Animations for Stage 1 (synced with voiceover - as per plan)
+ self.play(Write(formula), Write(axes), run_time=tracker.duration) # Example animations
+ self.wait(0.5) # Transition buffer
+
+ # --- Stage 2: ... (Implement Stage 2, Stage 3, etc. in a similar modular and structured way, following plan) ---
+ with self.voiceover(text="[Narration for Stage 2 - from Animation and Narration Plan]") as tracker: # Voiceover for Stage 2
+ # ... (Object creation, positioning, and animations for Stage 2, using helper functions and constraint validation)
+ pass # Replace with actual Stage 2 code
+
+ # ... (Implement remaining stages in a similar modular and structured way, following the Animation and Narration Plan and Technical Implementation Plan, and rigorously validating spatial constraints in each stage)
+
+ self.wait(1) # Scene end transition buffer
+```
+
+
+Notes:
+The `get_center_of_edges` helper function is particularly useful for:
+1. Finding the midpoint of polygon edges for label placement
+2. Calculating offset positions for side labels that don't overlap with the polygon
+3. Creating consistent label positioning across different polygon sizes and orientations
+
+Example usage in your scene:
+```python
+def label_triangle_sides(self, triangle, labels=["a", "b", "c"]):
+ # Helper function to label triangle sides.
+ edge_centers = self.helper.get_center_of_edges(triangle)
+ labeled_sides = VGroup()
+ for center, label in zip(edge_centers, labels):
+ tex = MathTex(label).move_to(center)
+ labeled_sides.add(tex)
+ return labeled_sides
+```"""
+
+_prompt_rag_query_generation_code = """You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a complete implementation plan for a Manim video scene into effective queries that will retrieve relevant information from Manim documentation. The implementation plan describes the scene's vision, storyboard, technical implementation, and animation/narration strategy.
+
+Here is the complete scene implementation plan:
+
+{implementation_plan}
+
+Based on the complete implementation plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim function usage** in scenes. Frame these queries to target function definitions, usage examples, and parameter details within Manim documentation.
+2. If the implementation suggests using plugin functionality, include at least 1 query specifically targeting **plugin documentation**. Clearly mention the plugin name in these queries to focus the search.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant, and to target the most helpful sections of the documentation (API reference, tutorials, examples).
+
+The above implementation plan is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of function usage query"}},
+    {{"type": "name_of_the_plugin", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of API reference query"}}
+ ...
+]
+```"""
+
diff --git a/task_generator/prompts_raw/banned_reasonings.txt b/task_generator/prompts_raw/banned_reasonings.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c7329204ae6125dac677164cb8a066bdd2676449
--- /dev/null
+++ b/task_generator/prompts_raw/banned_reasonings.txt
@@ -0,0 +1,18 @@
+evaluation cannot
+can't assist
+cannot assist
+can't provide
+cannot provide
+can't evaluate
+cannot evaluate
+cannot be evaluated
+cannot be rated
+cannot be completed
+cannot be assessed
+cannot be scored
+cannot be conducted
+unable to evaluate
+do not have the capability
+do not have the ability
+are photographs and not AI-generated
+unable to provide the evaluation
\ No newline at end of file
diff --git a/task_generator/prompts_raw/code_background.txt b/task_generator/prompts_raw/code_background.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2d18f0843f35082777daeba6cbaea8928c5ea277
--- /dev/null
+++ b/task_generator/prompts_raw/code_background.txt
@@ -0,0 +1,2 @@
+PLEASE DO NOT create any additional colored background Rectangles. The default background (Black) is enough.
+PLEASE DO NOT use BLACK color for any text.
diff --git a/task_generator/prompts_raw/code_color_cheatsheet.txt b/task_generator/prompts_raw/code_color_cheatsheet.txt
new file mode 100644
index 0000000000000000000000000000000000000000..cb301dd2d1126df650c8f7acf0432dbf2e387f70
--- /dev/null
+++ b/task_generator/prompts_raw/code_color_cheatsheet.txt
@@ -0,0 +1,23 @@
+MUST include the following color definitions if you use the colors in your code. ONLY USE THE COLORS BELOW.
+
+WHITE = '#FFFFFF'
+RED = '#FF0000'
+GREEN = '#00FF00'
+BLUE = '#0000FF'
+YELLOW = '#FFFF00'
+CYAN = '#00FFFF'
+MAGENTA = '#FF00FF'
+ORANGE = '#FFA500'
+PURPLE = '#800080'
+PINK = '#FFC0CB'
+BROWN = '#A52A2A'
+GRAY = '#808080'
+TEAL = '#008080'
+NAVY = '#000080'
+OLIVE = '#808000'
+MAROON = '#800000'
+LIME = '#00FF00'
+AQUA = '#00FFFF'
+FUCHSIA = '#FF00FF'
+SILVER = '#C0C0C0'
+GOLD = '#FFD700'
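+
+For illustration only (the `Text` usage and strings below are assumed examples), "include the color definitions" means re-declaring the constants you actually use, with the exact hex values above, before referencing them:
+
+```python
+from manim import Text
+
+# Re-declare only the colors that the scene actually uses.
+WHITE = '#FFFFFF'
+GOLD = '#FFD700'
+
+title = Text("Sample Title", color=GOLD)
+caption = Text("Sample caption text", color=WHITE)
+```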
\ No newline at end of file
diff --git a/task_generator/prompts_raw/code_disable.txt b/task_generator/prompts_raw/code_disable.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/task_generator/prompts_raw/code_font_size.txt b/task_generator/prompts_raw/code_font_size.txt
new file mode 100644
index 0000000000000000000000000000000000000000..baaa31ee173ae702578059e335850967764dd1a4
--- /dev/null
+++ b/task_generator/prompts_raw/code_font_size.txt
@@ -0,0 +1,5 @@
+If there is title text, font size is highly recommended to be 28.
+If there are side labels, font size is highly recommended to be 24.
+If there are formulas, font size is highly recommended to be 24.
+
+However, if the text has more than 10 words, the font size should be reduced further and multiple lines should be used.
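+
+As an illustrative sketch only (the specific mobjects and strings are assumptions), these recommendations translate to something like:
+
+```python
+from manim import Text, MathTex
+
+title = Text("Conservation of Energy", font_size=28)   # title text
+side_label = Text("kinetic energy", font_size=24)       # side label
+formula = MathTex(r"E = m c^2", font_size=24)           # formula
+# Longer sentences: reduce the size further and break them into multiple lines.
+note = Text("This longer explanation is split\nacross two lines", font_size=20)
+```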
\ No newline at end of file
diff --git a/task_generator/prompts_raw/code_limit.txt b/task_generator/prompts_raw/code_limit.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e1e798ae55374d93accadba57636a2105bc8ac9e
--- /dev/null
+++ b/task_generator/prompts_raw/code_limit.txt
@@ -0,0 +1,4 @@
+Note that the frame width and height are 14.222222222222221 and 8.0 respectively, and the center of the frame is (0, 0, 0).
+This means that, to keep every object inside the frame, you should limit the x and y coordinates of the objects:
+limit x to be within -7.0 and 7.0, and limit y to be within -4.0 and 4.0, for all objects.
+Place the objects near the center of the frame, without overlapping with each other.
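+
+A minimal sketch of such a bounds check (the `Square` and the assert-based reporting are assumptions for illustration only):
+
+```python
+from manim import Square, ORIGIN
+
+square = Square().move_to(ORIGIN)
+# Keep the object's bounding box inside x in [-7.0, 7.0] and y in [-4.0, 4.0].
+assert -7.0 <= square.get_left()[0] and square.get_right()[0] <= 7.0, "object exceeds the x limit"
+assert -4.0 <= square.get_bottom()[1] and square.get_top()[1] <= 4.0, "object exceeds the y limit"
+```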
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_animation_fix_error.txt b/task_generator/prompts_raw/prompt_animation_fix_error.txt
new file mode 100644
index 0000000000000000000000000000000000000000..aab84c5add28f9da2c49e0a299187b10a1efae99
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_animation_fix_error.txt
@@ -0,0 +1,50 @@
+You are an expert Manim developer specializing in debugging and error resolution. Analyze the provided code and error message to provide a comprehensive fix and explanation.
+
+
+Text Explanation:
+{text_explanation}
+
+Manim Code Animation to complement the Text Explanation:
+```python
+{manim_code}
+```
+
+Error Message on code running:
+{error_message}
+
+
+You MUST only output the following format (make sure to include the ```python and ``` in the code):
+
+
+Error Type: [Syntax/Runtime/Logic/Other]
+Error Location: [File/Line number/Component]
+Root Cause: [Brief explanation of what caused the error]
+Impact: [What functionality is affected]
+
+
+
+[FIXES_REQUIRED]
+- Fix 1: [Description]
+ - Location: [Where to apply]
+ - Change: [What to modify]
+- Fix 2: [If applicable]
+ ...
+
+[CORRECTED_CODE]
+```python
+# Complete corrected and fully implemented code, don't be lazy
+# Include all necessary imports, definitions, and any additional code for the script to run successfully
+```
+
+
+
+Requirements:
+1. Provide complete error analysis with specific line numbers where possible.
+2. Include exact instructions for every code change.
+3. Ensure that the [CORRECTED_CODE] section contains complete, executable Python code (not just code snippets). Do not assume context from the prompt.
+4. Explain why the error occurred in plain language.
+5. Include verification steps to confirm the error is resolved.
+6. Suggest preventive measures for avoiding similar errors in the future.
+7. If external assets (e.g., images, audio, video) are referenced, remove them.
+8. Preserve all original code that is not causing the reported error. Do not remove or alter any intentional elements unnecessarily.
+9. Follow best practices for code clarity and the current Manim version.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_animation_rag_query_generation.txt b/task_generator/prompts_raw/prompt_animation_rag_query_generation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a7c74360e3cded328900e3e5c38b0f18b0628595
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_animation_rag_query_generation.txt
@@ -0,0 +1,29 @@
+You are an expert in Manim (Community Edition) and its plugins. Your task is to transform a topic for a Manim animation scene into queries that can be used to retrieve relevant documentation from both Manim core and any relevant plugins.
+
+Your queries should include keywords related to the specific Manim classes, methods, functions, and *concepts* that are likely to be used to implement the scene, including any plugin-specific functionality. Focus on extracting the core concepts, actions, and vocabulary from the *entire* scene plan. Generate queries that are concise and target different aspects of the documentation (class reference, method usage, animation examples, conceptual explanations) across both Manim core and relevant plugins.
+
+Here is the Topic (and the context):
+
+{topic}. {context}
+
+Based on the topic and the context, generate multiple human-like queries (maximum 5-7) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least 1-2 queries are focused on retrieving information about Manim *function usage* in Manim scenes
+2. If the topic and the context can be linked to the use of plugin functionality, include at least 1 query specifically targeting plugin documentation
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant
+
+The above topic and context are relevant to these plugins: {relevant_plugins}
+
+Output the queries in the following format:
+```json
+[
+ {{"query": "content of query 1", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 2", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 3", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 4", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 5", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 6", "type": "manim_core/name_of_the_plugin"}},
+    {{"query": "content of query 7", "type": "manim_core/name_of_the_plugin"}}
+]
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_animation_rag_query_generation_fix_error.txt b/task_generator/prompts_raw/prompt_animation_rag_query_generation_fix_error.txt
new file mode 100644
index 0000000000000000000000000000000000000000..fe326331e1f29dc569e4e9e8e3064480de485071
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_animation_rag_query_generation_fix_error.txt
@@ -0,0 +1,33 @@
+You are an expert in Manim (Community Edition) and its plugins. Your task is to transform a complete implementation plan for a Manim animation scene into queries that can be used to retrieve relevant documentation from both Manim core and any relevant plugins. The implementation plan will describe the scene's vision, technical implementation, and animation strategy.
+
+Here is the Text Explanation (Implementation Plan) as the context:
+
+{text_explanation}
+
+The error message will describe a problem encountered while running Manim code. Your queries should include keywords related to the specific Manim classes, methods, functions, and *concepts* that are likely related to the error, including any plugin-specific functionality. Focus on extracting the core concepts, actions, and vocabulary from the error message itself and the code snippet that produced the error. Generate queries that are concise and target different aspects of the documentation (class reference, method usage, animation examples, conceptual explanations) across both Manim core and relevant plugins.
+
+Here is the error message and the code snippet:
+
+**Error Message:**
+{error}
+
+**Code Snippet:**
+{code}
+
+Based on the error message and the code snippet, generate multiple human-like queries (maximum 5-7) for retrieving relevant documentation to fix this error. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the error and its potential solutions.
+
+**Specifically, ensure that:**
+1. At least 1-2 queries are focused on retrieving information about Manim *function or class usage* that might be causing the error.
+2. If the error message or code suggests the use of plugin functionality, include at least 1 query specifically targeting plugin documentation related to the error.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant.
+
+Output the queries in the following format:
+[
+ {{"query": "content of query 1", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 2", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 3", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 4", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 5", "type": "manim_core/name_of_the_plugin"}},
+ {{"query": "content of query 6", "type": "manim_core/name_of_the_plugin"}},
+    {{"query": "content of query 7", "type": "manim_core/name_of_the_plugin"}}
+]
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_animation_simple.txt b/task_generator/prompts_raw/prompt_animation_simple.txt
new file mode 100644
index 0000000000000000000000000000000000000000..90ad63cdcd3a51b254ad7e54ef138797e0822300
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_animation_simple.txt
@@ -0,0 +1,30 @@
+Given a topic and its context, you need to explain the topic in text.
+
+Also generate a Manim script that visually illustrates a key aspect of {topic} without including explanatory text in the animation itself.
+Your text can mention the animation, but it should not be the main focus.
+Context about the topic {topic}: {description}.
+
+The animation should focus on:
+* Illustrating the significant part of the theorem or concept – Use geometric figures, graphs, number lines, or any relevant visualization.
+* Providing an intuitive example – Instead of proving the theorem, show a concrete example or transformation that visually supports understanding.
+* Separately, provide a written explanation of the theorem as text that can be displayed outside the animation.
+
+Ensure that:
+
+* The animation is concise.
+* The Manim code is compatible with the latest version of community manim.
+* The visual elements are clear and enhance understanding.
+
+Please provide only the following output:
+
+1. A text explanation of the theorem.
+2. A complete Manim script that generates the animation. Only give the code.
+
+Output format:
+
+(Text Explanation Output)
+--- (split by ---)
+(Manim Code Output)
+
+Please do not include any other text or headers in your output.
+Only use one --- to split the text explanation and the Manim code.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_best_practices.txt b/task_generator/prompts_raw/prompt_best_practices.txt
new file mode 100644
index 0000000000000000000000000000000000000000..da36c51c7ab1f32512de74821b85e45721bb2a6a
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_best_practices.txt
@@ -0,0 +1,16 @@
+# Best practices for generating educational videos with manim
+
+1. Specify positions as relative to other objects whenever it makes sense.
+ * For example, if you want to place a label for a geometric object.
+2. Objects should be of different color from the black background.
+3. Keep the text on screen concise.
+ * On-screen elements should focus on showcasing the concept, examples and visuals. Labels and illustrative text are still encouraged.
+ * For explanations and observations, prefer narrations over on-screen text.
+ * You should still show calculations and algorithms in full on screen.
+ * For examples and practice problems, it is reasonable to show more text, especially key statements.
+ * Longer text should appear smaller to fit on screen.
+4. To control the timing of objects appearing:
+ * `add` has instantaneous effect, best used for the initial setup of the scene.
+ * Animations are best used during narration.
+ * Make sure the animations make sense. If an object is already on screen, it makes no sense to fade it in or create it again.
+5. Use `Tex` or `MathTex` whenever you want to display math, including symbols and formulas.
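+
+A minimal sketch illustrating several of the practices above (the scene name and objects are arbitrary examples, not a required structure):
+
+```python
+from manim import Scene, Circle, MathTex, Write, RIGHT, BLUE
+
+class BestPracticesDemo(Scene):
+    def construct(self):
+        circle = Circle(color=BLUE)          # colored so it stands out on the black background
+        self.add(circle)                     # `add` is instantaneous: use it for initial setup
+        label = MathTex(r"x^2 + y^2 = r^2")  # math is rendered with MathTex, not plain text
+        label.next_to(circle, RIGHT)         # positioned relative to the circle, not absolutely
+        self.play(Write(label))              # animate the label while narration would run
+```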
diff --git a/task_generator/prompts_raw/prompt_code_generation.txt b/task_generator/prompts_raw/prompt_code_generation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..096e2b126e8d2aa73f61de2b78d21802aa5d0a43
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_code_generation.txt
@@ -0,0 +1,175 @@
+You are an expert Manim (Community Edition) developer for educational content. Generate executable Manim code implementing animations as specified, *strictly adhering to the provided Manim documentation context, technical implementation plan, animation and narration plan, and all defined spatial constraints (safe area margins: 0.5 units, minimum spacing: 0.3 units)*.
+
+Think of reusable animation components for a clean, modular, and maintainable library, *prioritizing code structure and best practices as demonstrated in the Manim documentation context*. *Throughout code generation, rigorously validate all spatial positioning and animations against the defined safe area margins and minimum spacing constraints. If any potential constraint violation is detected, generate a comment in the code highlighting the issue for manual review and correction.*
+
+Input Context:
+
+Topic: {topic}
+Description: {description}
+
+Scene Outline:
+{scene_outline}
+
+Scene Technical Implementation:
+{scene_implementation}
+
+**Code Generation Guidelines:**
+
+1. **Scene Class:** Class name `Scene{scene_number}`, where `{scene_number}` is replaced by the scene number (e.g., `Scene1`, `Scene2`). The scene class should at least inherit from `VoiceoverScene`. However, you can add more Manim Scene classes on top of VoiceoverScene for multiple inheritance if needed.
+2. **Imports:** Include ALL necessary imports explicitly at the top of the file, based on used Manim classes, functions, colors, and constants. Do not rely on implicit imports. Double-check for required modules, classes, functions, colors, and constants, *ensuring all imports are valid and consistent with the Manim Documentation*. **Include imports for any used Manim plugins.**
+3. **Speech Service:** Initialize `KokoroService()`. You MUST import like this: `from src.utils.kokoro_voiceover import KokoroService` as this is our custom voiceover service.
+4. **Reusable Animations:** Implement functions for each animation sequence to create modular and reusable code. Structure code into well-defined functions, following function definition patterns from Manim Documentation.
+5. **Voiceover:** Use `with self.voiceover(text="...")` for speech synchronization, precisely matching the narration script and animation timings from the Animation and Narration Plan.
+6. **Comments:** Add clear and concise comments for complex animations, spatial logic (positioning, arrangements), and object lifecycle management. *Use comments extensively to explain code logic, especially for spatial positioning, animation sequences, and constraint enforcement, mirroring commenting style in Manim Documentation*. **Add comments to explain the purpose and usage of any Manim plugins.**
+7. **Error Handling & Constraint Validation:** Implement basic error handling if error handling strategies are suggested or exemplified in the Manim Documentation. **Critically, during code generation, implement explicit checks to validate that each object's position and animation adhere to the safe area margins (0.5 units) and minimum spacing (0.3 units).**
+8. **Performance:** Follow Manim best practices for efficient code and rendering performance, as recommended in the Manim Documentation.
+9. **Manim Plugins:** You are allowed and encouraged to use established, well-documented Manim plugins if they simplify the code, improve efficiency, or provide functionality not readily available in core Manim.
+ * **If a plugin is used:**
+ * Include the necessary import statement at the top of the file.
+    * Add a comment indicating the plugin used and its purpose: `### Plugin: [plugin_name] - [purpose]`.
+ * Ensure all plugin usage adheres to the plugin's documentation.
+10. **No External Assets:** No external files (images, audio, video). *Use only Manim built-in elements and procedural generation, or elements provided by approved Manim plugins. No external assets are allowed*.
+11. **No Main Function:** Only scene class. No `if __name__ == "__main__":`.
+12. **Spatial Accuracy (Paramount):** Achieve accurate spatial positioning as described in the technical implementation plan, *strictly using relative positioning methods (`next_to`, `align_to`, `shift`, VGroups) and enforcing safe area margins and minimum 0.3 unit spacing, as documented in Manim Documentation Context*. *Spatial accuracy and constraint adherence are the highest priorities in code generation.*
+13. **VGroup Structure:** Implement VGroup hierarchy precisely as defined in the Technical Implementation Plan, using documented VGroup methods for object grouping and manipulation.
+14. **Spacing & Margins (Strict Enforcement):** Adhere strictly to safe area margins (0.5 units) and minimum spacing (0.3 units) requirements for *all* objects and VGroups throughout the scene and all animations. Prevent overlaps and ensure all objects stay within the safe area. *Rigorously enforce spacing and margin requirements using `buff` parameters, relative positioning, and explicit constraint validation checks during code generation, and validate against safe area guidelines from Manim Documentation Context*.
+15. **Background:** Default background (Black) is sufficient. Do not create custom color background Rectangles.
+16. **Text Color:** Do not use BLACK color for any text. Use predefined colors (BLUE_C, BLUE_D, GREEN_C, GREEN_D, GREY_A, GREY_B, GREY_C, LIGHTER_GRAY, LIGHT_GRAY, GOLD_C, GOLD_D, PURPLE_C, TEAL_C, TEAL_D, WHITE).
+17. **Default Colors:** You MUST use the provided color definitions if you use colors in your code. ONLY USE THE COLORS PREVIOUSLY DEFINED.
+18. **Animation Timings and Narration Sync:** Implement animations with precise `run_time` values and synchronize them with the narration script according to the Animation and Narration Plan. Use `Wait()` commands with specified durations for transition buffers.
+19. **Don't be lazy on code generation:** Generate full, complete code including all helper functions. Ensure that the output is comprehensive and the code is fully functional, incorporating all necessary helper methods and complete scene implementation details.
+20. **LaTeX Package Handling:** If the technical implementation plan specifies the need for additional LaTeX packages (a minimal sketch follows this list):
+ * Create a `TexTemplate` object.
+ * Use `myTemplate = TexTemplate()`
+ * Use `myTemplate.add_to_preamble(r"\\usepackage{{package_name}}")` to add the required package.
+ * Pass this template to the `Tex` or `MathTex` object: `tex = Tex(..., tex_template=myTemplate)`.
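+
+**Illustrative LaTeX Package Sketch (guideline 20):** A minimal sketch of the steps above, assuming the standard `from manim import *` shown in the format below; the `amsmath` package and the formula string are assumed examples, so substitute whatever the technical implementation plan actually requires.
+
+```python
+my_template = TexTemplate()
+my_template.add_to_preamble(r"\\usepackage{{amsmath}}")
+formula = MathTex(r"a = b + c", color=BLUE_C, tex_template=my_template)
+```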
+
+**Example Code Style and Structure to Emulate:**
+
+* **Helper Classes:** Utilize helper classes (like `Scene2_Helper`) to encapsulate object creation and scene logic, promoting modularity and reusability.
+* **Stage-Based `construct` Method:** Structure the `construct` method into logical stages (e.g., Stage 1, Stage 2, Stage 3) with comments to organize the scene flow.
+* **Reusable Object Creation Functions:** Define reusable functions within helper classes for creating specific Manim objects (e.g., `create_axes`, `create_formula_tex`, `create_explanation_text`).
+* **Clear Comments and Variable Names:** Use clear, concise comments to explain code sections and logic. Employ descriptive variable names (e.g., `linear_function_formula`, `logistic_plot`) for better readability.
+* **Text Elements:** Create text elements using `Tex` or `MathTex` for formulas and explanations, styling them with `color` and `font_size` as needed.
+* **Manim Best Practices:** Follow Manim best practices, including using `VoiceoverScene`, `KokoroService`, common Manim objects, animations, relative positioning, and predefined colors.
+
+You MUST generate the Python code in the following format:
+
+```python
+from manim import *
+from manim import config as global_config
+from manim_voiceover import VoiceoverScene
+from src.utils.kokoro_voiceover import KokoroService # You MUST import like this as this is our custom voiceover service.
+
+# plugins imports, don't change the import statements
+from manim_circuit import *
+from manim_physics import *
+from manim_chemistry import *
+from manim_dsa import *
+from manim_ml import *
+
+# Helper Functions/Classes (Implement and use helper classes and functions for improved code reusability and organization)
+class Scene{scene_number}_Helper: # Example: class Scene1_Helper:
+ # Helper class containing utility functions for scene {scene_number}.
+ def __init__(self, scene):
+ self.scene = scene
+ # ... (add any necessary initializations)
+
+ # Reusable object creation functions (Implement object creation functions for modularity and reusability as per plan)
+ def get_center_of_edges(self, polygon, buff=SMALL_BUFF*3):
+ # Calculate the center points of each edge in a polygon (Triangle, Square, etc.) with an optional buffer.
+ # Get the vertices of the polygon
+ vertices = polygon.get_vertices()
+ n_vertices = len(vertices)
+ # Initialize list to store edge centers
+ coords_vertices = []
+ # Calculate center point and normal for each edge
+ for i in range(n_vertices):
+ # Get current and next vertex (wrapping around to first vertex)
+ v1 = vertices[i]
+ v2 = vertices[(i + 1) % n_vertices]
+ # Calculate edge center
+ edge_center = (v1 + v2) / 2
+ # Calculate edge vector and normalize
+ edge_vector = v2 - v1
+ edge_length = np.linalg.norm(edge_vector)
+ normal = np.array([-edge_vector[1], edge_vector[0], 0]) / edge_length
+ # Add buffer in the normal direction
+ coords_vertices.append(edge_center + normal * buff)
+
+ return coords_vertices
+
+ def create_formula_tex(self, formula_str, color):
+ # Example function to create a MathTex formula with a specified color.
+ # Check if a custom TexTemplate is needed (from the technical plan).
+ if hasattr(self.scene, 'tex_template'):
+ formula = MathTex(formula_str, color=color, tex_template=self.scene.tex_template)
+ else:
+ formula = MathTex(formula_str, color=color)
+ return formula
+
+ # ... (add more helper functions as needed for object creation and scene logic)
+
+
+class Scene{scene_number}(VoiceoverScene, MovingCameraScene): # Note: You can add more Manim Scene classes on top of current templates for multiple inheritance if needed.
+ # Reminder: This scene class is fully self-contained. There is no dependency on the implementation from previous or subsequent scenes.
+ def construct(self):
+ # Initialize speech service
+ self.set_speech_service(KokoroService())
+
+ # Instantiate helper class (as per plan)
+ helper = Scene{scene_number}_Helper(self) # Example: helper = Scene1_Helper(self)
+
+ # Check for LaTeX packages and create TexTemplate if needed.
+ # This section should be generated based on the technical implementation plan.
+ # For example, if the plan includes: "Requires: \\usepackage{{amsmath}}"
+ # Then generate:
+ #
+ # my_template = TexTemplate()
+ # my_template.add_to_preamble(r"\\usepackage{{amsmath}}")
+ # self.tex_template = my_template
+
+ # --- Stage 1: Scene Setup (adapt stage numbers and descriptions to your scene, following plan) ---
+ with self.voiceover(text="[Narration for Stage 1 - from Animation and Narration Plan]") as tracker: # Voiceover for Stage 1
+ # Object Creation using helper functions (as per plan)
+ axes = helper.create_axes() # Example: axes = helper.create_axes()
+ formula = helper.create_formula_tex("...", BLUE_C) # Example: formula = helper.create_formula_tex("...", BLUE_C)
+ explanation = helper.create_explanation_text("...") # Example: explanation = helper.create_explanation_text("...")
+
+ # Positioning objects (relative positioning, constraint validation - as per plan)
+ formula.to_corner(UL) # Example positioning
+ axes.move_to(ORIGIN) # Example positioning
+ explanation.next_to(axes, RIGHT) # Example positioning
+
+ # Animations for Stage 1 (synced with voiceover - as per plan)
+ self.play(Write(formula), Write(axes), run_time=tracker.duration) # Example animations
+ self.wait(0.5) # Transition buffer
+
+ # --- Stage 2: ... (Implement Stage 2, Stage 3, etc. in a similar modular and structured way, following plan) ---
+ with self.voiceover(text="[Narration for Stage 2 - from Animation and Narration Plan]") as tracker: # Voiceover for Stage 2
+ # ... (Object creation, positioning, and animations for Stage 2, using helper functions and constraint validation)
+ pass # Replace with actual Stage 2 code
+
+ # ... (Implement remaining stages in a similar modular and structured way, following the Animation and Narration Plan and Technical Implementation Plan, and rigorously validating spatial constraints in each stage)
+
+ self.wait(1) # Scene end transition buffer
+```
+
+
+Notes:
+The `get_center_of_edges` helper function is particularly useful for:
+1. Finding the midpoint of polygon edges for label placement
+2. Calculating offset positions for side labels that don't overlap with the polygon
+3. Creating consistent label positioning across different polygon sizes and orientations
+
+Example usage in your scene:
+```python
+def label_triangle_sides(self, triangle, labels=["a", "b", "c"]):
+ # Helper function to label triangle sides.
+ edge_centers = self.helper.get_center_of_edges(triangle)
+ labeled_sides = VGroup()
+ for center, label in zip(edge_centers, labels):
+ tex = MathTex(label).move_to(center)
+ labeled_sides.add(tex)
+ return labeled_sides
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_context_learning_animation_narration.txt b/task_generator/prompts_raw/prompt_context_learning_animation_narration.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3aca08d1d99c18212dde6996344d8db96075712a
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_context_learning_animation_narration.txt
@@ -0,0 +1,5 @@
+Here are some example animation and narration plans to help guide your planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_context_learning_code.txt b/task_generator/prompts_raw/prompt_context_learning_code.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5360359bdad8a0e657ba1ab7d907d19bfbe1e8f7
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_context_learning_code.txt
@@ -0,0 +1,5 @@
+Here are some example Manim code implementations to help guide your code generation:
+
+{examples}
+
+Please follow similar patterns and best practices while implementing the current scene.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_context_learning_scene_plan.txt b/task_generator/prompts_raw/prompt_context_learning_scene_plan.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5c43363872196eda42d1926da2f6c108f00aa7da
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_context_learning_scene_plan.txt
@@ -0,0 +1,5 @@
+Here are some example scene plans to help guide your scene planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current topic.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_context_learning_technical_implementation.txt b/task_generator/prompts_raw/prompt_context_learning_technical_implementation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..61935ff1499d2d94ac8fa69ab50dde8bcbc6ac34
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_context_learning_technical_implementation.txt
@@ -0,0 +1,5 @@
+Here are some example technical implementation plans to help guide your implementation:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_context_learning_vision_storyboard.txt b/task_generator/prompts_raw/prompt_context_learning_vision_storyboard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..561e826b2643af7440c5225a40164604710980be
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_context_learning_vision_storyboard.txt
@@ -0,0 +1,5 @@
+Here are some example vision and storyboard plans to help guide your planning:
+
+{examples}
+
+Please follow a similar structure while maintaining creativity and relevance to the current scene.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_detect_plugins.txt b/task_generator/prompts_raw/prompt_detect_plugins.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f4517f55551b84b9806143ed5bcc64577a7aeec5
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_detect_plugins.txt
@@ -0,0 +1,33 @@
+You are a Manim plugin detection system. Your task is to analyze a video topic and description to determine which Manim plugins would be most relevant for the actual animation implementation needs.
+
+Topic:
+{topic}
+
+Description:
+{description}
+
+Available Plugins:
+{plugin_descriptions}
+
+Instructions:
+1. Analyze the topic and description, focusing specifically on what needs to be animated
+2. Review each plugin's capabilities and determine if they provide specific tools needed for the animations described
+3. Only select plugins that provide functionality directly needed for the core animations
+4. Consider these criteria for each plugin:
+ - Does the plugin provide specific tools or components needed for the main visual elements?
+ - Are the plugin's features necessary for implementing the core animations?
+ - Would the animation be significantly more difficult to create without this plugin?
+5. Exclude plugins that:
+ - Only relate to the general topic area but don't provide needed animation tools
+ - Might be "nice to have" but aren't essential for the core visualization
+ - Could be replaced easily with basic Manim shapes and animations
+
+Your response must follow the output format below:
+
+[brief description of your thinking process]
+
+
+```json
+["plugin_name1", "plugin_name2"]
+```
+
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_fix_error.txt b/task_generator/prompts_raw/prompt_fix_error.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d7033ad92d9924c1130efa4defd398daf9654953
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_fix_error.txt
@@ -0,0 +1,43 @@
+You are an expert Manim developer specializing in debugging and error resolution. Based on the provided implementation plan and Manim code, analyze the error message to provide a comprehensive fix and explanation.
+
+Implementation Plan of the Scene:
+{implementation_plan}
+
+Manim Code:
+```python
+{manim_code}
+```
+
+Error Message:
+{error_message}
+
+Requirements:
+1. Provide complete error analysis with specific line numbers where possible.
+2. Include exact instructions for every code change.
+3. Explain why the error occurred in plain language.
+4. If external assets (e.g., images, audio, video) are referenced, remove them.
+5. **If voiceover is present in the original code, ensure it remains preserved in the corrected code.**
+6. Preserve all original code that is not causing the reported error. Do not remove or alter any intentional elements unnecessarily.
+7. Follow best practices for code clarity and the current Manim version.
+
+You MUST only output the following format. You MUST NOT come up with any other format like JSON.
+
+
+Error Type: [Syntax/Runtime/Logic/Other]
+Error Location: [File/Line number/Component]
+Root Cause: [Brief explanation of what caused the error]
+Impact: [What functionality is affected]
+Solution:
+[FIXES_REQUIRED]
+- Fix 1: [Description]
+ - Location: [Where to apply]
+ - Change: [What to modify]
+- Fix 2: [If applicable]
+...
+
+
+```python
+# Complete corrected and fully implemented Python code
+# Include all necessary imports, definitions, and any additional code for the script to run successfully
+```
+
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_manim_cheatsheet.txt b/task_generator/prompts_raw/prompt_manim_cheatsheet.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6b985c02c3636d4fc13969672aa4a9c56ffaedbd
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_manim_cheatsheet.txt
@@ -0,0 +1,494 @@
+The following are the inheritance diagrams of the Manim library. You can use them as a reference to select which classes to use for the animation.
+
+```
+digraph Animation {
+ "AddTextLetterByLetter"
+ "ShowIncreasingSubsets"
+ "ShowIncreasingSubsets" -> "AddTextLetterByLetter"
+ "AddTextWordByWord";
+ "Succession";
+ "Succession" -> "AddTextWordByWord";
+ "AnimatedBoundary";
+ "VGroup";
+ "VGroup" -> "AnimatedBoundary";
+ "Animation";
+ "AnimationGroup";
+ "Animation" -> "AnimationGroup";
+ "ApplyComplexFunction";
+ "ApplyMethod";
+ "ApplyMethod" -> "ApplyComplexFunction";
+ "ApplyFunction";
+ "Transform";
+ "Transform" -> "ApplyFunction";
+ "ApplyMatrix";
+ "ApplyPointwiseFunction";
+ "ApplyPointwiseFunction" -> "ApplyMatrix";
+ "ApplyMethod";
+ "Transform" -> "ApplyMethod";
+ "ApplyPointwiseFunction";
+ "ApplyMethod" -> "ApplyPointwiseFunction";
+ "ApplyPointwiseFunctionToCenter";
+ "ApplyPointwiseFunction" -> "ApplyPointwiseFunctionToCenter";
+ "ApplyWave";
+ "Homotopy";
+ "Homotopy" -> "ApplyWave";
+ "Broadcast";
+ "LaggedStart";
+ "LaggedStart" -> "Broadcast";
+ "ChangeDecimalToValue";
+ "ChangingDecimal";
+ "ChangingDecimal" -> "ChangeDecimalToValue";
+ "ChangeSpeed";
+ "Animation" -> "ChangeSpeed";
+ "ChangingDecimal";
+ "Animation" -> "ChangingDecimal";
+ "Circumscribe";
+ "Succession" -> "Circumscribe";
+ "ClockwiseTransform";
+ "Transform" -> "ClockwiseTransform";
+ "ComplexHomotopy";
+ "Homotopy" -> "ComplexHomotopy";
+ "CounterclockwiseTransform";
+ "Transform" -> "CounterclockwiseTransform";
+ "Create";
+ "ShowPartial";
+ "ShowPartial" -> "Create";
+ "CyclicReplace";
+ "Transform" -> "CyclicReplace";
+ "DrawBorderThenFill";
+ "Animation" -> "DrawBorderThenFill";
+ "FadeIn";
+ "FadeOut";
+ "FadeToColor";
+ "ApplyMethod" -> "FadeToColor";
+ "FadeTransform";
+ "Transform" -> "FadeTransform";
+ "FadeTransformPieces";
+ "FadeTransform" -> "FadeTransformPieces";
+ "Flash";
+ "AnimationGroup" -> "Flash";
+ "FocusOn";
+ "Transform" -> "FocusOn";
+ "GrowArrow";
+ "GrowFromPoint";
+ "GrowFromPoint" -> "GrowArrow";
+ "GrowFromCenter";
+ "GrowFromPoint" -> "GrowFromCenter";
+ "GrowFromEdge";
+ "GrowFromPoint" -> "GrowFromEdge";
+ "GrowFromPoint";
+ "Transform" -> "GrowFromPoint";
+ "Homotopy";
+ "Animation" -> "Homotopy";
+ "Indicate";
+ "Transform" -> "Indicate";
+ "LaggedStart";
+ "AnimationGroup" -> "LaggedStart";
+ "LaggedStartMap";
+ "LaggedStart" -> "LaggedStartMap";
+ "MaintainPositionRelativeTo";
+ "Animation" -> "MaintainPositionRelativeTo";
+ "Mobject";
+ "MoveAlongPath";
+ "Animation" -> "MoveAlongPath";
+ "MoveToTarget";
+ "Transform" -> "MoveToTarget";
+ "PhaseFlow";
+ "Animation" -> "PhaseFlow";
+ "RemoveTextLetterByLetter";
+ "AddTextLetterByLetter" -> "RemoveTextLetterByLetter";
+ "ReplacementTransform";
+ "Transform" -> "ReplacementTransform";
+ "Restore";
+ "ApplyMethod" -> "Restore";
+ "Rotate";
+ "Transform" -> "Rotate";
+ "Rotating";
+ "Animation" -> "Rotating";
+ "ScaleInPlace";
+ "ApplyMethod" -> "ScaleInPlace";
+ "ShowIncreasingSubsets";
+ "Animation" -> "ShowIncreasingSubsets";
+ "ShowPartial";
+ "Animation" -> "ShowPartial";
+ "ShowPassingFlash";
+ "ShowPartial" -> "ShowPassingFlash";
+ "ShowPassingFlashWithThinningStrokeWidth";
+ "AnimationGroup" -> "ShowPassingFlashWithThinningStrokeWidth";
+ "ShowSubmobjectsOneByOne";
+ "ShowIncreasingSubsets" -> "ShowSubmobjectsOneByOne";
+ "ShrinkToCenter";
+ "ScaleInPlace" -> "ShrinkToCenter";
+ "SmoothedVectorizedHomotopy";
+ "Homotopy" -> "SmoothedVectorizedHomotopy";
+ "SpinInFromNothing";
+ "GrowFromCenter" -> "SpinInFromNothing";
+ "SpiralIn";
+ "Animation" -> "SpiralIn";
+ "Succession";
+ "AnimationGroup" -> "Succession";
+ "Swap";
+ "CyclicReplace" -> "Swap";
+ "TracedPath";
+ "VMobject";
+ "VMobject" -> "TracedPath";
+ "Transform";
+ "Animation" -> "Transform";
+ "TransformAnimations";
+ "Transform" -> "TransformAnimations";
+ "TransformFromCopy";
+ "Transform" -> "TransformFromCopy";
+ "TransformMatchingAbstractBase";
+ "AnimationGroup" -> "TransformMatchingAbstractBase";
+ "TransformMatchingShapes";
+ "TransformMatchingAbstractBase" -> "TransformMatchingShapes";
+ "TransformMatchingTex";
+ "TransformMatchingAbstractBase" -> "TransformMatchingTex";
+ "Uncreate";
+ "Create" -> "Uncreate";
+ "Unwrite";
+ "Write";
+ "Write" -> "Unwrite";
+ "UpdateFromAlphaFunc";
+ "UpdateFromFunc";
+ "UpdateFromFunc" -> "UpdateFromAlphaFunc";
+ "UpdateFromFunc";
+ "Animation" -> "UpdateFromFunc";
+ "VGroup";
+ "VMobject" -> "VGroup";
+ "VMobject";
+ "Mobject" -> "VMobject";
+
+ "Wait";
+ "Animation" -> "Wait";
+ "Wiggle";
+ "Animation" -> "Wiggle";
+ "Write";
+ "DrawBorderThenFill" -> "Write";
+}
+```
+
+
+```
+digraph Camera {
+ "BackgroundColoredVMobjectDisplayer"
+ "Camera"
+ "MappingCamera"
+ "Camera" -> "MappingCamera"
+ "MovingCamera"
+ "Camera" -> "MovingCamera"
+ "MultiCamera"
+ "MovingCamera" -> "MultiCamera"
+ "OldMultiCamera"
+ "Camera" -> "OldMultiCamera"
+ "SplitScreenCamera"
+ "OldMultiCamera" -> "SplitScreenCamera"
+ "ThreeDCamera"
+ "Camera" -> "ThreeDCamera"
+}
+```
+
+```
+digraph MObject {
+ "AbstractImageMobject"
+ "Mobject" -> "AbstractImageMobject"
+ "Angle"
+ "VMobject" -> "Angle"
+ "AnnotationDot"
+ "Dot" -> "AnnotationDot"
+ "AnnularSector"
+ "Arc" -> "AnnularSector"
+ "Annulus"
+ "Circle" -> "Annulus"
+ "Arc"
+ "TipableVMobject" -> "Arc"
+ "ArcBetweenPoints"
+ "Arc" -> "ArcBetweenPoints"
+ "ArcBrace"
+ "Brace" -> "ArcBrace"
+ "ArcPolygon"
+ "VMobject" -> "ArcPolygon"
+ "ArcPolygonFromArcs"
+ "VMobject" -> "ArcPolygonFromArcs"
+ "Arrow"
+ "Line" -> "Arrow"
+ "Arrow3D"
+ "Line3D" -> "Arrow3D"
+ "ArrowCircleFilledTip"
+ "ArrowCircleTip" -> "ArrowCircleFilledTip"
+ "ArrowCircleTip"
+ "ArrowTip" -> "ArrowCircleTip"
+ "Circle" -> "ArrowCircleTip"
+ "ArrowSquareFilledTip"
+ "ArrowSquareTip" -> "ArrowSquareFilledTip"
+ "ArrowSquareTip"
+ "ArrowTip" -> "ArrowSquareTip"
+ "Square" -> "ArrowSquareTip"
+ "ArrowTip"
+ "VMobject" -> "ArrowTip"
+ "ArrowTriangleFilledTip"
+ "ArrowTriangleTip" -> "ArrowTriangleFilledTip"
+ "ArrowTriangleTip"
+ "ArrowTip" -> "ArrowTriangleTip"
+ "Triangle" -> "ArrowTriangleTip"
+ "ArrowVectorField"
+ "VectorField" -> "ArrowVectorField"
+ "Axes"
+ "VGroup" -> "Axes"
+ "CoordinateSystem" -> "Axes"
+ "BackgroundRectangle"
+ "SurroundingRectangle" -> "BackgroundRectangle"
+ "BarChart"
+ "Axes" -> "BarChart"
+ "Brace"
+ "svg_mobject.VMobjectFromSVGPath" -> "Brace"
+ "BraceBetweenPoints"
+ "Brace" -> "BraceBetweenPoints"
+ "BraceLabel"
+ "VMobject" -> "BraceLabel"
+ "BraceText"
+ "BraceLabel" -> "BraceText"
+ "BulletedList"
+ "Tex" -> "BulletedList"
+ "Circle"
+ "Arc" -> "Circle"
+ "Code"
+ "VGroup" -> "Code"
+ "ComplexPlane"
+ "NumberPlane" -> "ComplexPlane"
+ "ComplexValueTracker"
+ "ValueTracker" -> "ComplexValueTracker"
+ "Cone"
+ "Surface" -> "Cone"
+ "CoordinateSystem"
+ "Cross"
+ "VGroup" -> "Cross"
+ "Cube"
+ "VGroup" -> "Cube"
+ "CubicBezier"
+ "VMobject" -> "CubicBezier"
+ "CurvedArrow"
+ "ArcBetweenPoints" -> "CurvedArrow"
+ "CurvedDoubleArrow"
+ "CurvedArrow" -> "CurvedDoubleArrow"
+ "CurvesAsSubmobjects"
+ "VGroup" -> "CurvesAsSubmobjects"
+ "Cutout"
+ "VMobject" -> "Cutout"
+ "Cylinder"
+ "Surface" -> "Cylinder"
+ "DashedLine"
+ "Line" -> "DashedLine"
+ "DashedVMobject"
+ "VMobject" -> "DashedVMobject"
+ "DecimalMatrix"
+ "Matrix" -> "DecimalMatrix"
+ "DecimalNumber"
+ "VMobject" -> "DecimalNumber"
+ "DecimalTable"
+ "Table" -> "DecimalTable"
+ "DiGraph"
+ "GenericGraph" -> "DiGraph"
+ "Difference"
+ "Dodecahedron"
+ "Polyhedron" -> "Dodecahedron"
+ "Dot"
+ "Circle" -> "Dot"
+ "Dot3D"
+ "Sphere" -> "Dot3D"
+ "DoubleArrow"
+ "Arrow" -> "DoubleArrow"
+ "Elbow"
+ "VMobject" -> "Elbow"
+ "Ellipse"
+ "Circle" -> "Ellipse"
+ "Exclusion"
+ "FullScreenRectangle"
+ "ScreenRectangle" -> "FullScreenRectangle"
+ "FunctionGraph"
+ "ParametricFunction" -> "FunctionGraph"
+ "Generic"
+ "GenericGraph"
+ "Generic" -> "GenericGraph"
+ "Graph"
+ "GenericGraph" -> "Graph"
+ "Group"
+ "Mobject" -> "Group"
+ "Icosahedron"
+ "Polyhedron" -> "Icosahedron"
+ "ImageMobject"
+ "AbstractImageMobject" -> "ImageMobject"
+ "ImageMobjectFromCamera"
+ "AbstractImageMobject" -> "ImageMobjectFromCamera"
+ "ImplicitFunction"
+ "VMobject" -> "ImplicitFunction"
+ "Integer"
+ "DecimalNumber" -> "Integer"
+ "IntegerMatrix"
+ "Matrix" -> "IntegerMatrix"
+ "IntegerTable"
+ "Table" -> "IntegerTable"
+ "Intersection"
+ "LabeledDot"
+ "Dot" -> "LabeledDot"
+ "LayoutFunction"
+ "Protocol" -> "LayoutFunction"
+ "Line"
+ "TipableVMobject" -> "Line"
+ "Line3D"
+ "Cylinder" -> "Line3D"
+ "LinearBase"
+ "LogBase"
+ "ManimBanner"
+ "VGroup" -> "ManimBanner"
+ "MarkupText"
+ "svg_mobject.SVGMobject" -> "MarkupText"
+ "MathTable"
+ "Table" -> "MathTable"
+ "MathTex"
+ "SingleStringMathTex" -> "MathTex"
+ "Matrix"
+ "VMobject" -> "Matrix"
+ "Mobject"
+ "Mobject1D"
+ "PMobject" -> "Mobject1D"
+ "Mobject2D"
+ "PMobject" -> "Mobject2D"
+ "MobjectMatrix"
+ "Matrix" -> "MobjectMatrix"
+ "MobjectTable"
+ "Table" -> "MobjectTable"
+ "NumberLine"
+ "Line" -> "NumberLine"
+ "NumberPlane"
+ "Axes" -> "NumberPlane"
+ "Octahedron"
+ "Polyhedron" -> "Octahedron"
+ "PGroup"
+ "PMobject" -> "PGroup"
+ "PMobject"
+ "Mobject" -> "PMobject"
+ "Paragraph"
+ "VGroup" -> "Paragraph"
+ "ParametricFunction"
+ "VMobject" -> "ParametricFunction"
+ "Point"
+ "PMobject" -> "Point"
+ "PointCloudDot"
+ "Mobject1D" -> "PointCloudDot"
+ "PolarPlane"
+ "Axes" -> "PolarPlane"
+ "Polygon"
+ "Polygram" -> "Polygon"
+ "Polygram"
+ "VMobject" -> "Polygram"
+ "Polyhedron"
+ "VGroup" -> "Polyhedron"
+ "Prism"
+ "Cube" -> "Prism"
+ "Protocol"
+ "Generic" -> "Protocol"
+ "Rectangle"
+ "Polygon" -> "Rectangle"
+ "RegularPolygon"
+ "RegularPolygram" -> "RegularPolygon"
+ "RegularPolygram"
+ "Polygram" -> "RegularPolygram"
+ "RightAngle"
+ "Angle" -> "RightAngle"
+ "RoundedRectangle"
+ "Rectangle" -> "RoundedRectangle"
+ "SVGMobject"
+ "VMobject" -> "SVGMobject"
+ "SampleSpace"
+ "Rectangle" -> "SampleSpace"
+ "ScreenRectangle"
+ "Rectangle" -> "ScreenRectangle"
+ "Sector"
+ "AnnularSector" -> "Sector"
+ "SingleStringMathTex"
+ "svg_mobject.SVGMobject" -> "SingleStringMathTex"
+ "Sphere"
+ "Surface" -> "Sphere"
+ "Square"
+ "Rectangle" -> "Square"
+ "Star"
+ "Polygon" -> "Star"
+ "StealthTip"
+ "ArrowTip" -> "StealthTip"
+ "StreamLines"
+ "VectorField" -> "StreamLines"
+ "Surface"
+ "VGroup" -> "Surface"
+ "SurroundingRectangle"
+ "RoundedRectangle" -> "SurroundingRectangle"
+ "Table"
+ "VGroup" -> "Table"
+ "TangentLine"
+ "Line" -> "TangentLine"
+ "Tetrahedron"
+ "Polyhedron" -> "Tetrahedron"
+ "Tex"
+ "MathTex" -> "Tex"
+ "Text"
+ "svg_mobject.SVGMobject" -> "Text"
+ "ThreeDAxes"
+ "Axes" -> "ThreeDAxes"
+ "ThreeDVMobject"
+ "VMobject" -> "ThreeDVMobject"
+ "TipableVMobject"
+ "VMobject" -> "TipableVMobject"
+ "Title"
+ "Tex" -> "Title"
+ "Torus"
+ "Surface" -> "Torus"
+ "Triangle"
+ "RegularPolygon" -> "Triangle"
+ "Underline"
+ "Line" -> "Underline"
+ "Union"
+ "UnitInterval"
+ "NumberLine" -> "UnitInterval"
+ "VDict"
+ "VMobject" -> "VDict"
+ "VGroup"
+ "VMobject" -> "VGroup"
+ "VMobject"
+ "Mobject" -> "VMobject"
+ "VMobjectFromSVGPath"
+ "VMobject" -> "VMobjectFromSVGPath"
+ "ValueTracker"
+ "Mobject" -> "ValueTracker"
+ "Variable"
+ "VMobject" -> "Variable"
+ "Vector"
+ "Arrow" -> "Vector"
+ "VectorField"
+ "VGroup" -> "VectorField"
+ "VectorizedPoint"
+ "VMobject" -> "VectorizedPoint"
+}
+```
+
+```
+digraph Scene {
+ "LinearTransformationScene"
+ "VectorScene"
+ "VectorScene" -> "LinearTransformationScene"
+ "MovingCameraScene"
+ "Scene"
+ "Scene" -> "MovingCameraScene"
+ "RerunSceneHandler"
+ "Scene"
+ "SceneFileWriter"
+ "SpecialThreeDScene"
+ "ThreeDScene"
+ "ThreeDScene" -> "SpecialThreeDScene"
+ "ThreeDScene"
+ "Scene" -> "ThreeDScene"
+ "VectorScene"
+ "Scene" -> "VectorScene"
+ "ZoomedScene"
+ "MovingCameraScene" -> "ZoomedScene"
+}
+```
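+
+For illustration only (the scene below is an assumed example and not part of the cheatsheet), the diagrams can be read to pick classes: for instance `MovingCameraScene` (a `Scene`) together with `Axes` (a `VGroup` and `CoordinateSystem`) and the `Create` animation (a `ShowPartial`):
+
+```python
+from manim import MovingCameraScene, Axes, Create
+
+class ZoomDemo(MovingCameraScene):
+    def construct(self):
+        axes = Axes(x_range=[-3, 3], y_range=[-2, 2])
+        self.play(Create(axes))
+        # MovingCameraScene exposes a movable camera frame (see the Scene diagram above).
+        self.play(self.camera.frame.animate.scale(0.5).move_to(axes.c2p(1, 1)))
+```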
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_code.txt b/task_generator/prompts_raw/prompt_rag_query_generation_code.txt
new file mode 100644
index 0000000000000000000000000000000000000000..17a30d5d7dcc84febe2d6d919062fdcfcb39c0ad
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_code.txt
@@ -0,0 +1,25 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a complete implementation plan for a Manim video scene into effective queries that will retrieve relevant information from Manim documentation. The implementation plan describes the scene's vision, storyboard, technical implementation, and animation/narration strategy.
+
+Here is the complete scene implementation plan:
+
+{implementation_plan}
+
+Based on the complete implementation plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim function usage** in scenes. Frame these queries to target function definitions, usage examples, and parameter details within Manim documentation.
+2. If the implementation suggests using plugin functionality, include at least 1 query specifically targeting **plugin documentation**. Clearly mention the plugin name in these queries to focus the search.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant, and to target the most helpful sections of the documentation (API reference, tutorials, examples).
+
+The above implementation plan is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of function usage query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of API reference query"}}
+ ...
+]
+```
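+
+The doubled braces in the JSON example above are there so the template survives Python's str.format; a minimal sketch of how such a template is presumably filled (an assumption about the consuming code, with illustrative values):
+
+```python
+# Load the template and substitute its single-brace placeholders; the doubled
+# braces in the JSON example collapse to single literal braces after formatting.
+with open("task_generator/prompts_raw/prompt_rag_query_generation_code.txt") as f:
+    template = f.read()
+
+prompt = template.format(
+    implementation_plan="...full scene implementation plan...",  # illustrative value
+    relevant_plugins="manim-physics",                            # illustrative value
+)
+```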
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_fix_error.txt b/task_generator/prompts_raw/prompt_rag_query_generation_fix_error.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1f9f564730b648adb43e99cfaf184a8b5fbd2049
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_fix_error.txt
@@ -0,0 +1,27 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a Manim error and its associated code into effective queries that will retrieve relevant information from Manim documentation.
+
+Here is the error message:
+{error}
+
+Here is the Manim code that caused the error:
+{code}
+
+Based on the error and code, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim function usage** in scenes. Frame these queries to target function definitions, usage examples, and parameter details within Manim documentation.
+2. If the error suggests using plugin functionality, include at least 1 query specifically targeting **plugin documentation**. Clearly mention the plugin name in these queries to focus the search.
+3. Queries should be specific enough to distinguish between core Manim and plugin functionality when relevant, and to target the most helpful sections of the documentation (API reference, tutorials, examples).
+
+The above error and code are relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of function usage query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of API reference query"}}
+ ...
+]
+```
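+
+One way a downstream consumer might pull the queries back out of a reply in this format (an assumption; the extraction helper below is not specified anywhere in these templates):
+
+```python
+import json
+import re
+
+def extract_queries(llm_response: str) -> list:
+    """Parse the fenced JSON block the model was asked to produce above."""
+    match = re.search(r"```json\s*(.*?)```", llm_response, re.DOTALL)
+    if match is None:
+        raise ValueError("no fenced JSON block found in the response")
+    return json.loads(match.group(1))
+```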
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_narration.txt b/task_generator/prompts_raw/prompt_rag_query_generation_narration.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d1bf63c2e7cc45391f77b56ecd787e1612179886
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_narration.txt
@@ -0,0 +1,25 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a storyboard and generate effective queries that will retrieve relevant documentation about narration, text animations, and audio-visual synchronization.
+
+Here is the storyboard:
+
+{storyboard}
+
+Based on this storyboard, generate multiple human-like queries (maximum 10) for retrieving relevant documentation about narration and text animation techniques.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **text animations** and their properties
+2. Include queries about **timing and synchronization** techniques
+3. If the storyboard suggests using plugin functionality, include specific queries targeting those plugins' narration capabilities
+
+The above storyboard is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of text animation query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of timing synchronization query"}}
+ ...
+]
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_storyboard.txt b/task_generator/prompts_raw/prompt_rag_query_generation_storyboard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9f9375076a5f9294dc48fa9321696318db2f739b
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_storyboard.txt
@@ -0,0 +1,28 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to transform a storyboard plan for a Manim video scene into effective queries that will retrieve relevant information from Manim documentation. The storyboard plan describes the scene's visual elements and narrative flow.
+
+Here is the storyboard plan:
+
+{storyboard}
+
+Based on the storyboard plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation. Please ensure that the search targets are different so that the RAG can retrieve a diverse set of documents covering various aspects of the implementation.
+
+**Specifically, ensure that:**
+1. At least some queries are focused on retrieving information about **Manim core functionalities**, like general visual elements or animations. Frame these queries using Manim terminology (classes, methods, concepts).
+2. If the storyboard suggests using specific visual effects or complex animations that might be plugin-related, include at least 1 query specifically targeting **plugin documentation**. Make sure to mention the plugin name if known or suspected.
+3. Queries should be general enough to explore different possibilities within Manim and its plugins based on the storyboard's visual and narrative descriptions, but also specific enough to target Manim documentation effectively.
+
+The above storyboard might be relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+Output the queries in the following format:
+```json
+[
+ {{"query": "content of query 1", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 2", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 3", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 4", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 5", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 6", "type": "manim_core/{relevant_plugins}"}},
+ {{"query": "content of query 7", "type": "manim_core/{relevant_plugins}"}},
+]
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_technical.txt b/task_generator/prompts_raw/prompt_rag_query_generation_technical.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f793afbf2a6dda1e61af5df43d26c9b8537191e8
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_technical.txt
@@ -0,0 +1,25 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a storyboard plan and generate effective queries that will retrieve relevant technical documentation about implementation details.
+
+Here is the storyboard plan:
+
+{storyboard}
+
+Based on this storyboard plan, generate multiple human-like queries (maximum 10) for retrieving relevant technical documentation.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **core Manim functionality** and implementation details
+2. Include queries about **complex animations and effects** described in the storyboard
+3. If the storyboard suggests using plugin functionality, include specific queries targeting those plugins' technical documentation
+
+The above storyboard plan is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of core functionality query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of animation technique query"}}
+ ...
+]
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_rag_query_generation_vision_storyboard.txt b/task_generator/prompts_raw/prompt_rag_query_generation_vision_storyboard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..22232cccb467f4b6b1a8c9c282b02c67a76e3cce
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_rag_query_generation_vision_storyboard.txt
@@ -0,0 +1,27 @@
+You are an expert in generating search queries specifically for **Manim (Community Edition) documentation** (both core Manim and its plugins). Your task is to analyze a scene plan for a Manim animation and generate effective queries that will retrieve relevant documentation about visual elements and scene composition.
+
+Here is the scene plan:
+
+{scene_plan}
+
+Based on this scene plan, generate multiple human-like queries (maximum 10) for retrieving relevant documentation about visual elements and scene composition techniques.
+
+**Specifically, ensure that:**
+1. Queries focus on retrieving information about **visual elements** like shapes, objects, and their properties
+2. Include queries about **scene composition techniques** like layout, positioning, and grouping
+3. If the scene plan suggests using plugin functionality, include specific queries targeting those plugins' visual capabilities
+4. Queries should be high-level, aiming to discover what Manim features can be used, rather than focusing on low-level implementation details.
+ - For example, instead of "how to set the color of a circle", ask "what visual properties of shapes can I control in Manim?".
+
+The above scene plan is relevant to these plugins: {relevant_plugins}.
+Note that you MUST NOT use the plugins that are not listed above.
+
+You MUST only output the queries in the following JSON format (with json triple backticks):
+```json
+[
+ {{"type": "manim-core", "query": "content of visual element query"}},
+ {{"type": "", "query": "content of plugin-specific query"}},
+ {{"type": "manim-core", "query": "content of composition technique query"}}
+ ...
+]
+```
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_scene_animation_narration.txt b/task_generator/prompts_raw/prompt_scene_animation_narration.txt
new file mode 100644
index 0000000000000000000000000000000000000000..567b4c406c1675102ba8403fb93841a48ef6286a
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_scene_animation_narration.txt
@@ -0,0 +1,94 @@
+You are an expert in educational video production and Manim animation, skilled in creating engaging and pedagogically effective learning experiences.
+**Reminder:** This animation and narration plan is entirely self-contained; there is no dependency on any previous or subsequent scene implementations. However, the narration should flow smoothly as part of a larger, single video.
+
+Your task is to create a **detailed animation and narration plan for Scene {scene_number}**, ensuring it is not just visually appealing but also serves a clear educational purpose within the overall video topic.
+
+Remember, the narration should not simply describe what's happening visually, but rather **teach a concept step-by-step**, guiding the viewer to a deeper understanding. Animations should be spatially coherent, contribute to a clear visual flow, and strictly respect safe area margins (0.5 units) and minimum spacing (0.3 units). **Consider the scene number {scene_number} and the overall scene context to ensure smooth transitions and a logical flow within the larger video narrative.**
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+Scene Vision and Storyboard:
+{scene_vision_storyboard}
+
+Technical Implementation Plan:
+{technical_implementation_plan}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced Throughout Animations):**
+* **Safe area margins:** 0.5 units. *Maintain objects and VGroups within margins.*
+* **Minimum spacing:** 0.3 units. *Ensure minimum spacing between all objects and VGroups.*
+
+**Animation Timing and Pacing Requirements:**
+* Specify `run_time` for all animations.
+* Use `Wait()` for transition buffers, specifying durations and **pedagogical purpose**.
+* Coordinate animation timings with narration cues for synchronized pedagogical presentation (a short illustrative sketch follows this list).
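+
+A rough, non-authoritative sketch of what these timing and spacing requirements look like in Manim CE code (the objects and durations are made up for illustration):
+
+```python
+from manim import Scene, Circle, Square, Create, FadeIn, RIGHT
+
+class PacingSketch(Scene):
+    def construct(self):
+        circle = Circle()
+        square = Square().next_to(circle, RIGHT, buff=0.3)  # respect the 0.3-unit minimum spacing
+
+        self.play(Create(circle), run_time=1.5)   # every animation gets an explicit run_time
+        self.wait(0.7)                            # transition buffer: time to process the circle
+        self.play(FadeIn(square), run_time=1.0)   # timed to land on the matching narration cue
+        self.wait(1.0)                            # closing pause before the next sub-scene
+```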
+
+**Visual Flow and Pedagogical Clarity:**
+* Ensure animations create a clear and logical visual flow, **optimized for learning and concept understanding.**
+* Use animation pacing and transition buffers to visually separate ideas and **enhance pedagogical clarity.**
+* Maintain spatial coherence for predictable and understandable animations, strictly adhering to spatial constraints.
+
+**Diagrams/Sketches (Optional but Highly Recommended for Complex Scenes):**
+* For complex animations, include diagrams/sketches to visualize animation flow and object movements. This aids clarity and reduces errors.
+
+Your plan must demonstrate a strong understanding of pedagogical narration and how animations can be used to effectively teach concepts, while strictly adhering to spatial constraints and timing requirements.
+
+You MUST generate a **detailed and comprehensive** animation and narration plan for **Scene {scene_number}**, in the following format, similar to the example provided (from ```xml to ```):
+
+```xml
+
+
+[ANIMATION_STRATEGY]
+1. **Pedagogical Animation Plan:** Provide a detailed plan for all animations in the scene, explicitly focusing on how each animation contributes to **teaching the core concepts** of this scene.
+ - **Parent VGroup transitions (if applicable):**
+ - If VGroups are used, specify transitions (`Shift`, `Transform`, `FadeIn`, `FadeOut`) with `Animation` type, direction, magnitude, target VGroup, and `run_time`.
+ - **Explain the pedagogical rationale** for each VGroup transition. How does it guide the viewer's attention or contribute to understanding the scene's learning objectives? Ensure spatial coherence and respect for constraints.
+ - **Element animations within VGroups and for individual Mobjects:**
+ - Specify animation types (`Create`, `Write`, `FadeIn`, `Transform`, `Circumscribe`, `AnimationGroup`, `Succession`) for elements.
+ - For each element animation, specify `Animation` type, target object(s), and `run_time`. Detail sequences and timing for `AnimationGroup` or `Succession`.
+ - **Explain the pedagogical purpose** of each element animation. How does it break down complex information, highlight key details, or improve visual clarity for learning? Ensure spatial coherence and minimum spacing.
+ - **Coordinate element animations with VGroup transitions:**
+ - Clearly describe the synchronization between element animations and VGroup transitions (if any).
+ - Specify relative timing and `run_time` to illustrate coordination.
+ - **Explain how this animation sequence and coordination creates a pedagogical flow**, guiding the viewer's eye and attention logically through the learning material.
+
+2. **Scene Flow - Pedagogical Pacing and Clarity:** Detail the overall flow of the scene, emphasizing pedagogical effectiveness.
+ - **Overall animation sequence, spatial progression for learning:**
+ - Describe the complete animation sequence, broken down into pedagogical sub-sections (e.g., "Introducing the Problem", "Step-by-step Solution", "Concept Reinforcement").
+ - Outline the spatial progression of objects and VGroups, focusing on how it supports the **pedagogical narrative** and concept development.
+ - Ensure a clear and logical visual flow optimized for learning, respecting spatial constraints.
+ - **Transition buffers for pedagogical pauses:**
+ - Specify `Wait()` times between animation sections for visual separation and **learner processing time**.
+ - For each `Wait()`, specify duration and **explain the pedagogical reason** for this buffer (e.g., "Allow viewers time to process the formula", "Create a pause for reflection before moving to the next concept").
+ - **Coordinate animation timing with narration for engagement and comprehension:**
+ - Describe how animation timings are coordinated with the narration script to **maximize viewer engagement and comprehension**.
+ - Specify animation cues within the narration script and explain how these cues are synchronized with animations to **reinforce learning points** at the optimal moment.
+
+[NARRATION]
+- **Pedagogical Narration Script:**
+ - Provide the full narration script for Scene {scene_number}.
+ - **Embed precise animation timing cues** within the narration script (as described before).
+ - **The script should be written as if delivered by a knowledgeable and engaging lecturer.** It should:
+ - **Clearly explain concepts step-by-step.**
+ - **Use analogies and real-world examples to enhance understanding.**
+ - **Pose questions to encourage active thinking.**
+ - **Summarize key points and transitions.**
+ - **Be detailed and knowledge-rich, not just visually descriptive.**
+        - **Connect smoothly with the previous and subsequent scenes, acting as a segment within a single, cohesive video.**
+        - **Avoid repetitive introductions or conclusions.**
+ - Consider using phrases like "Building on what we saw in the previous part..." or "Let's now move on to..." to create a sense of continuity.
+ - Reference the scene number when appropriate (e.g., "Now, let's explore...").
+ - **Crucially, the narration should seamlessly integrate with the animations to create a cohesive and effective learning experience.**
+- **Narration Sync - Pedagogical Alignment:**
+ - Detail the synchronization strategy between narration and animations, emphasizing **pedagogical alignment**.
+ - Explain how narration timing is aligned with animation start/end times to **guide viewer attention to key learning elements precisely when they animate.**
+ - Emphasize how narration cues and animation timings work together to **create a synchronized audiovisual presentation that maximizes learning and retention.**
+
+
+```
diff --git a/task_generator/prompts_raw/prompt_scene_implementation.txt b/task_generator/prompts_raw/prompt_scene_implementation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9a6dd3aecc827b426c3dcb2dc03ed95243d295a4
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_scene_implementation.txt
@@ -0,0 +1,140 @@
+You are an expert in educational video production and Manim (Community Edition) animation development. Your task is to create a detailed implementation plan for Scene {scene_number}.
+
+
+Topic: {topic}
+Description: {description}
+
+
+
+Scene Overview:
+{scene_outline}
+
+
+
+
+[SCENE_VISION]
+1. **Overall Narrative**:
+ - Describe the overall story or message of the scene. What is the key takeaway for the viewer?
+ - How does this scene fit into the larger narrative of the video?
+ - What is the desired emotional impact on the viewer?
+
+2. **Learning Objectives**:
+ - What specific knowledge or skills should the viewer gain from this scene?
+ - How will the visual elements and animations support these learning objectives?
+ - What are the key concepts that need to be emphasized?
+
+[STORYBOARD]
+1. **Visual Flow**:
+ - Describe the sequence of visual elements and animations in the scene.
+ - Provide a rough sketch or description of the key visual moments.
+ - How will the scene transition between different ideas or concepts?
+ - What is the pacing of the scene? Are there moments of pause or rapid action?
+
+[TECHNICAL_IMPLEMENTATION]
+1. **High-Level Components (VGroups)**:
+ - **Identify the main conceptual sections of the scene.** Think of this like outlining chapters in a story or sections in a presentation.
+ - **Define the purpose of each high-level component.** What should the viewer learn or understand from each section?
+ - **Describe how these components relate to each other and the overall scene flow.** How will you transition between these sections to create a cohesive narrative?
+ - **Provide a brief rationale for your choice of high-level components.** Why did you choose these specific sections?
+
+2. **VGroup Hierarchy**:
+ - **For each high-level component, define a parent VGroup.** This VGroup will act as a container for all elements within that section.
+ - **Break down each parent VGroup into nested VGroups for sub-components as needed.** Think about logical groupings of elements.
+ - **Specify the relative positioning of these VGroups within the scene using `next_to()`, `align_to()`, and `shift()` where possible.** How will the parent VGroups be arranged on the screen relative to each other? (e.g., stacked vertically, side-by-side, etc.) Prioritize relative positioning using the following references:
+ - `ORIGIN`: the center of the scene
+ - scene margins (e.g., corners, edges)
+ - other VGroups as references.
+ - **MUST NOT use absolute coordinates.**
+ - **Define the scale relationships between different levels of the VGroup hierarchy.** Will sub-VGroups inherit scale from parent VGroups? How will scaling be managed to maintain visual consistency?
+ - **Provide a brief rationale for your VGroup hierarchy.** Why did you choose this specific structure?
+
+ For each VGroup level (from high-level down to sub-components):
+ - Name: [Descriptive name for the VGroup, e.g., "TitleSection", "ProblemStatementGroup", "Explanation1Group"]
+ - Purpose: [What is the purpose of this VGroup? What should the viewer learn or understand from this VGroup?]
+ - Contents: [List all child VGroups and individual elements (Text, MathTex, Shapes, etc.) that belong to this VGroup.]
+ - Positioning:
+ * Reference: [Specify what this VGroup is positioned relative to. Do not use absolute coordinates.]
+ * Alignment: [How is it aligned relative to the reference? Use `align_to()` with options like `UP`, `DOWN`, `LEFT`, `RIGHT`, `ORIGIN`, etc.]
+ * Spacing: [Describe any spacing considerations relative to sibling VGroups or elements within the parent. Use `buff` argument in `next_to()` or `arrange()`. Refer to the defined minimum spacing value.]
+ - Scale: [Specify the scale of this VGroup relative to its parent VGroup. Use relative scaling factors (e.g., 1.0 for same scale, 0.8 for smaller).]
+ - Rationale: [Explain the reasoning behind the structure and organization of this VGroup. Why did you group these elements together?]
+
+3. **Element Specification**:
+ For each individual element (Text, MathTex, Shapes, etc.) within a VGroup:
+ - Name: [Descriptive name for the element, e.g., "ProblemTitleText", "Equation1", "HighlightCircle"]
+ - Type: [Manim object type. Examples: Text, MathTex, Circle, Rectangle, Arrow, Line, etc.]
+ - Parent VGroup: [Specify the VGroup this element belongs to. This establishes the hierarchical relationship.]
+ - Positioning:
+ * Reference: [Specify what this element is positioned relative to. Use its parent VGroup, other elements, `ORIGIN`, or scene margins as references. Do not use absolute coordinates.]
+ * Alignment: [How is it aligned within its parent VGroup? Use `align_to()` or `next_to()` with appropriate directions, e.g. `UP`, `DOWN`, `LEFT`, `RIGHT`, `ORIGIN`, `UL`, `UR`, `DL`, `DR`]
+ * Spacing: [If applicable, describe spacing relative to other elements using `buff` in `next_to()`. Refer to the defined minimum spacing value.]
+ - Style Properties:
+ * Color: [Hex code or named color (e.g., "RED", "BLUE"). Use hex codes for specific colors. e.g., #FF0000 for red]
+ * Opacity: [Value between 0 and 1. 1 for fully opaque, 0 for fully transparent.]
+ * Stroke Width: [Specify stroke width using levels: `thin`, `medium`, or `thick`.]
+ * Font: [Font family name, if applicable.]
+ * Font Size: [Specify font size using levels: `heading1`, `heading2`, `heading3`, `heading4`, `heading5`, `heading6`, or `body`. Refer to the defined font size levels.]
+ * Fill Color: [Hex code for fill color, if applicable.]
+ * ... [Include any other relevant style properties]
+ - Z-Index: [Integer value for layering order within the VGroup. Higher values are on top.]
+ - Required Imports: [List specific Manim classes that need to be imported to create this element. e.g., `from manim import Text, Circle`]
+
+[ANIMATION_STRATEGY]
+1. **VGroup Transitions**:
+ - **Define how parent VGroups will transition onto and off of the scene, and between different sections.** Describe the movement patterns for these high-level groups. Examples: 'Slide in from left', 'Fade in and scale up', 'Move to top of screen'.
+ - **Specify the timing and coordination of VGroup transitions.** How long will each transition take? Will transitions overlap or be sequential?
+ - **Describe any transformation sequences applied to VGroups during transitions.** Will VGroups rotate, scale, or change shape during transitions?
+
+2. **Element Animations**:
+ - **Define the animations for individual elements within each VGroup.** What animations will bring each element to life? Examples: 'Write in text', 'Draw a circle', 'Highlight an equation', 'Fade in an image'.
+ - **Group related element animations using Manim's animation grouping features (e.g., `AnimationGroup`, `Succession`).** Explain how these groups will be used to create cohesive animation sequences.
+ - **Coordinate element animations with parent VGroup movements and transitions.** Ensure element animations are synchronized with the overall scene flow.
+ - **Specify the timing of element animations relative to VGroup transitions and other element animations.** Create a timeline or sequence of animations.
+
+3. **Scene Flow**:
+ - **Describe the overall animation sequence for the entire scene.** Outline the order in which VGroups and elements will be animated.
+ - **Specify transition buffers or pauses between major sections of the scene.** How much time will be left between animations for the viewer to process information?
+ - **Consider how the animation timing will coordinate with the narration (if narration is planned).** Animations should complement and reinforce the spoken content.
+
+[NARRATION]
+- **Narration Script:** [Provide the full script for the narration, including timing cues or markers for when specific animations should occur. The script should be clear, detailed, and engaging, and should align with the visual elements and animations.]
+- **Narration Sync:** [Describe how the narration should be synchronized with the animations. Specify how timing cues in the narration script will be used to trigger animations. Are there specific points where the narration and animations should be perfectly synchronized? Explain how you will achieve this synchronization.]
+
+[VIEWER_EXPERIENCE]
+1. **Cognitive Load**:
+ - How will you manage the amount of information presented at any given time?
+ - Are there any complex concepts that need to be broken down into smaller steps?
+ - How will you use visual cues to guide the viewer's attention?
+
+2. **Pacing**:
+ - Is the pacing of the scene appropriate for the content?
+ - Are there moments where the viewer needs time to pause and reflect?
+ - How will you use animation timing to control the pace of the scene?
+
+3. **Accessibility**:
+ - How will you ensure that the scene is accessible to viewers with different needs?
+ - Are there any specific considerations for color contrast or text readability?
+
+[TECHNICAL_CHECKS]
+- **VGroup boundary validation:** Ensure all elements are contained within their intended VGroup boundaries and are not overflowing unexpectedly.
+- **Hierarchy scale consistency:** Verify that scaling is applied consistently throughout the VGroup hierarchy and that text and elements remain readable at all scales.
+- **Animation coordination between levels:** Check that animations at different VGroup levels are coordinated and do not clash or look disjointed.
+- **Performance optimization for nested groups:** Consider the performance implications of deeply nested VGroups and optimize structure and animations for smooth playback.
+- **Text readability:** Ensure all text elements are legible in terms of size, color contrast, and positioning.
+- **Color contrast:** Verify sufficient color contrast between text and background, and between different visual elements for accessibility.
+- **Animation smoothness:** Check for any jerky or abrupt animations and refine timing and easing for smoother transitions.
+
+
+
+Requirements:
+1. All elements must stay within safe area margins
+2. Maintain minimum spacing between objects: [value] (This value is defined in the project settings)
+3. Use relative positioning when possible, leveraging `next_to()`, `align_to()`, and `shift()`. Only reference positions relative to `ORIGIN`, scene margins, or other object reference points. Do not use absolute coordinates.
+4. Include transition buffers between animations
+5. Specify z-index for overlapping elements
+6. All colors must use hex codes or named colors
+7. Define scale relative to base unit
+8. No external dependencies
+9. Currently, there are no images or other assets available locally or remotely for you to use in the scene. Only include elements that can be generated through manim.
+10. **Do not generate any code in this plan, except for illustrative examples where necessary. This plan is for outlining the scene and should not include any python code.**
+11. **The purpose of this plan is to be a detailed guide for a human to implement the scene in manim.**
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_scene_plan.txt b/task_generator/prompts_raw/prompt_scene_plan.txt
new file mode 100644
index 0000000000000000000000000000000000000000..45a36e6b0d3f35636eb7eee44f8eb3932a30c867
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_scene_plan.txt
@@ -0,0 +1,49 @@
+You are an expert in educational video production, instructional design, and {topic}. Please design a high-quality video that provides an in-depth explanation of {topic}.
+
+**Video Overview:**
+
+Topic: {topic}
+Description: {description}
+
+**Scene Breakdown:**
+
+Plan individual scenes. For each scene please provide the following:
+
+* **Scene Title:** Short, descriptive title (2-5 words).
+* **Scene Purpose:** Learning objective of this scene. How does it connect to previous scenes?
+* **Scene Description:** Detailed description of scene content.
+* **Scene Layout:** Describe the spatial layout concept in detail. Consider safe area margins and minimum spacing between objects.
+
+Please generate the scene plan for the video in the following format:
+
+```xml
+
+
+ Scene Title: [Title]
+ Scene Purpose: [Learning objective, connection to previous scene]
+ Scene Description: [Brief content description]
+ Scene Layout: [Spatial layout concept, consider safe area and spacing]
+
+
+
+ ...
+
+...
+
+```
+
+**Spatial Constraints:**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. *All objects must be positioned within these margins.*
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). *Ensure adequate spacing to prevent overlaps and maintain visual clarity.*
+
+Requirements:
+1. Scenes must build progressively, starting from foundational concepts and advancing to more complex ideas to ensure a logical flow of understanding for the viewer. Each scene should naturally follow from the previous one, creating a cohesive learning narrative. Start with simpler scene layouts and progressively increase complexity in later scenes.
+2. The total number of scenes should be between 3 and 7.
+3. Learning objectives should be distributed evenly across the scenes.
+4. The total video duration must be under 15 minutes.
+5. It is essential to use the exact output format, tags, and headers as specified in the prompt.
+6. Maintain consistent formatting throughout the entire scene plan.
+7. **No External Assets:** Do not import any external files (images, audio, video). *Use only Manim built-in elements and procedural generation.*
+8. **Focus on in-depth explanation of the theorem. Do not include any promotional elements (like YouTube channel promotion, subscribe messages, or external resources) or quiz sessions. Detailed example questions are acceptable and encouraged.**
+
+Note: High-level plan. Detailed scene specifications will be generated later, ensuring adherence to safe area margins and minimum spacing. The spatial constraints defined above will be strictly enforced in subsequent planning stages.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_scene_technical_implementation.txt b/task_generator/prompts_raw/prompt_scene_technical_implementation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5d8aa99e5d7a0e977e3d4e38ab1b5f6e3f704d8f
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_scene_technical_implementation.txt
@@ -0,0 +1,97 @@
+You are an expert in educational video production and Manim (Community Edition), adept at translating pedagogical narration plans into robust and spatially accurate Manim code.
+**Reminder:** This technical implementation plan is fully self-contained. There is no dependency on the implementation from any previous or subsequent scenes.
+
+Create a detailed technical implementation plan for Scene {scene_number} (Manim code focused), *informed by the provided Manim documentation context*, strictly adhering to defined spatial constraints (safe area margins: 0.5 units, minimum spacing: 0.3 units), and **addressing potential text bounding box overflow issues**.
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+Scene Vision and Storyboard:
+{scene_vision_storyboard}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced):**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. All objects must be positioned within these margins.
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). This prevents overlaps and maintains visual clarity.
+
+**Positioning Requirements:**
+1. All positioning MUST be relative (`next_to`, `align_to`, `shift`) from ORIGIN, safe margins, or other objects. **No absolute coordinates are allowed.** (A brief illustrative sketch follows this list.)
+2. Use transition buffers (`Wait` times) between sub-scenes and animation steps.
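+
+A rough sketch of relative-only positioning under these constraints (the mobjects and text are illustrative, not prescribed):
+
+```python
+from manim import Scene, Tex, MathTex, Square, UP, DOWN, LEFT, ORIGIN
+
+class LayoutSketch(Scene):
+    def construct(self):
+        # Every placement is relative to ORIGIN, a scene edge, or another mobject.
+        title = Tex("Right triangles", font_size=28).to_edge(UP, buff=0.5)       # 0.5-unit safe margin
+        diagram = Square(side_length=2).move_to(ORIGIN)
+        label = MathTex("s = 2", font_size=24).next_to(diagram, DOWN, buff=0.3)  # 0.3-unit minimum spacing
+        label.align_to(diagram, LEFT)                                            # share the diagram's left edge
+        self.add(title, diagram, label)
+```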
+
+**Diagrams/Sketches (Highly Recommended):**
+* Include diagrams/sketches (even text-based) for complex layouts to visualize spatial relationships, improve clarity, and reduce spatial errors.
+
+**Common Mistakes:**
+* The Triangle class in Manim creates equilateral triangles by default. To create a right-angled triangle, use the Polygon class instead.
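+
+For example (the vertex values are arbitrary):
+
+```python
+from manim import Scene, Triangle, Polygon, ORIGIN, RIGHT, UP, LEFT
+
+class TriangleSketch(Scene):
+    def construct(self):
+        equilateral = Triangle()                            # Triangle() is equilateral by default
+        right_angled = Polygon(ORIGIN, RIGHT * 3, UP * 2)   # explicit vertices give a right angle at ORIGIN
+        right_angled.next_to(equilateral, LEFT, buff=0.3)
+        self.add(equilateral, right_angled)
+```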
+
+**Manim Plugins:**
+* You may use established, well-documented Manim plugins if they offer significant advantages in terms of code clarity, efficiency, or functionality not readily available in core Manim.
+* **If a plugin is used:**
+ * Clearly state the plugin name and version (if applicable).
+ * Provide a brief justification for using the plugin (e.g., "Using `manim-plugin-name` for its advanced graph layout capabilities").
+ * Ensure all plugin usage adheres to the plugin's documentation.
+    * Include a comment in the plan: `### Plugin: <plugin-name> - <brief justification>`.
+
+**Focus:**
+* Creating *pedagogically sound and spatially correct Manim code*.
+* Detailed technical descriptions, referencing Manim documentation.
+* Strict adherence to spatial constraints and relative positioning.
+
+You MUST generate the technical implementation plan for the scene in the following format (from ```xml to ```):
+
+```xml
+
+0. **Dependencies**:
+ - **Manim API Version**: Target the latest stable Manim release, using only documented API elements.
+ - **Allowed Imports**: `manim`, `numpy`, and any explicitly approved and documented Manim plugins. No external assets (e.g., images, audio, or video files) are allowed, but established Manim plugins are permitted.
+
+1. **Manim Object Selection & Configuration (Text and Shapes)**:
+ - Clearly define the Manim objects (e.g., `Tex`, `MathTex`, `Circle`, `Line`, etc.) used to construct the scene. Also include any objects provided by used plugins.
+ - Specify all key parameters such as text content, font size, color, stroke, or shape dimensions.
+ - **Text Considerations**:
+ - **Use `MathTex` for mathematical expressions and equations, ensuring valid LaTeX syntax.** For example: `MathTex("x^2 + y^2 = r^2")`.
+ - **Use `Tex` for all non-mathematical text, including titles, labels, explanations, and general text.** For example: `Tex("This is a circle")`.
+ - **If you need to include regular text *within* a `MathTex` environment (e.g., for explanations alongside a formula), use the `\\text{{}}` command.** For example: `MathTex(r"\\text{{Area of circle}} = \\pi r^2")`.
+ - **Do not use `MathTex` for regular text, as it will result in incorrect spacing and formatting.**
+ - **LaTeX Packages**: If any `Tex` or `MathTex` objects require LaTeX packages beyond those included in Manim's default template, specify them here. For example: "Requires: `\\usepackage{{amssymb}}`". Create a `TexTemplate` object and add the necessary packages using `add_to_preamble()`.
+ - **Font Size Recommendations**:
+ - If there is title text, font size is highly recommended to be 28.
+ - If there are side labels or formulas, font size is highly recommended to be 24.
+ - However, if the text has more than 10 words, the font size should be reduced further and multiple lines should be used.
+ - Confirm all objects begin within the safe area (0.5 units from all edges) and maintain at least 0.3 units spacing to avoid overlaps.
+
+2. **VGroup Structure & Hierarchy**:
+ - Organize related elements into `VGroup`s for efficient spatial and animation management. If a plugin provides a specialized group-like object, consider using it.
+ - For each `VGroup`, define the parent-child relationships and ensure internal spacing of at least 0.3 units.
+ - Clearly document the purpose for each grouping (e.g., "formula_group" for mathematical expressions).
+
+3. **Spatial Positioning Strategy**:
+ - Mandate the exclusive use of relative positioning methods (`next_to`, `align_to`, `shift`), based on ORIGIN, safe margins, or other objects.
+ - For every object, specify:
+ - The reference object (or safe edge) used for positioning.
+ - The specific method (and direction/aligned edge) along with a `buff` value (minimum 0.3 units).
+ - Outline the layout in sequential stages, inserting visual checkpoints to verify that every element continues to respect safe margins and spacing.
+ - Highlight measures to safeguard text bounding boxes, especially for multi-line text.
+ - Reference the font size recommendations under "Text Considerations" to ensure appropriate sizing and prevent overflow.
+
+4. **Animation Methods & Object Lifecycle Management**:
+ - Define clear animation sequences using documented methods such as `Create`, `Write`, `FadeIn`, `Transform`, and corresponding removal animations (`FadeOut`, `Uncreate`). Include animation methods from plugins if they are used.
+ - For each animation, specify parameters like `run_time`, `lag_ratio`, and the use of `Wait()` for transition buffers.
+ - Ensure every object's appearance and removal is managed to prevent clutter and maintain scene clarity.
+
+5. **Code Structure & Reusability**:
+ - Propose modular functions for creating and animating common objects to promote code reusability.
+ - Organize the overall code structure into logical sections: dependencies, object definitions, individual layout stages, and the main `construct` method.
+ - Include inline comments to document the rationale for configuration choices, referencing the Manim Documentation *and the plugin documentation where applicable*.
+
+***Mandatory Safety Checks***:
+ - **Safe Area Enforcement**: All objects, including text bounding boxes, must remain within 0.5 unit margins.
+ - **Minimum Spacing Validation**: Confirm a minimum of 0.3 units spacing between every pair of objects.
+ - **Transition Buffers**: Use explicit `Wait()` calls to separate animation steps and sub-scenes.
+
+```
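+
+The "Text Considerations" above compress several Manim conventions; a brief sketch of what they look like as plain runnable Python (written here without the doubled braces and backslashes the template itself uses for format-escaping; all values are illustrative):
+
+```python
+from manim import MathTex, Tex, TexTemplate
+
+title = Tex("This is a circle", font_size=28)                     # Tex for regular text
+formula = MathTex("x^2 + y^2 = r^2", font_size=24)                # MathTex for mathematics
+area = MathTex(r"\text{Area of circle} = \pi r^2", font_size=24)  # words inside MathTex via \text{}
+
+# Extra LaTeX packages go into a TexTemplate preamble and are passed to the mobject.
+template = TexTemplate()
+template.add_to_preamble(r"\usepackage{amssymb}")
+symbols = MathTex(r"x \in \mathbb{R}", tex_template=template, font_size=24)
+```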
diff --git a/task_generator/prompts_raw/prompt_scene_vision_storyboard.txt b/task_generator/prompts_raw/prompt_scene_vision_storyboard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f35d1e324bce182e495b53a0c98fe184d74d15a0
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_scene_vision_storyboard.txt
@@ -0,0 +1,71 @@
+You are an expert in educational video production and Manim animation.
+**Reminder:** Each scene's vision and storyboard plan is entirely self-contained. There is no dependency on any implementation from previous or subsequent scenes. However, the narration will treat all scenes as part of a single, continuous video.
+
+Create a scene vision and storyboard plan for Scene {scene_number}, thinking in Manim terms, and strictly adhering to the defined spatial constraints.
+
+Topic: {topic}
+Description: {description}
+
+Scene Overview:
+{scene_outline}
+
+The following manim plugins are relevant to the scene:
+{relevant_plugins}
+
+**Spatial Constraints (Strictly Enforced):**
+* **Safe area margins:** 0.5 units on all sides from the scene edges. *All objects must be positioned within these margins.*
+* **Minimum spacing:** 0.3 units between any two Manim objects (measured edge to edge). *Ensure a minimum spacing of 0.3 units to prevent overlaps and maintain visual clarity. This spacing must be maintained between all objects in the scene, including text, shapes, and graphs.*
+
+**Positioning Requirements:**
+1. Safe area margins (0.5 units).
+2. Minimum spacing between objects (0.3 units).
+3. Relative positioning (`next_to`, `align_to`, `shift`) from `ORIGIN`, margins, or object references. **No absolute coordinates are allowed.** All positioning MUST be relative and clearly specified using reference points and relative positioning methods.
+4. Transition buffers (`Wait` times) between sub-scenes and animation steps for visual clarity and pacing.
+
+**Diagrams/Sketches (Optional but Recommended for Complex Scenes):**
+* For complex scenes, consider including a simple diagram or sketch (even text-based) of the intended layout to visually clarify spatial relationships and ensure adherence to spacing and margin constraints.
+
+**Focus:**
+* Focus on clear visual communication of the scene's learning objective through effective use of Manim objects and animations, while strictly adhering to the defined spatial constraints.
+* Provide detailed visual descriptions in Manim terms to guide human implementation.
+* Prioritize explanation and visualization of the theorem. Do not include any promotional elements or quiz sessions.
+* Minimize text usage - rely primarily on visual elements, mathematical notation, and animations to convey concepts. Use text sparingly and only when necessary for clarity.
+
+**Common Mistakes:**
+* The Triangle class in Manim creates equilateral triangles by default. To create a right-angled triangle, use the Polygon class instead.
+
+**Manim Plugins:**
+* Consider using established Manim plugins if they significantly simplify the implementation or offer visual elements not readily available in core Manim. If a plugin is used, clearly indicate this in the storyboard with a note like "**Plugin Suggestion:** Consider using the `manim-plugin-name` plugin for [brief explanation of benefit]."
+
+You MUST generate the scene vision and storyboard plan for the scene in the following format (from ```xml to ```):
+
+```xml
+
+[SCENE_VISION]
+1. **Scene Overview**:
+ - Scene story, key takeaway, video role. *Consider how this scene fits within the overall video narrative.*
+ - **Visual learning objectives for viewers:** Think about *specific Manim object types* that best represent the learning objective. Example: "Visualize roots as `Dot` objects on an `Axes` graph." Be specific about Manim object classes (e.g., `MathTex`, `Shapes`, `Graphs`, `Axes`, `VGroup`). If a plugin provides a relevant object type, mention it (e.g., "Visualize X using `PluginObject` from `manim-plugin-name`").
+ - How Manim visuals & animations support learning? Consider `MathTex`, `Shapes`, `Graphs`, `Axes`, `VGroup`. Focus on spatial arrangement and clarity, ensuring adherence to safe area margins and minimum spacing (0.3 units). Consider using `VGroup` to group related formula components for easier animation and spatial control. Example: "Use `VGroup` to group related formula components for easier animation and spatial control, ensuring a minimum spacing of 0.3 units between VGroup and other scene elements." If a plugin offers a more efficient way to achieve a visual effect, mention it.
+ - Key concepts to emphasize visually using visual hierarchy and spatial arrangement in Manim, while respecting safe area margins and minimum spacing (0.3 units). **Use `MathTex` for mathematical expressions and equations. Use `Tex` for general text, titles, labels, and any non-mathematical text. When mixing text with mathematical symbols in `MathTex`, use the `\\text{{}}` command (e.g., `MathTex(r"\\text{{Area}} = \\pi r^2")`)**
+
+[STORYBOARD]
+1. **Visual Flow & Pacing (Manim Animation Sequence)**:
+ - Describe the sequence of Manim visuals and animations (`Text`, `Circle`, `Arrow`, `Create`, `FadeIn`, `Transform`, etc.). Be specific about animation types and their parameters (e.g., `run_time`). If a plugin provides a specific animation type, mention it (e.g., "Use `PluginAnimation` from `manim-plugin-name`").
+ - Key visual moments: composition and arrangement of Manim elements, ensuring all elements are within safe area margins and maintain a minimum 0.3 unit spacing. Example: "`MathTex` formula center (`.move_to(ORIGIN)`) with `Write` animation, ensuring 0.3 unit spacing from scene edges and other elements."
+ - Visual transitions between ideas using Manim animations (`Transform`, `Shift`, `FadeOutAndShift`, etc.). Specify transition animations and their timings.
+ - Scene pacing (pauses, action) and Manim animation timing's role. Use `Wait()` for transition buffers and visual clarity.
+ - **Sub-scene Breakdown**: Divide the scene into logical sub-scenes, each focusing on a specific step in the explanation or visualization.
+ - For each sub-scene, start with a **Visual Element**: The primary visual component that drives the explanation (e.g., mathematical notation, diagram, graph). If this element comes from a plugin, clearly state this (e.g., "Visual Element: `PluginObject` from `manim-plugin-name`").
+ - Detail the **Animation Sequence**: Describe step-by-step the Manim animations and visual elements for each sub-scene. Be specific about:
+ - **Text Usage Guidelines:**
+ - **Use `MathTex` *only* for mathematical expressions and equations.**
+ - **Use `Tex` for all other text, including labels, explanations, and titles.**
+ - **When mixing text with mathematical symbols in `MathTex`, wrap the text portions in `\\text{{}}`. Example: `MathTex(r"\\text{{Area of circle}} = \\pi r^2")`.**
+ - Manim object classes (`MathTex`, `Circle`, `Arrow`, `Axes`, `Plot`, `Line`, `VGroup`, etc.), prioritizing mathematical notation and visual elements over text. Include plugin object classes where appropriate.
+ - Animation types (`Create`, `Write`, `FadeIn`, `Transform`, `FadeOut`, `Circumscribe`, `FocusOn`, etc.) and their parameters (e.g., `run_time`). Include plugin animation types where appropriate.
+ - Positioning of objects using relative positioning methods (`.next_to()`, `.align_to()`, `.shift()`, `.to_corner()`, `.move_to(ORIGIN)`, etc.) and references to other objects or scene elements. **No absolute coordinates allowed.**
+ - Color and style specifications (e.g., `color=BLUE`, `stroke_width=2`, `dashed=True`).
+ - Explicitly mention safe area margins and minimum spacing (0.3 units) for all objects within each sub-scene.
+
+
+```
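+
+The storyboard guidance above leans on VGroup to keep related formula components together; a minimal sketch of that idea in Manim CE (formulas and timings are illustrative):
+
+```python
+from manim import Scene, MathTex, VGroup, Write, DOWN, ORIGIN
+
+class FormulaGroupSketch(Scene):
+    def construct(self):
+        # Group the steps so they can be positioned and animated as one unit.
+        steps = VGroup(
+            MathTex("a^2 + b^2 = c^2", font_size=24),
+            MathTex("c^2 = 25", font_size=24),
+        )
+        steps.arrange(DOWN, buff=0.3).move_to(ORIGIN)   # keep the 0.3-unit spacing inside the group
+        self.play(Write(steps[0]), run_time=1.5)
+        self.wait(0.5)                                  # transition buffer
+        self.play(Write(steps[1]), run_time=1.5)
+```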
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_teaching_framework.txt b/task_generator/prompts_raw/prompt_teaching_framework.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7766c61523d6474f860f79618de49409bfbfb967
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_teaching_framework.txt
@@ -0,0 +1,180 @@
+# Comprehensive Educational Video Content Framework
+
+## 1. Pre-Production Planning
+
+### A. Learning Objectives
+- **Knowledge Level (Remember & Understand)**
+  Define specific, measurable learning outcomes that can be clearly assessed and evaluated. These outcomes should be concrete and observable, allowing instructors to verify that learning has occurred. Each outcome should be written using precise language that leaves no ambiguity about what constitutes success. For example, "After watching this video, learners will be able to define and explain the concept of variables in programming" provides a clear benchmark for assessment.
+
+ Action verbs are essential tools for crafting effective learning objectives. Choose verbs like define, list, describe, explain, and identify that clearly indicate the expected cognitive processes. These verbs should align with Bloom's Taxonomy to ensure appropriate cognitive engagement. When applicable, ensure all objectives align with relevant curriculum standards to maintain educational consistency and meet institutional requirements.
+
+- **Comprehension Level (Analyze & Evaluate)**
+  Develop objectives that emphasize deeper understanding and connections between concepts. These objectives should go beyond simple recall to require analysis and evaluation of the material. Students should be able to make meaningful connections between different aspects of the content and explain their relationships. For example, "Learners will be able to compare different data types and explain when to use each" demonstrates this deeper level of understanding.
+
+ Critical thinking elements should be deliberately incorporated into each objective. Create scenarios that challenge students to apply their knowledge in new contexts. These scenarios should require careful analysis and reasoned decision-making to solve problems effectively. Design learning experiences that encourage students to question assumptions and develop analytical skills.
+
+- **Application Level (Apply & Create)**
+  Develop practical skills that directly translate to real-world applications and scenarios. These objectives should focus on hands-on experience and tangible outcomes that demonstrate mastery. For example, "Learners will be able to write a basic program using variables and proper naming conventions" provides a clear, actionable goal that can be demonstrated through practical work.
+
+ Include hands-on exercises that allow students to practice and refine their skills in a supported environment. These exercises should gradually increase in complexity to build confidence and competence. Provide real-world context by incorporating authentic scenarios and problems that students might encounter in their future careers or daily lives. This connection to reality helps maintain engagement and demonstrates the immediate value of the learning.
+
+- **Target Audience Analysis**
+ Conduct thorough demographic research to understand your learners' backgrounds, ages, and educational levels. This analysis should include assessment of prior knowledge and experience with the subject matter. Consider the technical capabilities of your audience, including their access to necessary tools and technologies.
+
+ Evaluate different learning preferences and styles within your target audience. This understanding helps in designing varied content that appeals to visual, auditory, and kinesthetic learners. Consider cultural and linguistic factors that might impact learning effectiveness. Create content that is inclusive and accessible to learners from diverse backgrounds. Account for varying levels of technical proficiency and ensure your content can be accessed across different devices and platforms.
+
+### B. Content Structure
+
+- **Hook (5-10% of duration)**
+ Begin each video with a compelling problem or scenario that immediately captures attention and creates interest. This hook should be relevant to the content while being unexpected or intriguing enough to maintain viewer engagement. Use surprising facts or statistics that challenge common assumptions or demonstrate the importance of the topic.
+
+  Share relevant real-world applications that demonstrate immediate value to the learner. For example, "What if you could automate your daily tasks with just a few lines of code?" creates immediate interest by connecting to practical benefits. The hook should create an emotional connection and generate curiosity about the upcoming content. Consider using storytelling elements or real-world problems that your audience can relate to.
+
+- **Context (10-15%)**
+ Provide clear explanations of how the content relates to real-world situations and problems. This context should help learners understand why the material is relevant to their lives or career goals. Make explicit connections to previous knowledge and experiences that learners can build upon.
+
+  Address the fundamental question of "Why should I learn this?" by demonstrating practical applications and benefits. This explanation should be concrete and specific to your audience's needs and interests. Set clear expectations for learning outcomes so students understand what they will gain from the content. Provide a roadmap for the learning journey ahead, including how this content connects to future topics and skills.
+
+- **Core Content (60-70%)**
+ Organize material in a logical progression that builds from fundamental concepts to more complex applications. This progression should be carefully planned to avoid overwhelming learners while maintaining engagement. Include multiple examples that demonstrate concepts from different angles and perspectives.
+
+ Use varied teaching methods to accommodate different learning styles and maintain interest. These methods might include demonstrations, animations, code examples, and interactive elements. Implement frequent knowledge checks throughout the content to ensure understanding and maintain engagement. Break complex topics into manageable chunks that can be easily processed and remembered.
+
+- **Practice/Application (10-15%)**
+ Create guided practice opportunities that allow learners to apply new knowledge in a supported environment. These practice sessions should include clear instructions and immediate feedback mechanisms. Design interactive elements that engage learners and require active participation rather than passive viewing.
+
+ Develop problem-solving scenarios that challenge learners to apply concepts in realistic situations. These scenarios should gradually increase in complexity as learners gain confidence. Include opportunities for peer learning and collaboration when possible. Provide scaffolded support that can be gradually removed as learners become more proficient.
+
+- **Summary (5-10%)**
+ Conclude each video with a comprehensive recap of key points and main takeaways. This summary should reinforce the most important concepts and their practical applications. Preview upcoming topics to create anticipation and show how current learning connects to future content.
+
+ Provide specific action items that learners can implement immediately to reinforce their learning. These should be concrete, achievable tasks that build confidence and competence. Share additional resources for further learning, including reference materials, practice exercises, and advanced topics. Create clear connections between the current content and future learning objectives.
+
+## 2. Instructional Design Elements
+
+### A. Cognitive Load Management
+
+- **Chunking Strategies**
+ Break complex content into manageable segments of 3-5 minutes each. These chunks should focus on single concepts or closely related ideas that form a coherent unit. Use clear transitions between segments to maintain flow while allowing for cognitive processing.
+
+ Implement progressive complexity by building from basic concepts to more advanced applications. This progression should be carefully planned to avoid overwhelming learners. Include strategic pauses and processing time between segments to allow for reflection and integration of new information. Use visual and verbal cues to signal transitions between different concepts or levels of complexity.
+
+- **Visual Organization**
+ Develop a consistent visual hierarchy that guides learners through the content effectively. This hierarchy should use size, color, and placement to indicate the relative importance of different elements. Implement clean, uncluttered designs that minimize distractions and focus attention on key concepts.
+
+ Apply color coding consistently to help learners identify and remember related concepts. This coding should be intentional and meaningful, not merely decorative. Use white space effectively to create visual breathing room and help separate different concepts. Ensure that visual elements support rather than compete with the learning objectives.
+
+- **Information Processing**
+ Carefully limit the introduction of new concepts to 5-7 per video to prevent cognitive overload. This limitation helps ensure that learners can effectively process and retain the information presented. Develop and use mnemonics and memory aids that help learners organize and remember key concepts.
+
+ Provide visual anchors that learners can reference throughout the content. These anchors should help maintain context and show relationships between concepts. Include strategic review points that reinforce previous learning before introducing new material. Create clear connections between new information and existing knowledge to facilitate better retention.
+
+### B. Engagement Techniques
+
+- **Storytelling Elements**
+ Develop a clear narrative flow that carries learners through the content naturally. This narrative should have a beginning, middle, and end that maintains interest and supports learning objectives. Use character-driven examples that learners can relate to and remember.
+
+ Include elements of conflict and resolution to create tension and maintain engagement. These elements should be relevant to the learning objectives and help illustrate key concepts. Maintain an emotional connection through relatable scenarios and authentic problems. Create story arcs that span multiple videos or modules to maintain long-term engagement.
+
+- **Visual Support**
+ Create relevant graphics and animations that enhance understanding of key concepts. These visual elements should be purposeful and directly support learning objectives, not merely decorative. Implement a consistent visual style across all content to maintain professionalism and reduce cognitive load.
+
+ Develop clear infographics that break down complex concepts into understandable components. These should use visual hierarchy and design principles effectively. Use motion and animation thoughtfully to direct attention to important elements and demonstrate processes. Ensure all visual elements are accessible and effectively communicate their intended message.
+
+- **Interactive Components**
+ Design and embed quiz questions that check understanding at key points in the content. These questions should be strategically placed to maintain engagement and reinforce learning. Include deliberate pause points that encourage reflection and active processing of information.
+
+ Create coding challenges or practical exercises that allow immediate application of concepts. These should be scaffolded appropriately for the learner's skill level. Provide multiple opportunities for feedback, both automated and instructor-guided when possible. Design interactive elements that encourage experimentation and learning from mistakes.
+
+## 3. Content Delivery Framework
+
+### A. Teaching Sequence
+
+1. **Activate**
+ Begin each learning session by connecting to familiar concepts that students already understand. This activation of prior knowledge creates a foundation for new learning and helps students feel confident. Use carefully chosen analogies and metaphors that bridge the gap between known and new concepts. These comparisons should be relevant to your audience's experience and background.
+
+ Create explicit connections to previous learning modules or related concepts. These connections help students build a coherent mental model of the subject matter. Assess prior knowledge through quick activities or questions that reveal students' current understanding. Use this assessment to adjust your teaching approach and address any misconceptions early in the lesson.
+
+2. **Present**
+ Deliver clear, structured explanations of new concepts that build upon activated knowledge. These explanations should use precise language while remaining accessible to your target audience. Employ multiple representation methods, including verbal explanations, visual diagrams, and interactive demonstrations. This variety helps accommodate different learning styles and reinforces understanding.
+
+ Provide step-by-step demonstrations that break complex processes into manageable parts. Each step should be clearly explained and connected to the overall objective. Include real-world examples that illustrate practical applications of the concepts. These examples should be relevant to your audience's interests and career goals.
+
+3. **Guide**
+  Develop worked examples that demonstrate expert problem-solving processes and thinking strategies. These examples should include explicit explanations of decision-making and common pitfalls to avoid. Share expert thinking processes by "thinking aloud" through problem-solving steps. This transparency helps students understand the metacognitive aspects of learning.
+
+ Create scaffolded learning experiences that gradually reduce support as students gain confidence. Begin with highly structured guidance and progressively move toward independent work. Address common misconceptions and errors proactively, explaining why they occur and how to avoid them. Provide clear strategies for troubleshooting and problem-solving.
+
+4. **Practice**
+ Design guided exercises that allow students to apply new knowledge with appropriate support. These exercises should be carefully sequenced to build confidence and competence gradually. Include opportunities for independent practice that reinforce learning and build autonomy. Ensure these practice sessions are aligned with learning objectives and provide clear success criteria.
+
+ Create peer learning opportunities that allow students to learn from and teach others. These interactions can reinforce understanding and develop communication skills. Implement immediate feedback mechanisms that help students understand their progress and areas for improvement. This feedback should be specific, constructive, and actionable.
+
+5. **Apply**
+ Develop real-world projects that require students to integrate and apply their learning in authentic contexts. These projects should be challenging but achievable, with clear connections to practical applications. Create case studies that illustrate complex scenarios and require critical thinking and problem-solving skills. These studies should reflect realistic situations students might encounter in their careers.
+
+ Design problem-solving scenarios that encourage creative application of knowledge and skills. These scenarios should have multiple possible solutions to encourage innovative thinking. Provide opportunities for creative applications that allow students to extend their learning in personally meaningful ways. Support experimentation and risk-taking in a safe learning environment.
+
+### B. Presentation Techniques
+
+- **Transitions**
+ Implement clear verbal cues that signal shifts between concepts or activities. These cues help students maintain orientation and prepare for new information. Design visual transition elements that support cognitive processing and maintain engagement. These elements should be consistent throughout your content to establish familiar patterns.
+
+ Create concept maps that show relationships between different topics and ideas. These maps help students understand how current learning connects to broader concepts. Use progress indicators that help students track their advancement through the material. These indicators should provide a sense of accomplishment and motivation.
+
+- **Multiple Representations**
+ Combine text and graphics effectively to convey information through multiple channels. This combination should be purposeful and coordinated to enhance understanding. Integrate audio and visual elements that complement each other and reinforce key concepts. Ensure these elements work together without creating cognitive overload.
+
+ Develop interactive elements that encourage active engagement with the content. These elements should provide immediate feedback and support learning objectives. Include physical demonstrations when appropriate to illustrate concepts in tangible ways. These demonstrations should be clear, visible, and directly relevant to learning goals.
+
+## 4. Assessment Integration
+
+### A. Knowledge Verification
+- **Formative Assessment**
+ Implement regular quick checks for understanding throughout the learning process. These checks should be low-stakes and provide immediate feedback to both learner and instructor. Design self-assessment prompts that encourage students to reflect on their own learning progress. These prompts should help students develop metacognitive skills and self-awareness.
+
+ Create opportunities for peer discussion and feedback that deepen understanding through explanation and debate. These discussions should be structured to ensure productive exchanges and learning outcomes. Develop reflection questions that help students connect new learning to existing knowledge and future applications. These questions should promote deep thinking and personal connection to the material.
+
+- **Summative Assessment**
+ Design project-based assessments that evaluate comprehensive understanding and practical application. These projects should integrate multiple concepts and skills learned throughout the course. Guide students in developing portfolios that demonstrate their learning journey and achievements. These portfolios should include examples of both process and product.
+
+ Create opportunities for skill demonstration that allow students to show mastery in authentic contexts. These demonstrations should reflect real-world applications and standards. Develop knowledge application assessments that require students to transfer learning to new situations. These assessments should evaluate both understanding and adaptability.
+
+### B. Learning Reinforcement
+- **Review Strategies**
+  Implement spaced repetition techniques that optimize long-term retention of information. This approach should strategically revisit concepts at increasing intervals (a minimal scheduling sketch follows below). Create concept mapping exercises that help students visualize and understand relationships between ideas. These maps should become increasingly complex as understanding develops.
+
+ Guide students in knowledge synthesis activities that combine multiple concepts into coherent understanding. These activities should help students see the bigger picture and make meaningful connections. Design application scenarios that require students to apply knowledge in new and challenging contexts. These scenarios should build confidence and demonstrate practical relevance.
+
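+For the spaced-repetition point above, here is a minimal scheduling sketch (the intervals are illustrative assumptions, not prescriptive values):
+
+```python
+from datetime import date, timedelta
+
+# Illustrative intervals only: revisit a concept at roughly increasing spacing.
+REVIEW_INTERVALS_DAYS = [1, 3, 7, 14, 30]
+
+def review_dates(first_exposure: date) -> list[date]:
+    """Return the dates on which a concept should be revisited."""
+    return [first_exposure + timedelta(days=d) for d in REVIEW_INTERVALS_DAYS]
+
+# A concept introduced on 2024-01-01 would be reviewed on Jan 2, 4, 8, 15, and 31.
+print(review_dates(date(2024, 1, 1)))
+```
+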
+## 5. Technical Considerations
+
+### A. Video Production Elements
+- **Duration Guidelines**
+ Optimize video length to maintain engagement while effectively covering necessary content. The ideal duration of 6-12 minutes balances attention span with comprehensive coverage. Implement concept-based segmentation that breaks longer topics into digestible chunks. This segmentation should follow natural breaking points in the material.
+
+  Consider attention span patterns when planning content structure and pacing. Include variety and interaction to maintain engagement throughout longer sessions. Adapt content length to platform-specific requirements, and account for mobile viewing habits and platform limitations in your planning.
+
+- **Quality Standards**
+ Ensure professional audio quality through proper equipment and recording techniques. This includes clear voice recording, minimal background noise, and appropriate volume levels. Maintain consistent lighting that enhances visibility and reduces viewer fatigue. Pay attention to both subject lighting and screen content visibility.
+
+ Create clear visual presentations that effectively communicate key concepts. This includes appropriate font sizes, color contrast, and visual hierarchy. Maintain appropriate pacing that allows for processing time while maintaining engagement. Consider your audience's needs and learning objectives when determining pace.
+
+### B. Accessibility Features
+- **Universal Design**
+ Create content that accommodates multiple learning modalities and preferences. This includes providing information through visual, auditory, and interactive channels. Ensure screen reader compatibility by following accessibility best practices and standards. This includes proper heading structure and alt text for images.
+
+ Implement appropriate color contrast considerations for all visual elements. This ensures content is accessible to viewers with various visual abilities. Provide alternative text descriptions for all important images and graphics. These descriptions should convey the same information as the visual elements.
+
+## 6. Follow-up Resources
+
+### A. Supporting Materials
+- **Resource Types**
+  Develop comprehensive practice exercises that reinforce learning and build confidence. These exercises should range from basic to advanced, accommodating different skill levels. Create well-documented code samples that demonstrate best practices and common patterns. These samples should include comments explaining key concepts and decisions (an illustrative sketch follows below).
+
+ Compile detailed reference guides that support independent learning and problem-solving. These guides should be easily searchable and regularly updated. Design cheat sheets that provide quick access to essential information and common procedures. These should be concise while including all crucial information.
+
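+As a concrete illustration of the well-documented code samples described above, here is a minimal, hypothetical Python sketch (the function and values are examples only, not part of any required curriculum):
+
+```python
+def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
+    """Return the fixed monthly payment for a fully amortized loan.
+
+    Chosen as a relatable example: it turns a formula learners may have seen
+    in everyday life into a few lines of code they can safely experiment with.
+    """
+    monthly_rate = annual_rate / 12        # convert the annual rate to a per-month rate
+    if monthly_rate == 0:                  # handle the zero-interest edge case explicitly
+        return principal / months
+    growth = (1 + monthly_rate) ** months  # compound growth factor over the loan term
+    return principal * monthly_rate * growth / (growth - 1)
+```
+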
+### B. Implementation Guide
+- **Learning Pathways**
+ Create clear prerequisite maps that show relationships between different topics and skills. This mapping helps students understand learning dependencies and plan their progress. Provide advanced topic suggestions that help motivated learners extend their knowledge. These suggestions should include resources and guidance for self-directed learning.
+
+ Develop skill progression guides that show clear paths from beginner to advanced levels. These guides should include milestones and checkpoints for measuring progress. Suggest project ideas that allow practical application of learned skills. These projects should be scalable to different skill levels and interests.
\ No newline at end of file
diff --git a/task_generator/prompts_raw/prompt_visual_fix_error.txt b/task_generator/prompts_raw/prompt_visual_fix_error.txt
new file mode 100644
index 0000000000000000000000000000000000000000..19024df9dca606f1f0b6fad2693cb3c7523a64c8
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_visual_fix_error.txt
@@ -0,0 +1,20 @@
+You are an expert in Manim animations. Your task is to ensure that the rendered animation frame (image) aligns with the intended teaching content based on the provided implementation plan.
+
+Instructions:
+Evaluate whether the object coordinates and positions in the image match the described plan and educational purpose.
+The implementation plan serves as a reference, but your primary goal is to verify that the rendered animation frame supports effective teaching.
+For example:
+* If an object is supposed to be at the top of the screen but appears at the bottom, adjust its position.
+* If an object is supposed to be on the left side but is positioned too far to the left, adjust its position.
+* If two objects are not supposed to overlap but they do, adjust their positions (as in the illustrative snippet below).
+
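+For illustration only, here is a minimal, hypothetical Manim sketch of the kind of positional fix expected (the scene, object names, and values are examples, not taken from any actual plan):
+
+```python
+from manim import Scene, Text, Circle, UP, DOWN
+
+class FixedScene(Scene):
+    def construct(self):
+        # Plan: the title belongs at the top of the frame; the render showed it at the bottom.
+        title = Text("Pythagorean Theorem").to_edge(UP)   # was .to_edge(DOWN)
+        # Plan: the circle sits below the title without overlapping it.
+        circle = Circle().next_to(title, DOWN, buff=0.5)  # was placed on top of the title
+        self.add(title, circle)
+```
+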
+If adjustments are needed, provide the complete code of the adjusted version.
+If the current code is correct, return only "" as output.
+
+Manim Implementation Plan:
+{implementation}
+
+Generated Code:
+{generated_code}
+
+Return the complete code of the adjusted version if the code needs to be updated. If the code is correct, only return "" as output.
diff --git a/task_generator/prompts_raw/prompt_visual_self_reflection.txt b/task_generator/prompts_raw/prompt_visual_self_reflection.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3832b3b452da79696b6dc2690af646f8104dc189
--- /dev/null
+++ b/task_generator/prompts_raw/prompt_visual_self_reflection.txt
@@ -0,0 +1,47 @@
+You are an expert in Manim animations and educational video quality assessment. Your task is to analyze a rendered Manim video and its corresponding audio narration to identify areas for visual and auditory improvement, ensuring alignment with the provided implementation plan and enhancing the video's teaching effectiveness.
+
+Please analyze the provided Manim video and listen to the accompanying audio narration. Conduct a thorough self-reflection focusing on the following aspects:
+
+**1. Visual Presentation and Clarity (Automated VLM Analysis & Expert Human-like Judgment):**
+
+* **Object Overlap:** Does the video exhibit any visual elements (text, shapes, equations, etc.) overlapping in a way that obscures information or makes the animation difficult to understand? If possible, detect regions of significant overlap and highlight them in your reflection.
+* **Out-of-Bounds Objects:** Are any objects positioned partially or entirely outside of the visible frame of the video? Identify and report objects that appear to be clipped or outside the frame boundaries.
+* **Incorrect Object Positioning:** Based on your understanding of good visual design and the scene's educational purpose, are objects placed in positions that are illogical, distracting, or misaligned with their intended locations or relationships to other elements as described in the implementation plan? Consider:
+ * **Logical Flow:** Does the spatial arrangement support the intended visual flow and narrative progression of the scene?
+ * **Alignment and Balance:** Is the scene visually balanced? Are elements aligned in a way that is aesthetically pleasing and contributes to clarity, or does the layout appear haphazard or unbalanced?
+ * **Proximity and Grouping:** Are related elements positioned close enough to be visually grouped, and are unrelated elements sufficiently separated to avoid visual clutter?
+* **General Visual Clarity & Effectiveness:** Consider broader aspects of visual communication. Are there any other issues that detract from the video's clarity, impact, or overall effectiveness? This could include:
+ * **Visual Clutter:** Is the scene too busy or visually overwhelming at any point? Are there too many elements on screen simultaneously?
+ * **Poor Spacing/Layout:** Is the spacing between elements inconsistent or inefficient, making the scene feel cramped or unbalanced? Are margins and padding used effectively?
+ * **Ineffective Use of Color:** Are color choices distracting, clashing, or not contributing to the animation's message? Are colors used consistently and purposefully to highlight key information?
+ * **Pacing Issues (Visual):** Is the visual animation too fast or too slow in certain sections, hindering comprehension? Are visual transitions smooth and well-timed?
+ * **Animation Clarity:** Are the animations themselves clear and helpful in conveying the intended information? Do animations effectively guide the viewer's eye and focus attention?
+
+**2. Narration Quality:**
+
+* **Narration Clarity and Pacing:** Is the narration clear, concise, and easy to understand? Is the pacing of the narration appropriate for the visual content and the target audience? Does the narration effectively support the visual explanations?
+* **Narration Sync with Visuals:** Does the narration effectively synchronize with the on-screen visuals? Use VLM to analyze the video and identify instances where the narration is misaligned with the animations or visual elements it is describing. Report specific timings of misalignment.
+
+**3. Alignment with Implementation Plan:**
+
+* **Visual Fidelity:** Does the rendered video accurately reflect the visual elements and spatial arrangements described in the provided Manim Implementation Plan? Identify any deviations.
+* **Animation Fidelity:** Do the animations in the video match the animation methods and sequences outlined in the Implementation Plan? Report any discrepancies.
+
+Manim Implementation Plan:
+{implementation}
+
+Generated Code:
+{generated_code}
+
+Output Format 1:
+If any issues are identified in visual presentation, audio quality, narration, or plan alignment, please provide a detailed reflection on the issues and how to improve the video's visual and auditory quality, narration effectiveness, and code correctness. Then, you must return the updated Python code that directly addresses these issues. The code must be complete and executable.
+
+
+[Detailed reflection on visual, auditory, narration, and plan alignment issues and improvement suggestions. Include specific timings for narration/visual sync issues and descriptions of object overlap/out-of-bounds problems if detected by VLM. Be specific about code changes needed for improvement.]
+
+
+[Improved Python Code - Complete and Executable - Directly Addressing Reflection Points]
+
+
+Output Format 2:
+If no issues are found and the video and audio are deemed high quality, visually clear, narratively effective, and fully aligned with the implementation plan, please return only "" as output.
\ No newline at end of file