|
--- |
|
title: "Scriptura" |
|
short_description: "MultiAgent System for Screenplay Creation and Editing" |
|
emoji: 🎞️ |
|
colorFrom: blue |
|
colorTo: green |
|
sdk: gradio |
|
sdk_version: 5.32.1 |
|
app_file: app.py |
|
pinned: true |
|
license: mit |
|
tag: agent-demo-track |
|
--- |
|
|
|
# Scriptura: A MultiAgent System for Screenplay Creation and Editing |
|
|
|
The explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo) |
|
|
|
The screenplay used in the video as sample is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/) |
|
|
|
## Introduction |
|
|
|
**Scriptura** is a multi-agent AI framework based on HF-SmolAgents that streamlines the creation of screenplays, storyboards, and soundtracks by automating the stages of analysis, summarization, and multimodal enrichment—freeing authors to focus on pure creativity. |
|
|
|
At its heart: |
|
|
|
* Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system. |
|
* Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation. |
|
|
|
For media generation, Scriptura integrates: |
|
|
|
* MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples. |
|
* FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow. |
|
|
|
Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed. |
|
|
|
--- |
|
|
|
## Agent Capabilities |
|
|
|
Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline: |
|
|
|
- **Text Analysis & Summarization** |
|
- Automatically extracts key themes, character arcs, and plot points |
|
- Segments and summarizes scenes for rapid iteration |
|
|
|
- **Multimodal Ingestion** |
|
- Supports PDF, DOCX, ODT, TXT and image uploads |
|
- Transcribes audio files using OpenAI Whisper |
|
|
|
- **Image Generation** |
|
- On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev) |
|
|
|
- **Audio Generation** |
|
- Produces original soundtracks and SFX with MusicGen (AudioCraft spec) |
|
- Allows sample-conditioned audio generation |
|
|
|
- **Captioning & Metadata** |
|
- Auto-generates captions and descriptions for images using Gemma-3-27B-IT |
|
|
|
- **Optional Web Research** |
|
- Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references |
|
|
|
--- |
|
|
|
## Agent Flow |
|
|
|
Here’s an example flow demonstrating how you could use the agent. |
|
|
|
 |
|
|
|
--- |
|
|
|
## Code Overview |
|
|
|
```bash |
|
. |
|
├── app.py # Entry point: defines Gradio interface and routing logic |
|
├── system_prompt.txt # System-level prompt template for the CodeAgent |
|
├── requirements.txt # Python dependencies (Gradio, SmolAgents, OpenAI, etc.) |
|
└── README.md # Project documentation |
|
``` |
|
|
|
* **app.py** |
|
|
|
* **Agent** class: loads Qwen3-32B model, registers all tools |
|
* **respond()**: orchestrates between Gradio inputs and CodeAgent |
|
* Decorated `@tool` functions for image download, media generation, transcription, captioning |
|
* Gradio `ChatInterface` setup with text/file support and “Enable web search” toggle |
|
|
|
* **system\_prompt.txt** |
|
|
|
* Injects the agent’s “way of thinking,” including reasoning structure and error handling |
|
|
|
* **requirements.txt** |
|
|
|
* Lists all required libraries (Gradio, SmolAgents, OpenAI, HuggingFace, PDFPlumber, etc.) |
|
|
|
--- |
|
|
|
## Deployment & Access |
|
|
|
### Hugging Face Spaces |
|
|
|
1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space. |
|
2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings. |
|
3. Make sure the Space is set to use **Python 3.10 or higher**. |
|
4. Select **Gradio** as the SDK (version 5.32.1). |
|
5. Pin or share the Space link to collaborate with your team. |
|
|
|
> **Note:** If you choose to clone this repository and run it locally, make sure to set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching. |
|
|
|
--- |
|
## Use Cases |
|
|
|
**Independent Writer** |
|
* Upload a screenplay and quickly get a summary, a list of characters, and locations. |
|
* Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs). |
|
* Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV). |
|
|
|
**Film Production Company** |
|
* Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues. |
|
* Use the web search feature to find reference scripts or specific sound effects from free/paid sources. |
|
* Develop visual storyboards and audio prototypes to share with directors, artists, and investors. |
|
|
|
**Translation and Adaptation Agency** |
|
* Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV). |
|
* Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX). |
|
* Produce reference audio via MusicGen to test culturally appropriate music for the target audience. |
|
|
|
**Digital Humanities Course** |
|
* Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines. |
|
* Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment. |
|
* Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum. |
|
|
|
--- |
|
|
|
## Contributors: |
|
|
|
* Code development and implementation made by **luke9705**; |
|
* Ideas creation, testing and videomaking conducted by **OrianIce**; |
|
* Research and testing by **Loren1214**; |
|
* Code revisions by **DDPM**. |
|
|
|
--- |
|
## Sources |
|
The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities: |
|
|
|
- **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management |
|
- **Gradio** – interactive web UI framework |
|
- **smolagents** – lightweight multi-agent orchestrator from Hugging Face |
|
- **huggingface_hub** – model & dataset management |
|
- **duckduckgo-search** – optional web research integration |
|
- **openai** – Whisper transcription, GPT-based reasoning |
|
- **anthropic** – Claude-style LLM support |
|
- **pdfplumber** – PDF text extraction |
|
- **docx2txt** – DOCX parsing |
|
- **odfpy** – ODT parsing |
|
- **pandas** – data handling |
|
- **Pillow (PIL)** – image processing |
|
- **requests** – HTTP client for external APIs |
|
- **numpy** – numerical operations |
|
- **MusicGen (AudioCraft)** – soundtrack and SFX generation |
|
- **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation |
|
- **Gemma-3-27B-IT** – multimodal captioning and metadata |