---
title: "Scriptura"
short_description: "MultiAgent System for Screenplay Creation and Editing"
emoji: 🎞️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: true
license: mit
tag: agent-demo-track
---

# Scriptura: A MultiAgent System for Screenplay Creation and Editing

The explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo).

The sample screenplay used in the video is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/).

## Introduction

**Scriptura** is a multi-agent AI framework built on Hugging Face smolagents that streamlines the creation of screenplays, storyboards, and soundtracks. By automating analysis, summarization, and multimodal enrichment, it frees authors to focus on the creative work itself.

At its heart:

* Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
* Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, handling text and image inputs to refine narrative elements and prepare them for downstream generation.

For media generation, Scriptura integrates:

* MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples.
* FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow.

Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed.

---

## Agent Capabilities

Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline:

- **Text Analysis & Summarization**  
  - Automatically extracts key themes, character arcs, and plot points  
  - Segments and summarizes scenes for rapid iteration  

- **Multimodal Ingestion**  
  - Supports PDF, DOCX, ODT, TXT and image uploads  
  - Transcribes audio files using OpenAI Whisper  

- **Image Generation**  
  - On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev)  

- **Audio Generation**  
  - Produces original soundtracks and SFX with MusicGen (AudioCraft spec)  
  - Allows sample-conditioned audio generation  

- **Captioning & Metadata**  
  - Auto-generates captions and descriptions for images using Gemma-3-27B-IT  

- **Optional Web Research**  
  - Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references  
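Scene segmentation and summarization are performed by the LLM agents. As a rough, rule-based illustration of the segmentation step only, the sketch below splits a plain-text screenplay on standard scene headings (lines beginning with `INT.`, `EXT.`, `INT./EXT.`, or `I/E.`). The `split_scenes` helper, the `Scene` record, and the heading pattern are assumptions for illustration, not code taken from `app.py`.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed convention: a scene heading is a line starting with INT., EXT., INT./EXT. or I/E.
SLUGLINE = re.compile(r"^[ \t]*(?:INT\./EXT\.|INT\.|EXT\.|I/E\.)[ \t]+.+$", re.MULTILINE)


@dataclass
class Scene:
    heading: str  # the slugline, e.g. "INT. KITCHEN - NIGHT"
    body: str     # action and dialogue up to the next slugline


def split_scenes(script: str) -> List[Scene]:
    """Split a plain-text screenplay into scenes (hypothetical helper).

    Text before the first scene heading (e.g. "FADE IN:") is ignored.
    """
    matches = list(SLUGLINE.finditer(script))
    scenes = []
    for i, match in enumerate(matches):
        # Each scene runs until the next heading, or to the end of the script.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        scenes.append(Scene(heading=match.group(0).strip(),
                            body=script[match.end():end].strip()))
    return scenes


if __name__ == "__main__":
    sample = "FADE IN:\n\nINT. KITCHEN - NIGHT\n\nAnna pours coffee.\n\nEXT. STREET - DAY\n\nCars pass by.\n"
    for scene in split_scenes(sample):
        print(f"{scene.heading}: {scene.body}")
```

A pre-split like this could keep each summarization request to a single scene; scripts whose headings deviate from the standard format would need a more tolerant pattern.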

---

## Agent Flow

Here’s an example flow demonstrating how you could use the agent.

![Scriptura agent flow diagram](https://cdn-uploads.huggingface.co/production/uploads/683eca9c72e8702dc425b51f/FFhfD2gCL-BjRC1eT-ELB.png)

---

## Code Overview

```text
.
├── app.py               # Entry point: defines Gradio interface and routing logic
├── system_prompt.txt    # System-level prompt template for the CodeAgent
├── requirements.txt     # Python dependencies (Gradio, smolagents, OpenAI, etc.)
└── README.md            # Project documentation
```

* **app.py**

  * **Agent** class: loads the Qwen3-32B model and registers all tools
  * **respond()**: routes messages between the Gradio inputs and the CodeAgent
  * Decorated `@tool` functions for image download, media generation, transcription, and captioning
  * Gradio `ChatInterface` setup with text/file input and an “Enable web search” toggle

* **system\_prompt.txt**

  * Injects the agent’s “way of thinking,” including reasoning structure and error handling

* **requirements.txt**

  * Lists all required libraries (Gradio, smolagents, OpenAI, huggingface_hub, pdfplumber, etc.)
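The actual routing inside `respond()` is not reproduced here. The sketch below only illustrates the general idea under stated assumptions: the chat message arrives as a dict with `text` and `files` keys (assuming `ChatInterface` runs with `multimodal=True` and `files` holds plain file paths), uploaded paths are appended to the task text, and the web-search tool is exposed only when the “Enable web search” toggle is on. The tool names and the `build_agent_request` helper are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical tool names; the real app.py registers its own @tool functions.
BASE_TOOLS = ["download_image", "generate_image", "generate_audio",
              "transcribe_audio", "caption_image"]
WEB_SEARCH_TOOL = "web_search"


def build_agent_request(message: Dict[str, Any], web_search: bool) -> Tuple[str, List[str]]:
    """Turn a multimodal chat message into (task text, enabled tool names)."""
    text = (message.get("text") or "").strip()
    files = list(message.get("files") or [])

    # List uploaded file paths explicitly so the agent knows what it may open.
    if files:
        listing = "Attached files:\n" + "\n".join(f"- {path}" for path in files)
        text = f"{text}\n\n{listing}" if text else listing

    # Copy the base list so the module-level default is never mutated.
    tools = list(BASE_TOOLS)
    if web_search:
        tools.append(WEB_SEARCH_TOOL)
    return text, tools
```

Keeping the toggle logic in one small function like this makes it easy to check that the search tool really is absent when the user has not opted in.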

---

## Deployment & Access

### Hugging Face Spaces

1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space.  
2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings.  
3. Make sure the Space is set to use **Python 3.10 or higher**.  
4. Select **Gradio** as the SDK (version 5.32.1).  
5. Pin or share the Space link to collaborate with your team.

> **Note:** If you choose to clone this repository and run it locally, make sure to set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching.
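Secrets configured in a Space's settings are exposed to the running app as environment variables. A minimal sketch, assuming you want the app to stop with a clear message when either secret is missing instead of failing later inside an API call; `check_secrets` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Mapping, Optional

# The two secrets named in the deployment steps above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")


def check_secrets(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise a clear error at startup if a required secret is missing or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
```

Calling `check_secrets()` near the top of `app.py`, before the agent is constructed, would surface a missing secret in the Space's logs immediately. Passing a plain dict instead of relying on `os.environ` makes the check easy to test.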

---

## Use Cases

**Independent Writer**  
* Upload a screenplay and quickly get a summary, a list of characters, and locations.  
* Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).  
* Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).
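In Scriptura the character and location lists are produced by the agents. For comparison, a purely rule-based first pass is easy to sketch, assuming standard screenplay formatting: locations come from scene headings, and character names are the all-caps cue lines directly followed by dialogue. The `extract_characters_and_locations` helper and its heuristics are illustrative assumptions; they will miss names in non-standard layouts and can miscount all-caps action lines.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed conventions: headings such as "INT. KITCHEN - NIGHT" and ALL-CAPS cue lines before dialogue.
HEADING = re.compile(r"^(?:INT\./EXT\.|INT\.|EXT\.|I/E\.)\s+(.+)$")
EXTENSION = re.compile(r"\s*\([^)]*\)\s*$")  # strips a trailing "(V.O.)", "(O.S.)", "(CONT'D)", ...


def extract_characters_and_locations(script: str) -> Tuple[Counter, List[str]]:
    """Heuristic first pass: count character cues and collect unique locations (hypothetical helper)."""
    lines = [line.strip() for line in script.splitlines()]
    characters: Counter = Counter()
    locations: List[str] = []
    for i, line in enumerate(lines):
        heading = HEADING.match(line)
        if heading:
            # The location is the part of the heading before " - TIME OF DAY".
            place = heading.group(1).split(" - ")[0].strip()
            if place not in locations:
                locations.append(place)
            continue
        name = EXTENSION.sub("", line)
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # A cue is an all-caps line that is not a transition ("CUT TO:") and is directly followed by dialogue.
        if name and name.isupper() and not name.endswith(":") and next_line:
            characters[name] += 1
    return characters, locations
```

`characters.most_common()` then ranks characters by how many dialogue cues they have, a rough proxy for screen presence that an agent-generated summary could be checked against.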

**Film Production Company**  
* Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.  
* Use the web search feature to find reference scripts or specific sound effects from free/paid sources.  
* Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**  
* Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).  
* Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).  
* Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
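The README does not specify the schema of the JSON/CSV output. Assuming entities are flat records with hypothetical `name`, `kind`, and `scene` fields, the standard library is enough to produce both formats:

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Entity:
    name: str   # e.g. "ANNA" or "KITCHEN"
    kind: str   # hypothetical categories such as "character" or "location"
    scene: int  # hypothetical 1-based index of the scene where the entity first appears


FIELDS = ["name", "kind", "scene"]


def to_json(entities: List[Entity]) -> str:
    # ensure_ascii=False keeps accented and non-Latin names readable.
    return json.dumps([asdict(e) for e in entities], ensure_ascii=False, indent=2)


def to_csv(entities: List[Entity]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for entity in entities:
        writer.writerow(asdict(entity))
    return buffer.getvalue()
```

When writing the CSV to disk, open the file with `newline=""` (as the `csv` module documentation recommends) and an explicit `encoding="utf-8"` so names in the source language survive the round trip.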

**Digital Humanities Course**  
* Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.  
* Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.  
* Explore Transformer models (such as Qwen3 and Gemma 3), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.

---

## Contributors

* Code development and implementation by **luke9705**;
* Ideation, testing, and video production by **OrianIce**;
* Research and testing by **Loren1214**;
* Code revisions by **DDPM**.

---

## Sources
The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities:

- **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management  
- **Gradio** – interactive web UI framework  
- **smolagents** – lightweight multi-agent orchestrator from Hugging Face  
- **huggingface_hub** – model & dataset management  
- **duckduckgo-search** – optional web research integration  
- **openai** – Whisper transcription, GPT-based reasoning  
- **anthropic** – Claude-style LLM support  
- **pdfplumber** – PDF text extraction  
- **docx2txt** – DOCX parsing  
- **odfpy** – ODT parsing  
- **pandas** – data handling  
- **Pillow (PIL)** – image processing  
- **requests** – HTTP client for external APIs  
- **numpy** – numerical operations  
- **MusicGen (AudioCraft)** – soundtrack and SFX generation  
- **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation  
- **Gemma-3-27B-IT** – multimodal captioning and metadata
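As an illustration of how the text-parsing libraries above could fit together, here is a minimal sketch that picks an extractor by file extension. It relies on the documented entry points of each library (`pdfplumber.open` with `page.extract_text()`, `docx2txt.process`, and odfpy's `load` with `teletype.extractText`); the `extract_text` dispatcher itself is hypothetical and may differ from the tools in `app.py`. Imports are deferred, so only the library for the requested format has to be installed.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def _read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def _read_pdf(path: Path) -> str:
    import pdfplumber  # pip install pdfplumber

    with pdfplumber.open(str(path)) as pdf:
        # Pages without a text layer (e.g. scans) can yield empty or None text.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _read_docx(path: Path) -> str:
    import docx2txt  # pip install docx2txt

    return docx2txt.process(str(path))


def _read_odt(path: Path) -> str:
    from odf import teletype, text  # pip install odfpy
    from odf.opendocument import load

    document = load(str(path))
    # Only paragraph elements (text:p) are read; headings (text:h) are skipped in this sketch.
    return "\n".join(teletype.extractText(p) for p in document.getElementsByType(text.P))


READERS: Dict[str, Callable[[Path], str]] = {
    ".txt": _read_txt,
    ".pdf": _read_pdf,
    ".docx": _read_docx,
    ".odt": _read_odt,
}


def extract_text(path: Union[str, Path]) -> str:
    """Dispatch to a reader based on the file extension (hypothetical helper)."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(no extension)'}")
    return reader(path)
```

Image uploads are deliberately left out of this dispatcher; in Scriptura they go to the captioning model rather than a text parser.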