Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
---
|
2 |
-
title:
|
3 |
emoji: 🏆
|
4 |
colorFrom: yellow
|
5 |
colorTo: blue
|
@@ -8,130 +8,138 @@ sdk_version: 5.32.1
|
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
-
tag:
|
12 |
---
|
13 |
|
14 |
-
#
|
15 |
|
16 |
-
##
|
17 |
|
18 |
-
|
19 |
-
The aim of our agent is to support authors in their creative process for scenarios and storyboards.
|
20 |
|
21 |
-
|
|
|
|
|
|
|
|
|
22 |
|
23 |
-
|
|
|
|
|
|
|
24 |
|
25 |
-
|
26 |
|
27 |
-
|
28 |
-
|
29 |
-
**B**
|
30 |
-
|
31 |
-
The agent receives as input a text file containing the script,
|
32 |
-
either in plain text format or in structured formats (e.g. PDF, DOCX),
|
33 |
-
which it then converts into plain text for processing.
|
34 |
-
|
35 |
-
**C**
|
36 |
-
|
37 |
-
The agent extracts a summary of the overall content of the scenario,
|
38 |
-
identifying the main narrative lines and the time frame.
|
39 |
-
|
40 |
-
This helps create a big-picture version of the draft for the next steps.
|
41 |
-
|
42 |
-
**D**
|
43 |
-
|
44 |
-
The agent will identify the main entities (characters, locations, events) and key themes in the script.
|
45 |
-
|
46 |
-
It will also generate a small abstract (~5 sentences)
|
47 |
-
with enough details to understand the overall plot and tone.
|
48 |
-
|
49 |
-
**E**
|
50 |
-
|
51 |
-
The agent checks whether the input text matches a known or published script.
|
52 |
-
|
53 |
-
If it does,
|
54 |
-
it will check the license and availability of rights to understand if it is possible to operate on it.
|
55 |
-
|
56 |
-
In case of any limitations, the agent will warn the user about restrictions.
|
57 |
-
|
58 |
-
**F**
|
59 |
-
|
60 |
-
The agent will perform an analysis of the main points of the script:
|
61 |
-
|
62 |
-
- Characters: extract and catalog the names of the characters,
|
63 |
-
classifying them by role (protagonist, antagonist, secondary characters),
|
64 |
-
gender and age/physical description.
|
65 |
-
|
66 |
-
- Locations: Detect the places where the scenes take place
|
67 |
-
(interiors, exteriors, historical periods, geographical location) and catalogue them.
|
68 |
-
|
69 |
-
- Plot points: Isolate key plot points
|
70 |
-
|
71 |
-
- Vibes (Look and Feel): Understand the style (dramatic, comic, thriller, horror)
|
72 |
-
and the overall sensation (suspense, irony, melancholy).
|
73 |
-
|
74 |
-
|
75 |
-
**G**
|
76 |
-
|
77 |
-
Define the agent goal.
|
78 |
-
|
79 |
-
Having achieved a comprehensive summary, the agent will ask for the final goal:
|
80 |
-
|
81 |
-
- Remake / Rewrite
|
82 |
-
- Change of medium (movie, tv series, ...)
|
83 |
-
- Other purposes (Workshop, Interactive presentation, Didactic analysis, ...)
|
84 |
|
|
|
85 |
|
86 |
-
**
|
|
|
|
|
|
|
87 |
|
88 |
-
|
|
|
|
|
|
|
89 |
|
90 |
-
|
91 |
-
|
92 |
-
|
|
|
93 |
|
94 |
-
**
|
|
|
|
|
|
|
95 |
|
96 |
-
|
|
|
|
|
|
|
97 |
|
98 |
-
|
99 |
-
|
|
|
|
|
100 |
|
101 |
-
-
|
102 |
-
-
|
103 |
-
-
|
|
|
|
|
|
|
104 |
|
|
|
|
|
|
|
|
|
105 |
|
106 |
-
**TODO: add sound and bias analysis?**
|
107 |
|
|
|
108 |
|
109 |
-
|
110 |
|
111 |
-
|
|
|
112 |
|
|
|
|
|
|
|
|
|
113 |
|
|
|
|
|
|
|
|
|
114 |
|
115 |
-
|
|
|
|
|
|
|
116 |
|
117 |
-
|
118 |
-
-
|
119 |
-
-
|
120 |
-
-
|
121 |
-
and classification using a Story Understanding model
|
122 |
-
- Tone analysis and Sentiment analysis for understanding vibes
|
123 |
-
- Image generation models (Stable Diffusion, DALL·E 3), with prompts generated by the model
|
124 |
|
|
|
|
|
125 |
|
126 |
-
### Code overview
|
127 |
|
|
|
|
|
128 |
|
129 |
-
### Use cases
|
130 |
|
|
|
131 |
### Contributors:
|
132 |
- Code Implementation made by luke9705 and DDPM;
|
133 |
- Ideas creation and testing conducted by OrianIce and Loren1214.
|
134 |
|
|
|
135 |
### Sources
|
136 |
|
137 |
- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson.
|
|
|
1 |
---
|
2 |
+
title: Scriptura
|
3 |
emoji: 🏆
|
4 |
colorFrom: yellow
|
5 |
colorTo: blue
|
|
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
license: mit
|
11 |
+
tags: [agent-demo-track]
|
12 |
---
|
13 |
|
14 |
+
# Scriptura
|
15 |
|
16 |
+
## Introduction
|
17 |
|
18 |
+
**Scriptura** is a multi-agent AI system designed to assist authors in creating screenplays, storyboards, and soundtracks. Its main goal is to automate and accelerate the stages of analysis, summarization, and enrichment of narrative text, allowing screenwriters to focus on the creative aspects.
|
|
|
19 |
|
20 |
+
The core stack includes:
|
21 |
+
- **DeepSeek (deepseek-ai/DeepSeek-R1)** as the base model for all text operations (analysis, summarization, generation) via APIs managed by Nebius AI.
|
22 |
+
- **FLUX (black-forest-labs/FLUX.1-dev)** for image generation (storyboards, concept art) integrated into the narrative flow.
|
23 |
+
- **MusicGen (facebook/musicgen-melody)** to create short audio tracks or sound effects, useful for prototyping or presenting.
|
24 |
+
- Optional web search (integrated with DuckDuckGo API) to fetch external resources (original scripts, sound effects, reference materials).
|
25 |
|
26 |
+
**Scriptura** supports inputs in various formats:
|
27 |
+
- **Text**: TXT, PDF, DOCX (automatically converted to structured plain text)
|
28 |
+
- **Images**: JPEG, PNG (for analyzing existing storyboards or screenshots)
|
29 |
+
- **Audio**: MP3, WAV (for transcribing dialogue or analyzing uploaded soundtracks)
|
30 |
|
31 |
+
There are size and duration checks on uploaded files to prevent excessively large inputs.
|
32 |
|
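A minimal sketch of how the format and size checks described above could look. The size limits and the names `MAX_BYTES`, `KIND_BY_EXTENSION`, and `classify_upload` are assumptions for illustration; the README does not state the actual thresholds, and duration checks (which require decoding the audio) are not covered here.

```python
from pathlib import Path

# Hypothetical per-kind size caps in bytes; the real limits are not documented.
MAX_BYTES = {
    "text": 5 * 1024 * 1024,
    "image": 10 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
}

# Extensions listed in the README, grouped by input kind.
KIND_BY_EXTENSION = {
    ".txt": "text", ".pdf": "text", ".docx": "text",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp3": "audio", ".wav": "audio",
}


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return the input kind for a file, or raise ValueError if it is rejected."""
    ext = Path(filename).suffix.lower()
    kind = KIND_BY_EXTENSION.get(ext)
    if kind is None:
        raise ValueError(f"Unsupported file type: {ext or filename}")
    if size_bytes > MAX_BYTES[kind]:
        raise ValueError(
            f"{filename} is {size_bytes} bytes; limit for {kind} is {MAX_BYTES[kind]}"
        )
    return kind
```

Rejecting a file before any model call keeps oversized inputs from consuming API quota.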
33 |
+
---
|
34 |
|
35 |
+
## Agent Capabilities
|
36 |
|
37 |
+
**Input File Parsing**
|
38 |
+
: - **Formats accepted**: `TXT`, `PDF`, `DOCX`, `JPEG/PNG`, `MP3/WAV`
|
39 |
+
- **Process**: PDF/DOCX → plain text; OCR on images; speech-to-text on audio.
|
40 |
+
- **Why it matters**: Provides structured input for all downstream modules.
|
41 |
|
42 |
+
**Overall Plot Summary**
|
43 |
+
: - **Model**: `DeepSeek-R1`
|
44 |
+
- **Output**: 4–6 sentence summary of main narrative threads (timeframe, tone).
|
45 |
+
- **Mechanics**: API calls to DeepSeek with retry logic for improved coherence.
|
46 |
|
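The retry logic mentioned above might be sketched as a small wrapper with exponential backoff. This is an illustration under assumptions: `call_with_retry` and its parameters are hypothetical names, and the actual DeepSeek call is represented by an arbitrary zero-argument callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, waiting base_delay * 2**n between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Exception = RuntimeError("unreachable")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Passing `sleep` as a parameter lets the wrapper be exercised in tests without real delays.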
47 |
+
**Entity & Theme Extraction**
|
48 |
+
: - **Technique**: Named Entity Recognition (via **DeepSeek**)
|
49 |
+
- **Extracts**: Characters, locations, key events, recurring themes, narrative tone.
|
50 |
+
- **Output**: JSON/CSV + ~5-sentence abstract.
|
51 |
|
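As an illustration of the JSON/CSV output, the sketch below serializes a list of extracted entities with the standard library. The record shape (`type` and `name` keys) and the function names are assumptions; the README does not specify the actual schema.

```python
import csv
import io
import json
from typing import Dict, List

Entity = Dict[str, str]


def entities_to_json(entities: List[Entity]) -> str:
    """Serialize entities as pretty-printed JSON, keeping non-ASCII names intact."""
    return json.dumps(entities, ensure_ascii=False, indent=2)


def entities_to_csv(entities: List[Entity]) -> str:
    """Serialize entities as CSV with a fixed header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["type", "name"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(entities)
    return buffer.getvalue()


# Hypothetical sample data, not taken from a real script.
sample = [
    {"type": "character", "name": "Anna"},
    {"type": "location", "name": "Rome"},
]
```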
52 |
+
**Rights & Licensing Verification**
|
53 |
+
: - **Web Search ON**: Queries the DuckDuckGo API and fetches license info if a match is found.
|
54 |
+
- **Web Search OFF**: May recognize very famous works internally (e.g. “Harry Potter”), but this is not guaranteed.
|
55 |
+
- **If no match & search OFF**: No licensing check.
|
56 |
|
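The three cases above can be read as a small decision function. The sketch below is one possible reading, not the project's implementation: `licensing_check`, the status strings, and the `known_titles` set (a stand-in for whatever the model happens to recognize internally) are all hypothetical, and `search` is any callable that returns a list of result strings.

```python
from typing import Callable, Dict, FrozenSet, List, Optional


def licensing_check(
    title: str,
    web_search_enabled: bool,
    search: Optional[Callable[[str], List[str]]] = None,
    known_titles: FrozenSet[str] = frozenset(),
) -> Dict[str, object]:
    """Decide how far the licensing check can go for a given title."""
    if web_search_enabled:
        if search is None:
            raise ValueError("web search is enabled but no search function was given")
        results = search(f"{title} script license")
        status = "match_found" if results else "no_match"
        return {"status": status, "sources": results}
    # Search is off: only a best-effort internal recognition is possible.
    if title.lower() in known_titles:
        return {"status": "recognized_offline", "sources": []}
    return {"status": "not_checked", "sources": []}
```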
57 |
+
**Image Generation (Storyboard & Concept Art)**
|
58 |
+
: - **Model**: `FLUX (black-forest-labs/FLUX.1-dev)`
|
59 |
+
- **Trigger**: “Generate Image” / storyboard phase.
|
60 |
+
- **Process**: DeepSeek crafts cinematic prompt → FLUX returns PNG/JPEG + caption.
|
61 |
|
62 |
+
**Audio Generation (Music & Sound Effects)**
|
63 |
+
: - **Model**: `MusicGen (facebook/musicgen-melody)`
|
64 |
+
- **Trigger**: “Generate Audio.”
|
65 |
+
- **Process**: Send prompt → receive MP3/WAV (standalone audio, no text/images).
|
66 |
|
67 |
+
**In-Depth Analysis of Key Points**
|
68 |
+
: - **Extracts**:
|
69 |
+
- Characters (role, gender, description)
|
70 |
+
- Locations (interior/exterior, period, geography)
|
71 |
+
- Plot Points (crucial narrative beats via Story Understanding models)
|
72 |
+
- **Extras**: Semantic toponym extraction → internal scene maps; detect transitions (“Suddenly,” “Meanwhile”).
|
73 |
|
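Transition detection could, in its simplest form, be a keyword scan. The sketch below uses a regular expression over the two cue words named above; the `TRANSITIONS` tuple and `find_transitions` are hypothetical names, and a real system would likely need a longer cue list or a model-based approach.

```python
import re
from typing import List, Tuple

# Cue words taken from the README examples; extend as needed.
TRANSITIONS = ("suddenly", "meanwhile")

_PATTERN = re.compile(r"\b(" + "|".join(TRANSITIONS) + r")\b", re.IGNORECASE)


def find_transitions(text: str) -> List[Tuple[int, str]]:
    """Return (character offset, lowercased cue word) for each transition found."""
    return [(m.start(), m.group(1).lower()) for m in _PATTERN.finditer(text)]
```

The offsets make it possible to split the text into candidate scene segments at each cue.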
74 |
+
**Optional Web Search**
|
75 |
+
: - **Checkbox** toggles DuckDuckGo API lookups.
|
76 |
+
- **If Enabled**: search preconfigured sites (free & paid) for scripts, sound effects.
|
77 |
+
- **Output**: List of links + short summaries.
|
78 |
|
|
|
79 |
|
80 |
+
---
|
81 |
|
82 |
+
## Agent Flow
|
83 |
+
|
84 |
+
```mermaid
|
85 |
+
flowchart LR
|
86 |
+
A[Start Agent] --> B["Load Input (text, image, audio)"]
|
87 |
+
B --> C[Preprocessing: PDF/DOCX → text, OCR, audio transcription]
|
88 |
+
C --> D["Generate Plot Summary (DeepSeek)"]
|
89 |
+
D --> E["Extract Entities & Themes (DeepSeek)"]
|
90 |
+
E --> F{Web Search Enabled?}
|
91 |
+
F -->|Yes| G[Web Search via DuckDuckGo API]
|
92 |
+
F -->|No| H[Continue Offline Analysis]
|
93 |
+
G & H --> I[Rights & Licensing Check]
|
94 |
+
I --> J[Deep Analysis: characters, locations, plot points]
|
95 |
+
J --> K{Image Generation Requested?}
|
96 |
+
K -->|Yes| L[API Call to FLUX for storyboard/concept art]
|
97 |
+
K -->|No| M[Skip Image Generation]
|
98 |
+
M --> N{Audio Generation Requested?}
|
99 |
+
N -->|Yes| O[API Call to MusicGen for audio tracks]
|
100 |
+
N -->|No| P[Skip Audio Generation]
|
101 |
+
L & O & P --> Q[Final Output: text, JSON/CSV, images, audio]
|
102 |
+
```
|
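The flow in the diagram can also be expressed as a plain function that lists the steps for a given set of options. This is a sketch under assumptions: the step names and `plan_steps` are hypothetical, and unlike the diagram, the image and audio choices are treated here as independent of each other.

```python
from typing import List


def plan_steps(web_search: bool, want_image: bool, want_audio: bool) -> List[str]:
    """Return the ordered pipeline steps for the selected options."""
    steps = ["load_input", "preprocess", "plot_summary", "extract_entities"]
    # Decision node: web search enabled?
    steps.append("web_search" if web_search else "offline_analysis")
    steps += ["rights_check", "deep_analysis"]
    # Optional media generation, each requested separately.
    if want_image:
        steps.append("generate_image")
    if want_audio:
        steps.append("generate_audio")
    steps.append("final_output")
    return steps
```

Calling `plan_steps(False, False, False)` yields the text-only path, which ends with the summary and the extracted entities.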
103 |
+
---
|
104 |
+
## Deployment, Access, and Code Overview
|
105 |
|
106 |
+
---
|
107 |
+
## Use Cases
|
108 |
|
109 |
+
**Independent Writer**
|
110 |
+
: - Upload a screenplay and quickly get a summary, a list of characters, and locations.
|
111 |
+
- Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
|
112 |
+
- Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).
|
113 |
|
114 |
+
**Film Production Company**
|
115 |
+
: - Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
|
116 |
+
- Use the web search feature to find reference scripts or specific sound effects from free/paid sources.
|
117 |
+
- Develop visual storyboards and audio prototypes to share with directors, artists, and investors.
|
118 |
|
119 |
+
**Translation and Adaptation Agency**
|
120 |
+
: - Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
|
121 |
+
- Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
|
122 |
+
- Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
|
123 |
|
124 |
+
**Digital Humanities Course**
|
125 |
+
: - Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.
|
126 |
+
- Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
|
127 |
+
- Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.
|
|
|
|
|
|
|
128 |
|
129 |
+
---
|
130 |
+
## Credits
|
131 |
|
|
|
132 |
|
133 |
+
---
|
134 |
+
## Acknowledgements
|
135 |
|
|
|
136 |
|
137 |
+
---
|
138 |
### Contributors:
|
139 |
- Code Implementation made by luke9705 and DDPM;
|
140 |
- Ideas creation and testing conducted by OrianIce and Loren1214.
|
141 |
|
142 |
+
---
|
143 |
### Sources
|
144 |
|
145 |
- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson.
|