Update README.md
Browse files
README.md
CHANGED
@@ -48,6 +48,27 @@ SlideDeck AI uses a chain of specialized tools and AI models to achieve its resu
|
|
48 |
3. **Image Asset Tool (`Nebius`):** The agent extracts the image prompts from the JSON plan and sends them to the Flux-Schnell model via Nebius to generate visual assets.
|
49 |
4. **Audio Asset Tool (`Modal`):** The speaker notes from the JSON plan are sent to a custom Text-to-Speech model (https://huggingface.co/hexgrad/Kokoro-82M) deployed on Modal, which returns audio files for each slide.
|
50 |
5. **HTML Builder Tool (`Sambanova`):** Finally, the agent combines the JSON plan (now updated with image and audio URLs) and feeds it to another powerful LLM on Nebius. This model writs the complete, final HTML and CSS for the presentation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
51 |
---
|
52 |
|
53 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
48 |
3. **Image Asset Tool (`Nebius`):** The agent extracts the image prompts from the JSON plan and sends them to the Flux-Schnell model via Nebius to generate visual assets.
|
49 |
4. **Audio Asset Tool (`Modal`):** The speaker notes from the JSON plan are sent to a custom Text-to-Speech model (https://huggingface.co/hexgrad/Kokoro-82M) deployed on Modal, which returns audio files for each slide.
|
50 |
5. **HTML Builder Tool (`Sambanova`):** Finally, the agent combines the JSON plan (now updated with image and audio URLs) and feeds it to another powerful LLM on Nebius. This model writs the complete, final HTML and CSS for the presentation.
|
51 |
+
---
|
52 |
+
---
|
53 |
+
|
54 |
+
## ⚠️ Known Issues & Future Improvements
|
55 |
+
|
56 |
+
This project was built within the tight timeframe of a hackathon. Here are a few known limitations and how I plan to address them in the future:
|
57 |
+
|
58 |
+
### 1. PDF Export Quality
|
59 |
+
|
60 |
+
* **The Issue:** The downloaded PDF file may not perfectly match the beautiful layout seen in the "Final Presentation" tab. Complex CSS elements like advanced grids or custom fonts can sometimes be rendered incorrectly.
|
61 |
+
* **The Cause:** The PDF is generated using the `weasyprint` library. While powerful, it's not a full web browser engine and can struggle with the very modern and complex CSS that the AI agent generates.
|
62 |
+
* **The Workaround:** For a pixel-perfect view, please use the "Final Presentation" tab directly in the UI. For sharing, you can right-click -> "Save As" on that page to get a self-contained HTML file that will look perfect in any modern browser.
|
63 |
+
* **Roadmap:** The gold-standard solution is to integrate a headless browser like **Playwright**. This would allow the app to take a perfect "screenshot" of the rendered HTML page and save it as a high-fidelity PDF.
|
64 |
+
|
65 |
+
### 2. Presentation Generation Speed
|
66 |
+
|
67 |
+
* **The Issue:** The final step, where the agent builds the HTML code, can take some time (2-3 minutes).
|
68 |
+
* **The Cause:** This step deliberately uses a large, powerful reasoning model (**DeepSeek-R1-0528** via Nebius) to act as an expert front-end developer. This model's strength is its high-quality, complex code generation, which comes at the cost of higher latency. This was a conscious trade-off to prioritize the *quality* of the final presentation over raw speed.
|
69 |
+
* **The Workaround:** Be patient and watch the logs! The UI provides real-time feedback so you know the agent is hard at work "thinking" and "coding" your presentation.
|
70 |
+
* **Roadmap:** The ideal enhancement would be to **stream the model's output**. Instead of waiting for the full HTML file, the code would appear in the "Raw HTML Code" tab token-by-token, creating an amazing "live coding" effect. This would dramatically improve the perceived performance and user experience.
|
71 |
+
|
72 |
---
|
73 |
|
74 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|