Testing org
community
AI & ML interests
None defined yet.

freddyaboulton posted an update about 20 hours ago

multimodalart posted an update 7 days ago
Post
3422
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it.
I've built a live real-time demo on Spaces:
multimodalart/self-forcing

freddyaboulton posted an update 16 days ago
Post
482
Time is running out!
Less than 24 hours to participate in the MCP Hackathon and win thousands of dollars in prizes! Don't miss this opportunity to showcase your skills.
Visit Agents-MCP-Hackathon/AI-Marketing-Content-Creator to register!

freddyaboulton posted an update 17 days ago
Post
351
NotebookLM Dethroned?!
Meet Fluxions vui: The new open-source dialogue generation model.
- 100M params, 40k hours of audio!
- Multi-speaker audio
- Non-speech sounds (like [laughs]!)
- MIT License
Is this the future of content creation? Watch the video and decide for yourself!
https://huggingface.co/spaces/fluxions/vui-space
https://huggingface.co/fluxions/vui

freddyaboulton posted an update 3 months ago
Post
2143
Ever wanted to share your AI creations with friends?
Screenshots are fine, but imagine letting others play with your ACTUAL model!
Introducing Gradio deep links: now you can share interactive AI apps, not just images.
Add a gr.DeepLinkButton to any app and get shareable URLs that let ANYONE experiment with your models.

freddyaboulton posted an update 4 months ago
Post
2005
Privacy matters when talking to AI!
We've just added a microphone mute button to FastRTC in our latest update (v0.0.14). Now you control exactly what your LLM hears.
Plus lots more features in this release! Check them out:
https://github.com/freddyaboulton/fastrtc/releases/tag/0.0.14

freddyaboulton posted an update 4 months ago
Post
3316
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer, then you know what I'm talking about.
That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy with minimal code and overhead.
Check out our org: hf.co/fastrtc

freddyaboulton posted an update 6 months ago
Post
1839
Just created a Gradio space for playing with the new OAI realtime voice API!
freddyaboulton/openai-realtime-voice

freddyaboulton posted an update 6 months ago
Post
1024
Gemini can talk
Check out the new multimodal API from Google on @akhaliq's anychat or my Space. It's very fast and smart.
https://huggingface.co/spaces/freddyaboulton/gemini-voice
https://huggingface.co/spaces/akhaliq/anychat

freddyaboulton posted an update 7 months ago
Post
2626
Version 0.0.21 of gradio-pdf now properly loads Chinese characters!

freddyaboulton posted an update 7 months ago
Post
1664
Hello Llama 3.2!
Build a Siri-like coding assistant that responds to "Hello Llama" in 100 lines of Python! All with Gradio and WebRTC.
freddyaboulton/hey-llama-code-editor

freddyaboulton posted an update 7 months ago
Post
1206
Just created a cookbook of real-time audio/video Spaces created using Gradio and WebRTC.
Use this and the [docs](https://freddyaboulton.github.io/gradio-webrtc/) to get started building the next gen of AI apps!
freddyaboulton/gradio-webrtc-cookbook-6758ba7745aeca7b1be7de0f

multimodalart posted an update 11 months ago
Post
35198
New feature
Image models and LoRAs now have little previews.
If you don't know where to start to find them, I invite you to browse cool LoRAs in the profiles of some amazing fine-tuners: @artificialguybr, @alvdansen, @DoctorDiffusion, @e-n-v-y, @KappaNeuro, @ostris

freddyaboulton posted an update about 1 year ago
Post
1555
@dwancin Can you please reset your toggle component's Space? It's stuck for some reason. Happy to help.
dwancin/gradio_toggle

multimodalart posted an update about 1 year ago
Post
28400
The first open model with a Stable Diffusion 3-like architecture is JUST out, but it is not SD3!
It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model, trained with multilingual CLIP + multilingual T5 text encoders for English and Chinese understanding.
Try it out yourself here: https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)
In the paper they claim to be SOTA open source based on human preference evaluation!

freddyaboulton posted an update about 1 year ago
Post
3718
We just released Gradio version 4.26.0! We *highly* recommend you upgrade your apps to this version to bring in these nice changes:
- Introducing the API recorder. Any Gradio app running 4.26.0 or above will have an "API Recorder" that records your interactions with the app and auto-generates the corresponding Python or JS code needed to recreate those actions programmatically. It's very neat!
- Enhanced markdown rendering in gr.Chatbot
- Fix for slow load times on Spaces as well as the UI locking up on rapid generations
See the full changelog of goodies here: https://www.gradio.app/changelog#4-26-0

freddyaboulton posted an update about 1 year ago
Post
2483
Gradio 4.25.0 is out with some nice improvements and bug fixes!
- Automatic deletion of gr.State variables stored in the server. Never run out of RAM again. Also adds an unload event you can run when a user closes their browser tab.
- Lazy example caching. You can set cache_examples="lazy" to cache examples when they're first requested as opposed to before the server launches. This can cut down the server's start-up time drastically.
- Fixes a bug with streaming audio outputs
- Improvements to gr.ChatInterface like pasting images directly from the clipboard.
See the rest of the changelog here: https://www.gradio.app/changelog#4-25-0
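To make the lazy example caching idea concrete, here is a rough plain-Python sketch of the difference between caching every example before startup and caching each one on first request. This is not Gradio's implementation: the `ExampleCache` class and its `lazy` flag are hypothetical names made up for illustration, and `str.upper` stands in for a slow prediction function.

```python
from typing import Callable, Dict, List


class ExampleCache:
    """Hypothetical cache: eager mode computes every example up front,
    lazy mode computes an example only the first time it is requested."""

    def __init__(self, fn: Callable[[str], str], examples: List[str], lazy: bool):
        self.fn = fn
        self.examples = examples
        self.results: Dict[int, str] = {}
        self.calls = 0  # how many times the (possibly slow) function ran
        if not lazy:
            # Eager: pay the full cost before the "server" starts.
            for index in range(len(examples)):
                self._compute(index)

    def _compute(self, index: int) -> None:
        self.calls += 1
        self.results[index] = self.fn(self.examples[index])

    def get(self, index: int) -> str:
        # Lazy: compute on first request, then serve from the cache.
        if index not in self.results:
            self._compute(index)
        return self.results[index]


examples = ["cat", "dog", "bird"]
eager = ExampleCache(str.upper, examples, lazy=False)
lazy = ExampleCache(str.upper, examples, lazy=True)

# Work done at startup: all three examples for eager, none for lazy.
calls_at_startup = (eager.calls, lazy.calls)

first = lazy.get(1)   # computed now
second = lazy.get(1)  # served from the cache, no extra call
print(calls_at_startup, first, second, lazy.calls)
```

The trade-off is the one the release note describes: startup gets cheaper, and the first user to click an example pays the cost instead.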

freddyaboulton posted an update over 1 year ago
Post
1770
Tips for saving disk space with Gradio
Try these out with Gradio 4.22.0! Code snippet attached.
1. Set delete_cache. The delete_cache parameter will periodically delete files from Gradio's cache that are older than a given age. Setting it will also delete all files created by that app when the app shuts down. It is a tuple of two ints, (frequency, age), expressed in seconds. So delete_cache=(3600, 3600) will delete files older than an hour every hour.
2. Use static files. Static files are not copied to the cache and are instead served directly to users of your app. This is useful for components displaying a lot of content that won't change, like a gallery with hundreds of images.
3. Set format="jpeg" for images and galleries. JPEGs take up less disk space than PNGs. This can also speed up your prediction function, since the files are written to the cache faster.
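The snippet attached to the original post is not reproduced here. As a stand-in for tip 1, below is a minimal plain-Python sketch of what age-based cache cleanup looks like, assuming "age" is measured from a file's last modification time. It is not Gradio's code: `delete_stale_files` is a hypothetical helper, and the `frequency` half of the tuple would simply be how often a scheduler calls it.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def delete_stale_files(cache_dir: Path, age: float, now: Optional[float] = None) -> List[str]:
    """Delete files under cache_dir last modified more than `age` seconds ago.

    Returns the names of the deleted files, sorted.
    """
    now = time.time() if now is None else now
    deleted = []
    # Materialize the listing first so nothing is deleted mid-iteration.
    for path in list(cache_dir.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > age:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)


# Demo: one file backdated by two hours, one fresh file, age limit of one hour.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "old.png").write_bytes(b"old")
    (cache / "new.png").write_bytes(b"new")
    two_hours_ago = time.time() - 7200
    os.utime(cache / "old.png", (two_hours_ago, two_hours_ago))

    deleted = delete_stale_files(cache, age=3600)
    remaining = sorted(p.name for p in cache.iterdir())

print(deleted, remaining)
```

In a real app you would not write this yourself: the tuple goes straight to the `delete_cache` argument of `gr.Blocks`, for example `gr.Blocks(delete_cache=(3600, 3600))`.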

multimodalart posted an update over 1 year ago
Post
The Stable Diffusion 3 research paper broken down, including some overlooked details!
Model
- 2 base model variants mentioned: 2B and 8B sizes
- New architecture at all abstraction levels:
  - UNet out, Multimodal Diffusion Transformer in; bye cross attention
  - Rectified flows for the diffusion process
  - Still a Latent Diffusion Model
- 3 text encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness
- Dataset was deduplicated with SSCD, which helped reduce memorization (no more details about the dataset, though)
Variants
- A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
- An Instruct Edit 2B model was trained, and learned how to do text replacement
Results
- State of the art in automated evals for composition and prompt understanding
- Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)
Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf

multimodalart posted an update over 1 year ago
Post
The TIGERLab's Text2Image arena is here!
TIGER-Lab/GenAI-Arena
Like https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard for LLMs: you prompt, two images emerge, vote for the best one.
With enough votes this will lead to an Elo-based leaderboard for text-to-image models, go vote!
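For anyone curious how pairwise votes can turn into a leaderboard, here is a small sketch of the textbook Elo update applied to a single vote. The post does not say how the arena actually computes its ratings, so treat everything here as an assumption: the K-factor of 32, the starting rating of 1000, and the function names are all illustrative.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins the vote under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after one head-to-head vote."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    # The winner gains exactly what the loser gives up, so the total is conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two models start level at 1000; a voter prefers model A's image.
new_a, new_b = update_elo(1000.0, 1000.0, a_wins=True)
print(new_a, new_b)  # 1016.0 984.0
```

An upset moves ratings more than an expected result: if a lower-rated model wins the vote, its expected score was below 0.5, so it gains more than half of K.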