Dev Mode Explorers

community

Activity Feed

alielfilali01

posted an update 1 day ago

Post

Plans in HTML > Plans in Markdown

nielsr

submitted a paper to Daily Papers 3 days ago

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Paper • 2605.27295 • Published 5 days ago • 17

nielsr

submitted a paper to Daily Papers 9 days ago

Stable Audio 3

Paper • 2605.17991 • Published 13 days ago • 18

johko

posted an update 10 days ago

Post

114

One prompt, three answers - which model is from where?

johko/llm-blind-date

I built a little demo where you give three models (Apertus, Llama, Qwen3) the same prompt, and at the end you have to guess which is which based only on their answers.

Give it a try! ;)

fffiloni

posted an update 12 days ago

Post

3299

I built HF Radio on Hugging Face Spaces 📻
fffiloni/HF-Radio

A live community radio for AI-generated songs, powered by tracks created with ACE-Step.

You can tune in, discover community-made songs in many languages, vote on what sounds good, and mark your real favorites as Bangers.

The more people listen, vote, and create, the better the station gets.

Under the hood, it connects a few Hugging Face pieces together:

Spaces for the live app, HF buckets for community tracks, OAuth for signed-in listeners, server-side streaming with ffmpeg, hourly playlist refreshes, moderation, jingles, and community feedback loops.

It’s not just a playlist.

It’s a shared taste experiment:
new songs get a shot every hour, and the community helps decide what deserves another spin.
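
The hourly refresh idea can be illustrated with a small sketch. This is not HF Radio's actual code: the `Track` fields, the `build_hourly_playlist` function, and the scoring rule (upvotes minus downvotes) are all assumptions made up for illustration. It only shows one plausible way to reserve slots for unplayed songs and let votes rank the rest.

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical track record; the real app's data model is not described in the post.
    title: str
    upvotes: int = 0
    downvotes: int = 0
    plays: int = 0

    @property
    def score(self) -> int:
        # Assumed scoring rule: net votes.
        return self.upvotes - self.downvotes


def build_hourly_playlist(tracks: list[Track], slots: int, new_slots: int) -> list[Track]:
    """Reserve up to `new_slots` for unplayed tracks, then fill the rest by vote score."""
    fresh = [t for t in tracks if t.plays == 0][:new_slots]
    played = sorted(
        (t for t in tracks if t.plays > 0),
        key=lambda t: t.score,
        reverse=True,
    )
    return fresh + played[: slots - len(fresh)]


tracks = [
    Track("A", upvotes=5, downvotes=1, plays=3),
    Track("B", upvotes=1, downvotes=4, plays=2),
    Track("C"),  # never played, so eligible for a "new song" slot
    Track("D", upvotes=9, downvotes=0, plays=7),
    Track("E"),  # never played, but only one new slot is reserved here
]
playlist = build_hourly_playlist(tracks, slots=3, new_slots=1)
titles = [t.title for t in playlist]
print(titles)
```

In this toy setup one new song gets airtime each hour, and the remaining slots go to the tracks the community rated highest.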

Come listen.
Find weird gems.
Support the Bangers.
Shape the radio.

→ fffiloni/HF-Radio

DongfuJiang

authored 4 papers 16 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published Apr 6 • 36

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 263

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published Apr 14 • 37

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published 28 days ago • 116

fffiloni

posted an update 17 days ago

Post

485

Great technical guide by Nico Martin on the Hugging Face blog, showing how to use Transformers.js inside a Chrome extension and run ONNX models from the Hub locally with WebGPU inside a Manifest V3 extension.

The interesting part: this is not just a chatbot in a side panel.

The article walks through the architecture behind a browser agent that can read open tabs, query webpages, search history, and highlight elements directly on the page — with models downloaded from the Hugging Face Hub, cached under the extension origin, and executed locally instead of being called through a remote API for every prompt.

A strong blueprint for building local-first web copilots, reading assistants, and AI-powered browsing workflows.

Article: https://huggingface.co/blog/transformersjs-chrome-extension

fffiloni

posted an update 19 days ago

Post

333

I’ve been reading “What if AI systems weren’t chatbots?”
What if AI systems weren't chatbots? (2605.07896) 👀

The paper asks a simple but important question: what if the chatbot interface is not just a neutral wrapper around AI models, but part of the problem?

A chatbot can make a system feel more capable, more certain, and more “human” than it really is. That matters, because interfaces shape how we trust, use, and delegate to AI systems.

When everything becomes: ask → answer
we can lose sight of the actual workflow:
- parameters
- alternatives
- uncertainty
- intermediate steps
- failure modes
- human control

For creative AI especially — image, video, editing, animation — I’m not sure “chat” should always be the default interface.

Sometimes we need a conversation.
But often we need a canvas, a timeline, sliders, masks, previews, comparisons, and visible pipelines.

This is also why I find many open ML demos interesting: Spaces, Gradio apps, visual tools, small focused interfaces.

They often explore another direction — not just better assistants, but better tools. 🤗

2 replies

Prabhjotschugh

authored a paper 20 days ago

When Less Is More: Simplicity Beats Complexity for Physics-Constrained InSAR Phase Unwrapping

Paper • 2605.00896 • Published Apr 28 • 1

fffiloni

posted an update about 1 month ago

Post

687

Quietly baking Image → Music 🎵 v3 — now running on SOTA open-source models.
👉 fffiloni/image-2-music-v3 | Feel free to test it and share feedback.

Just wiring together: merve/moondream3 * victor/ace-step-jam

Image → prompt → audio | Early version, will evolve | Follow: @fffiloni

nielsr

submitted a paper to Daily Papers about 1 month ago

Scaling Test-Time Compute for Agentic Coding

Paper • 2604.16529 • Published Apr 16 • 12

fffiloni

posted an update about 1 month ago

Post

1820

🚀 RB-Modulation is back on Hugging Face Spaces!

This is an older project that recently broke due to dependency changes, but it’s now fixed and running again ✅

👉 What’s fixed:
- GroundingDINO & LangSAM installation
- compatibility with recent environments
- GPU inference running smoothly again

👉 Try it here:
fffiloni/RB-Modulation

Feel free to give it a try again — feedback welcome!

nielsr

submitted a paper to Daily Papers about 1 month ago

Geometric Context Transformer for Streaming 3D Reconstruction

Paper • 2604.14141 • Published Apr 15 • 21

fffiloni

posted an update about 2 months ago

Post

3184

✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
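
For readers curious what "patching downloaded code at runtime before import" can look like, here is a minimal sketch using only the Python standard library. It is not the actual PASD fix: the file names, the `removed_package` import, and the `compat_utils` shim are hypothetical stand-ins. The idea is simply to rewrite the source text on disk first, then import the patched file.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def patch_source(path: Path, replacements: dict[str, str]) -> int:
    """Rewrite a source file in place; return how many replacements matched."""
    text = path.read_text(encoding="utf-8")
    applied = 0
    for old, new in replacements.items():
        if old in text:
            text = text.replace(old, new)
            applied += 1
    path.write_text(text, encoding="utf-8")
    return applied


def import_from_path(module_name: str, path: Path):
    """Import a module from an explicit file path (call this after patching)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Stand-in for third-party code downloaded at startup (hypothetical contents).
workdir = Path(tempfile.mkdtemp())
legacy_file = workdir / "legacy_pipeline.py"
legacy_file.write_text(
    "from removed_package.old_utils import helper\n"
    "def run(x):\n"
    "    return helper(x)\n",
    encoding="utf-8",
)

# Stand-in shim that provides the symbol the old import expected.
(workdir / "compat_utils.py").write_text(
    "def helper(x):\n    return x * 2\n", encoding="utf-8"
)
sys.path.insert(0, str(workdir))

# Patch first, import second: the broken import never executes.
applied = patch_source(
    legacy_file,
    {"from removed_package.old_utils import helper": "from compat_utils import helper"},
)
legacy = import_from_path("legacy_pipeline", legacy_file)
result = legacy.run(21)
print(applied, result)
```

The trade-off of this approach is that the patch is tied to exact strings in the upstream source, so it is worth checking the returned count and failing loudly if an expected replacement no longer matches.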

If you’ve used it before (or are curious), feel free to give it another try.

nielsr

submitted 2 papers to Daily Papers about 2 months ago

A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens

Paper • 2604.04913 • Published Apr 6 • 12

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Paper • 2603.28130 • Published Mar 30 • 11

fffiloni

posted an update about 2 months ago

Post

2874

✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction

Team members 144

dev-mode-explorers's activity