Recent Activity

nouamanetazi
posted an update 25 days ago
After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at 60% efficiency, the problem isn't your model. It's most probably a misuse of the hardware. 🛠️

Questions that seemed simple but had no clear answers: Why is MoE training slower than dense models? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?

That's why we built The Smol Training Playbook 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the infrastructure layer that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: HBM3 hitting 3 TB/s, NVLink 4.0 reaching 786 GB/s, PCIe Gen4 at 14.2 GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 480 GB/s on a single node to 320-350 GB/s across 16 nodes.
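
To put the all-reduce figures in perspective, here is a minimal Python sketch that turns them into a retention ratio. It uses only the 480 GB/s single-node figure and the 350 GB/s upper end of the 16-node range. The `allreduce_bus_factor` helper is an added assumption: it applies the 2(n-1)/n convention that nccl-tests uses to convert algorithm bandwidth into bus bandwidth, and the post does not say which of the two was reported.

```python
def retention(single_node_gbps: float, multi_node_gbps: float) -> float:
    """Fraction of single-node all-reduce bandwidth kept at larger scale."""
    return multi_node_gbps / single_node_gbps


def allreduce_bus_factor(num_ranks: int) -> float:
    """nccl-tests convention: busbw = algbw * 2 * (n - 1) / n for all-reduce."""
    if num_ranks < 2:
        raise ValueError("all-reduce needs at least two ranks")
    return 2 * (num_ranks - 1) / num_ranks


SINGLE_NODE_GBPS = 480.0      # 8 GPUs, figure quoted in the post
BEST_MULTI_NODE_GBPS = 350.0  # upper end of the 16-node range in the post

best_case = retention(SINGLE_NODE_GBPS, BEST_MULTI_NODE_GBPS)
print(f"best-case retention across 16 nodes: {best_case:.1%}")
print(f"bus factor, 8 ranks: {allreduce_bus_factor(8):.3f}")
print(f"bus factor, 128 ranks: {allreduce_bus_factor(128):.3f}")
```

Even in the best case, roughly a quarter of the single-node bandwidth is gone once traffic has to leave the node, which is why the interconnect deserves as much attention as the GPUs themselves.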

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

The Smol Training Playbook: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
tomaarsen
posted an update about 1 month ago
🤗 Sentence Transformers is joining Hugging Face! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face! Details:

Today, the Ubiquitous Knowledge Processing (UKP) Lab is transferring the project to Hugging Face. Sentence Transformers will remain a community-driven, open-source project, with the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.

Read our full announcement for more details and quotes from UKP and Hugging Face leadership: https://huggingface.co/blog/sentence-transformers-joins-hf

We see an increasing wish from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

I would like to thank the UKP Lab, and especially Nils Reimers and Iryna Gurevych, both for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to take the project to new heights. That choice ended up being very valuable for the embedding & Information Retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
Molbap
posted an update about 2 months ago
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.

🔍 Inside the post:
– One Model, One File: readability first; you can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.

Written with @lysandre, @pcuenq and @yonigozlan, this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here → transformers-community/Transformers-tenets
lysandre
posted an update 2 months ago
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez !

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!
tomaarsen
posted an update 3 months ago
ModernBERT goes MULTILINGUAL! It's one of the most requested models I've seen, and Johns Hopkins University's CLSP has now trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.

Model details:
- 2 model sizes:
  - jhu-clsp/mmBERT-small
  - jhu-clsp/mmBERT-base
- Uses the ModernBERT architecture (so: flash attention, alternating global/local attention, unpadding/sequence packing, etc.), but with the Gemma2 multilingual tokenizer
- Maximum sequence length of 8192 tokens, on the high end for encoders
- Trained on 1833 languages using DCLM, FineWeb2, and many more sources
- 3 training phases: 2.3T tokens pretraining on 60 languages, 600B tokens mid-training on 110 languages, and 100B tokens decay training on all 1833 languages.
- Both models are MIT Licensed, and the full datasets and intermediary checkpoints are also publicly released

Evaluation details:
- Very competitive with ModernBERT at equivalent sizes on English (GLUE, MTEB v2 English after finetuning)
- Consistently outperforms equivalently sized models on all Multilingual tasks (XTREME, classification, MTEB v2 Multilingual after finetuning)
- In short: beats commonly used multilingual base models like mDistilBERT, XLM-R (multilingual RoBERTa), multilingual MiniLM, etc.
- Additionally: the ModernBERT-based mmBERT is much faster than the alternatives due to its architectural benefits. Easily up to 2x throughput in common scenarios.

Check out the full blogpost with more details. It's super dense & gets straight to the point: https://huggingface.co/blog/mmbert

Based on these results, the mmBERT models should be the new go-to multilingual encoder base models at 300M parameters and below. Do note that the mmBERT models are "base" models, i.e. they're currently only trained to perform Mask Filling. They'll need to be finetuned for downstream tasks like semantic search, classification, clustering, etc.
Xenova
posted an update 3 months ago
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
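
The matching steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the demo's actual Transformers.js code: the patch embeddings are toy 2-D vectors, the `highlight_patches` name and the 0.8 threshold are hypothetical, and taking the maximum similarity over all selected patches is just one plausible way to combine several selections.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def highlight_patches(frame_patches, selected_patches, threshold=0.8):
    """Return indices of patches similar enough to any selected patch."""
    highlighted = []
    for index, patch in enumerate(frame_patches):
        # Score each patch against every selection and keep the best match.
        best = max(cosine_similarity(patch, s) for s in selected_patches)
        if best > threshold:
            highlighted.append(index)
    return highlighted


# Toy cached features for one frame: three patches with 2-D embeddings.
frame = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
one_selection = [[1.0, 0.0]]
two_selections = [[1.0, 0.0], [0.0, 1.0]]

print(highlight_patches(frame, one_selection))
print(highlight_patches(frame, two_selections))
```

Adding a second selection, for example from a later frame where the object looks different, widens the set of patches that pass the threshold, which is the intuition behind selecting across frames.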

Excited to see what the community builds with it!
albertvillanova
posted an update 3 months ago
Latest smolagents release supports GPT-5: build agents that think, plan, and act.
⚡ Upgrade now and put GPT-5 to work!
albertvillanova
posted an update 3 months ago
🚀 smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked!
⚠️ Still, not fully isolated: for untrusted code, use a remote executor instead: Docker, E2B, Wasm.
✨ Many bug fixes: more reliable code.
👉 https://github.com/huggingface/smolagents/releases/tag/v1.21.0
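
To give a feel for what blocking dunder access can look like, here is a small sketch built on Python's standard `ast` module. It is an assumption-laden illustration, not the smolagents implementation: the `find_dunder_access` and `is_dunder` helpers are hypothetical, and a static check like this is exactly the kind of guard that is not full isolation.

```python
import ast


def is_dunder(name: str) -> bool:
    """True for names such as __class__ or __import__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


def find_dunder_access(source: str) -> list[str]:
    """Collect dunder names and attributes referenced in the source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and is_dunder(node.attr):
            found.append(node.attr)
        elif isinstance(node, ast.Name) and is_dunder(node.id):
            found.append(node.id)
    return found


suspicious = "x = (1).__class__.__bases__"
harmless = "print(len('abc'))"

for snippet in (suspicious, harmless):
    hits = find_dunder_access(snippet)
    verdict = "rejected" if hits else "allowed"
    print(f"{snippet!r}: {verdict} {sorted(hits)}")
```

A check like this only inspects the syntax tree before execution, so it cannot catch everything that happens at runtime. That is why a remote executor remains the safer choice for untrusted code.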
Xenova
posted an update 4 months ago
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
tomaarsen
posted an update 4 months ago
😎 I just published Sentence Transformers v5.1.0, and it's a big one. 2x-3x speedups of SparseEncoder models via ONNX and/or OpenVINO backends, easier distillation data preparation with hard negatives mining, and more:

1️⃣ Faster ONNX and OpenVINO backends for SparseEncoder models
Usage is as simple as backend="onnx" or backend="openvino" when initializing a SparseEncoder to get started, but I also included utility functions for optimization, dynamic quantization, and static quantization, plus benchmarks.

2️⃣ New n-tuple-scores output format from mine_hard_negatives
This new output format is immediately compatible with the MarginMSELoss and SparseMarginMSELoss for training SentenceTransformer, CrossEncoder, and SparseEncoder models.

3️⃣ Gathering across devices
When doing multi-GPU training using a loss that has in-batch negatives (e.g. MultipleNegativesRankingLoss), you can now use gather_across_devices=True to load in-batch negatives from the other devices too! Essentially a free lunch, pretty big impact potential in my evals.

4️⃣ Trackio support
If you also upgrade transformers, and you install trackio with pip install trackio, then your experiments will also automatically be tracked locally with trackio. Just open up localhost and have a look at your losses/evals, no logins, no metric uploading.

5️⃣ MTEB Documentation
We've added some documentation on evaluating SentenceTransformer models properly with MTEB. It's rudimentary as the documentation on the MTEB side is already great, but it should get you started.

Plus many more smaller features & fixes (crash fixes, compatibility with datasets v4, FIPS compatibility, etc.).
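
As a rough illustration of why gathering across devices matters for in-batch negatives, the sketch below counts how many negatives each anchor sees. It assumes (anchor, positive) pairs where every other positive in the batch acts as a negative; the `in_batch_negatives` helper is hypothetical and not part of Sentence Transformers.

```python
def in_batch_negatives(per_device_batch_size: int, num_devices: int, gather: bool) -> int:
    """Number of in-batch negatives available to each anchor."""
    # Without gathering, an anchor only sees the pairs on its own device.
    visible_pairs = per_device_batch_size * (num_devices if gather else 1)
    # Every visible positive except the anchor's own one is a negative.
    return visible_pairs - 1


local_only = in_batch_negatives(per_device_batch_size=32, num_devices=8, gather=False)
gathered = in_batch_negatives(per_device_batch_size=32, num_devices=8, gather=True)
print(f"negatives per anchor without gathering: {local_only}")
print(f"negatives per anchor with gathering: {gathered}")
```

With 8 devices and 32 pairs per device, each anchor goes from 31 to 255 negatives without any change to the per-device batch size.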

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v5.1.0

Big thanks to all of the contributors for helping with the release, many of the features from this release were proposed by others. I have a big list of future potential features that I'd love to add, but I'm
Wauplin
posted an update 4 months ago
Say hello to hf: a faster, friendlier Hugging Face CLI ✨

We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!

So... why this change?

Typing huggingface-cli constantly gets old fast. More importantly, the CLI's command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.

We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?
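
For readers who want to see the <resource> <action> pattern in code, here is a minimal sketch using Python's standard argparse with nested subparsers. It is a toy parser named `demo`, not the hf implementation, and apart from `auth login` (taken from the example above) the subcommands are illustrative placeholders rather than a listing of real hf commands.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI that follows the `<resource> <action>` layout."""
    parser = argparse.ArgumentParser(prog="demo")
    resources = parser.add_subparsers(dest="resource", required=True)

    # First level: the resource being acted on.
    auth = resources.add_parser("auth", help="authentication commands")
    auth_actions = auth.add_subparsers(dest="action", required=True)
    # Second level: the action applied to that resource.
    auth_actions.add_parser("login", help="log in")
    auth_actions.add_parser("whoami", help="show the current user (placeholder)")

    cache = resources.add_parser("cache", help="cache commands (placeholder)")
    cache_actions = cache.add_subparsers(dest="action", required=True)
    cache_actions.add_parser("scan", help="inspect the cache (placeholder)")

    return parser


args = build_parser().parse_args(["auth", "login"])
print(f"resource={args.resource} action={args.action}")
```

Grouping actions under a resource keeps related commands discoverable: asking the parser for help on one resource lists only the actions that apply to it.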

The full rationale, implementation details, and migration notes are in the blog post: https://huggingface.co/blog/hf-cli

Xenova
posted an update 4 months ago
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU
sayakpaul
posted an update 4 months ago
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast
albertvillanova
posted an update 5 months ago
🚀 New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)

We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!

🔧 Powered by Pyodide and Deno, this new WasmExecutor lets your agent-generated Python code run safely, without relying on Docker or local execution.

Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments

This is just the beginning: a focused initial implementation with known limitations. A solid MVP designed for secure, sandboxed use cases. 💡

💡 We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime

🔗 Check out the PR: https://github.com/huggingface/smolagents/pull/1261

Let's reimagine what agent-driven Python execution can look like: remote-first, wasm-secure, and community-built.

This feature is live in smolagents v1.20.0!
Try it out.
Break things. Extend it. Give us feedback.
Let's build safer, smarter agents, together 🧠⚙️

👉 https://github.com/huggingface/smolagents/releases/tag/v1.20.0

#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI
tomaarsen
posted an update 5 months ago
‼️ Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, encode methods improvements, Router module for asymmetric models & much more. Sparse + Dense = 🔥 hybrid search performance! Details:

1️⃣ Sparse Encoder Models
Brand new support for sparse embedding models that generate high-dimensional embeddings (30,000+ dims) where <1% are non-zero:

- Full SPLADE, Inference-free SPLADE, and CSR architecture support
- 4 new modules, 12 new losses, 9 new evaluators
- Integration with @elastic-co , @opensearch-project , @NAVER LABS Europe, @qdrant , @IBM , etc.
- Decode interpretable embeddings to understand token importance
- Hybrid search integration to get the best of both worlds
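
To make the sparse-embedding idea concrete, here is a small pure-Python sketch. It is an illustration under stated assumptions rather than the Sentence Transformers API: vectors are stored as hypothetical `{dimension: weight}` dictionaries, and the weighted sum in `hybrid_score` is just one simple way to combine sparse and dense scores.

```python
def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product that only touches dimensions active in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[dim] for dim, weight in a.items() if dim in b)


def sparsity(vector: dict[int, float], num_dims: int) -> float:
    """Fraction of dimensions that are zero."""
    return 1.0 - len(vector) / num_dims


def hybrid_score(sparse_score: float, dense_score: float, alpha: float = 0.5) -> float:
    """Weighted combination of a sparse and a dense relevance score."""
    return alpha * sparse_score + (1.0 - alpha) * dense_score


NUM_DIMS = 30_000  # order of magnitude mentioned in the post

# Only the non-zero dimensions are stored.
query = {10: 1.5, 200: 0.5}
document = {10: 2.0, 999: 1.0}

score = sparse_dot(query, document)
print(f"sparse score: {score}")
print(f"query sparsity: {sparsity(query, NUM_DIMS):.4%}")
print(f"hybrid score: {hybrid_score(score, dense_score=1.0)}")
```

Because only the non-zero entries are stored and compared, scoring cost scales with the number of active dimensions rather than with the full 30,000+ dimensional space.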

2️⃣ Enhanced Encode Methods & Multi-Processing
- New encode_query & encode_document methods automatically use predefined prompts
- No more manual pool management - just pass device list directly to encode()
- Much cleaner and easier to use than the old multi-process approach

3️⃣ Router Module & Advanced Training
- Router module with different processing paths for queries vs documents
- Custom learning rates for different parameter groups
- Composite loss logging - see individual loss components
- Perfect for two-tower architectures

4️⃣ Comprehensive Documentation & Training
- New Training Overview, Loss Overview, API Reference docs
- 6 new training example documentation pages
- Full integration examples with major search engines
- Extensive blogpost on training sparse models

Read the comprehensive blogpost about training sparse embedding models: https://huggingface.co/blog/train-sparse-encoder

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v5.0.0

What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
albertvillanova
posted an update 5 months ago
🚀 SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience: making it easier than ever to build smart, interactive AI agents. Here's what's new:

🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency

🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.

🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➡️ Better architecture & maintainability

📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep
- ✅ More visibility and structure to agent decisions

🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

🔗 Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

💬 Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace