It's in person unfortunately.. And the next stop of this serie is in India, not more convenient for brazilian folks :/
Simon Pagezy
pagezyhf
AI & ML interests
Healthcare ML
Recent Activity
replied to
their
post
about 18 hours ago
Hackathons in Paris on July 5th and 6th!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!
upvoted
an
article
about 19 hours ago
The Anthropic Ruling: Why AI Training Just Got Legal (But Piracy Didn't)
posted
an
update
1 day ago
Hackathons in Paris on July 5th and 6th!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!
Organizations

replied to
their
post
about 18 hours ago

posted
an
update
1 day ago
Post
2194
Hackathons in Paris on July 5th and 6th!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!

posted
an
update
9 days ago
Post
2353
Webinar Alert
Build your first chatbot with a Hugging Face Spaces frontend and Gaudi-powered backend with @bconsolvo ! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces—integrating a model endpoint hosted on an Intel® Gaudi® accelerator.
Beginners are welcome
https://web.cvent.com/event/70e11f23-7c52-4994-a918-96fa9d5e935f/summary
Build your first chatbot with a Hugging Face Spaces frontend and Gaudi-powered backend with @bconsolvo ! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces—integrating a model endpoint hosted on an Intel® Gaudi® accelerator.
Beginners are welcome
https://web.cvent.com/event/70e11f23-7c52-4994-a918-96fa9d5e935f/summary

reacted to
AdinaY's
post with 🔥
about 2 months ago
Post
1877
Xiaomi just entered the open source as a new player🔥 And dropped MiMo - a 7B model trained from scratch for reasoning.
XiaomiMiMo/MiMo-7B-RL
✨ 7B - Base/RL/SFT/RL zero
✨ Surpasses 32B models in math & code
✨ Apache 2.0 licensed
XiaomiMiMo/MiMo-7B-RL
✨ 7B - Base/RL/SFT/RL zero
✨ Surpasses 32B models in math & code
✨ Apache 2.0 licensed

reacted to
prithivMLmods's
post with 🔥
2 months ago
Post
1300
Dropping the domain-specific downstream image classification content moderation models, including the anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with the datasets. 🔥
╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2
╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K
╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1
For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2
╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K
╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1
Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.
For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets

posted
an
update
2 months ago
Post
1991
If you haven't had the chance to test the latest open model from Meta, Llama 4 Maverick, go try it on AMD MI 300 on Hugging Face!
amd/llama4-maverick-17b-128e-mi-amd
amd/llama4-maverick-17b-128e-mi-amd

reacted to
fdaudens's
post with 🤯➕
3 months ago

reacted to
as-cle-bert's
post with 🔥
3 months ago
Post
2965
Llama-4 is out and I couldn't resist but to cook something with it... So I came up with 𝐋𝐥𝐚𝐦𝐚𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡𝐞𝐫 (https://llamaresearcher.com), your deep-research AI companion!🔎
The workflow behind 𝗟𝗹𝗮𝗺𝗮𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿 is simple:
💬 You submit a query
🛡️ Your query is evaluated by Llama 3 guard model, which deems it safe or unsafe
🧠 If your query is safe, it is routed to the Researcher Agent
⚙️ The Researcher Agent expands the query into three sub-queries, with which to search the web
🌐 The web is searched for each of the sub-queries
📊 The retrieved information is evaluated for relevancy against your original query
✍️ The Researcher Agent produces an essay based on the information it gathered, paying attention to referencing its sources
The agent itself is also built with easy-to-use and intuitive blocks:
🦙 LlamaIndex provides the agentic architecture and the integrations with the language models
⚡Groq makes Llama-4 available with its lightning-fast inference
🔎 Linkup allows the agent to deep-search the web and provides sourced answers
💪 FastAPI does the heavy loading with wrapping everything within an elegant API interface
⏱️ Redis is used for API rate limiting
🎨 Gradio creates a simple but powerful user interface
Special mention also to Lovable, which helped me build the first draft of the landing page for LlamaResearcher!💖
If you're curious and you want to try LlamaResearcher, you can - completely for free and without subscription - for 30 days from now ➡️ https://llamaresearcher.com
And if you're like me, and you like getting your hands in code and build stuff on your own machine, I have good news: this is all open-source, fully reproducible locally and Docker-ready🐋
Just go to the GitHub repo: https://github.com/AstraBert/llama-4-researcher and don't forget to star it, if you find it useful!⭐
As always, have fun and feel free to leave your feedback✨
The workflow behind 𝗟𝗹𝗮𝗺𝗮𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿 is simple:
💬 You submit a query
🛡️ Your query is evaluated by Llama 3 guard model, which deems it safe or unsafe
🧠 If your query is safe, it is routed to the Researcher Agent
⚙️ The Researcher Agent expands the query into three sub-queries, with which to search the web
🌐 The web is searched for each of the sub-queries
📊 The retrieved information is evaluated for relevancy against your original query
✍️ The Researcher Agent produces an essay based on the information it gathered, paying attention to referencing its sources
The agent itself is also built with easy-to-use and intuitive blocks:
🦙 LlamaIndex provides the agentic architecture and the integrations with the language models
⚡Groq makes Llama-4 available with its lightning-fast inference
🔎 Linkup allows the agent to deep-search the web and provides sourced answers
💪 FastAPI does the heavy loading with wrapping everything within an elegant API interface
⏱️ Redis is used for API rate limiting
🎨 Gradio creates a simple but powerful user interface
Special mention also to Lovable, which helped me build the first draft of the landing page for LlamaResearcher!💖
If you're curious and you want to try LlamaResearcher, you can - completely for free and without subscription - for 30 days from now ➡️ https://llamaresearcher.com
And if you're like me, and you like getting your hands in code and build stuff on your own machine, I have good news: this is all open-source, fully reproducible locally and Docker-ready🐋
Just go to the GitHub repo: https://github.com/AstraBert/llama-4-researcher and don't forget to star it, if you find it useful!⭐
As always, have fun and feel free to leave your feedback✨

posted
an
update
5 months ago
Post
1759
We published https://huggingface.co/blog/deepseek-r1-aws!
If you are using AWS, give a read. It is a running document to showcase how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.
We're working hard to enable all the scenarios, whether you want to deploy to Inference Endpoints, Sagemaker or EC2; with GPUs or with Trainium & Inferentia.
We have full support for the distilled models, DeepSeek-R1 support is coming soon!! I'll keep you posted.
Cheers
If you are using AWS, give a read. It is a running document to showcase how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.
We're working hard to enable all the scenarios, whether you want to deploy to Inference Endpoints, Sagemaker or EC2; with GPUs or with Trainium & Inferentia.
We have full support for the distilled models, DeepSeek-R1 support is coming soon!! I'll keep you posted.
Cheers

reacted to
m-ric's
post with 🚀
5 months ago
Post
4123
𝗧𝗵𝗲 𝗛𝘂𝗯 𝘄𝗲𝗹𝗰𝗼𝗺𝗲𝘀 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝗿𝘀!
✅ Hosting our own inference was not enough: now the Hub 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.
Check model cards on the Hub: you can now, in 1 click, use inference from various providers (cf video demo)
Their inference can also be used through our Inference API client. There, you can use either your custom provider key, or your HF token, then billing will be handled directly on your HF account, as a way to centralize all expenses.
💸 Also, PRO users get 2$ inference credits per month!
Read more in the announcement 👉 https://huggingface.co/blog/inference-providers
✅ Hosting our own inference was not enough: now the Hub 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.
Check model cards on the Hub: you can now, in 1 click, use inference from various providers (cf video demo)
Their inference can also be used through our Inference API client. There, you can use either your custom provider key, or your HF token, then billing will be handled directly on your HF account, as a way to centralize all expenses.
💸 Also, PRO users get 2$ inference credits per month!
Read more in the announcement 👉 https://huggingface.co/blog/inference-providers

reacted to
merve's
post with 👍
5 months ago
Post
5425
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- tencent released Hunyuan3D-2, new 3D asset generation from images
Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- tencent released Hunyuan3D-2, new 3D asset generation from images

reacted to
burtenshaw's
post with 🔥
5 months ago
Post
49581
We’re launching a FREE and CERTIFIED course on Agents!
We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.
Here's what you'll learn:
- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience
This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.
Enroll today and start building the next generation of AI agent applications!
https://bit.ly/hf-learn-agents
We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.
Here's what you'll learn:
- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience
This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.
Enroll today and start building the next generation of AI agent applications!
https://bit.ly/hf-learn-agents

reacted to
mlabonne's
post with 🤗
5 months ago
Post
6619
🆕 LLM Course 2025 edition!
I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.
The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.
I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.
Thanks everyone, hope you'll enjoy it!
💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.
The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.
I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.
Thanks everyone, hope you'll enjoy it!
💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

reacted to
merve's
post with ❤️
5 months ago
Post
2643
Everything that happened this week in open AI, a recap 🤠
merve/jan-17-releases-678a673a9de4a4675f215bf5
👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB
(vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is new video multimodal models by OpenGVLab that come in sizes 2B & 7B in resolutions 224 & 448
- ByteDance released larger SA2VA that comes in 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B passive 45.9B active params) by MiniMaxAI with context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B is a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released, Dria-Agent-a-3B, new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, faster and memory efficient Llama 3.3
🖼️ Vision
- MatchAnything is a new foundation model for matching
- FitDit is a high-fidelity VTON model based on DiT architecture
🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
📖 Retrieval
- lightblue released a new reranker based on Qwen2.5 LB-reranker-0.5B-v1.0 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by
@jxm
👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB
(vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is new video multimodal models by OpenGVLab that come in sizes 2B & 7B in resolutions 224 & 448
- ByteDance released larger SA2VA that comes in 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B passive 45.9B active params) by MiniMaxAI with context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B is a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released, Dria-Agent-a-3B, new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, faster and memory efficient Llama 3.3
🖼️ Vision
- MatchAnything is a new foundation model for matching
- FitDit is a high-fidelity VTON model based on DiT architecture
🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
📖 Retrieval
- lightblue released a new reranker based on Qwen2.5 LB-reranker-0.5B-v1.0 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by
@jxm

posted
an
update
5 months ago
Post
463
Learn how to deploy multiple LoRA adapters on Vertex AI with this blogpost, using Hugging Face Deep Learning Containers on GCP.
https://medium.com/google-cloud/open-models-on-vertex-ai-with-hugging-face-serving-multiple-lora-adapters-on-vertex-ai-e3ceae7b717c
https://medium.com/google-cloud/open-models-on-vertex-ai-with-hugging-face-serving-multiple-lora-adapters-on-vertex-ai-e3ceae7b717c

replied to
singhsidhukuldeep's
post
6 months ago
Sharing the paper link if anyone want to check https://huggingface.co/papers/2412.04637

reacted to
singhsidhukuldeep's
post with 🤯
6 months ago
Post
1659
Excited to share insights from Walmart's groundbreaking semantic search system that revolutionizes e-commerce product discovery!
The team at Walmart Global Technology(the team that I am a part of 😬) has developed a hybrid retrieval system that combines traditional inverted index search with neural embedding-based search to tackle the challenging problem of tail queries in e-commerce.
Key Technical Highlights:
• The system uses a two-tower BERT architecture where one tower processes queries and another processes product information, generating dense vector representations for semantic matching.
• Product information is enriched by combining titles with key attributes like category, brand, color, and gender using special prefix tokens to help the model distinguish different attribute types.
• The neural model leverages DistilBERT with 6 layers and projects the 768-dimensional embeddings down to 256 dimensions using a linear layer, achieving optimal performance while reducing storage and computation costs.
• To improve model training, they implemented innovative negative sampling techniques combining product category matching and token overlap filtering to identify challenging negative examples.
Production Implementation Details:
• The system uses a managed ANN (Approximate Nearest Neighbor) service to enable fast retrieval, achieving 99% recall@20 with just 13ms latency.
• Query embeddings are cached with preset TTL (Time-To-Live) to reduce latency and costs in production.
• The model is exported to ONNX format and served in Java, with custom optimizations like fixed input shapes and GPU acceleration using NVIDIA T4 processors.
Results:
The system showed significant improvements in both offline metrics and live experiments, with:
- +2.84% improvement in NDCG@10 for human evaluation
- +0.54% lift in Add-to-Cart rates in live A/B testing
This is a fantastic example of how modern NLP techniques can be successfully deployed at scale to solve real-world e-
The team at Walmart Global Technology(the team that I am a part of 😬) has developed a hybrid retrieval system that combines traditional inverted index search with neural embedding-based search to tackle the challenging problem of tail queries in e-commerce.
Key Technical Highlights:
• The system uses a two-tower BERT architecture where one tower processes queries and another processes product information, generating dense vector representations for semantic matching.
• Product information is enriched by combining titles with key attributes like category, brand, color, and gender using special prefix tokens to help the model distinguish different attribute types.
• The neural model leverages DistilBERT with 6 layers and projects the 768-dimensional embeddings down to 256 dimensions using a linear layer, achieving optimal performance while reducing storage and computation costs.
• To improve model training, they implemented innovative negative sampling techniques combining product category matching and token overlap filtering to identify challenging negative examples.
Production Implementation Details:
• The system uses a managed ANN (Approximate Nearest Neighbor) service to enable fast retrieval, achieving 99% recall@20 with just 13ms latency.
• Query embeddings are cached with preset TTL (Time-To-Live) to reduce latency and costs in production.
• The model is exported to ONNX format and served in Java, with custom optimizations like fixed input shapes and GPU acceleration using NVIDIA T4 processors.
Results:
The system showed significant improvements in both offline metrics and live experiments, with:
- +2.84% improvement in NDCG@10 for human evaluation
- +0.54% lift in Add-to-Cart rates in live A/B testing
This is a fantastic example of how modern NLP techniques can be successfully deployed at scale to solve real-world e-

reacted to
julien-c's
post with 🔥❤️
7 months ago
Post
10828
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub
TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
docs: https://huggingface.co/docs/hub/storage-limits
We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥
cc: @reach-vb @pierric @victor and the HF team
TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
docs: https://huggingface.co/docs/hub/storage-limits
We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥
cc: @reach-vb @pierric @victor and the HF team