AI & ML interests

None defined yet.

Recent Activity

albertvillanova
posted an update 2 days ago
🚀 SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience, making it easier than ever to build smart, interactive AI agents. Here's what's new:

🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling (see the sketch after this list)
- Output formatting now uses XML tags for consistency
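
For instance, the new managed-agent and context-manager support could be combined roughly like this (a minimal sketch assuming the v1.19 API; exact argument names may differ, see the release notes linked below):

from smolagents import InferenceClientModel, ToolCallingAgent, WebSearchTool

model = InferenceClientModel()  # defaults to a hosted chat model

# A sub-agent the manager can delegate web searches to
web_agent = ToolCallingAgent(
    tools=[WebSearchTool()],
    model=model,
    name="web_search_agent",
    description="Runs web searches on behalf of the manager agent.",
)

# Managed agents in ToolCallingAgent + context-manager lifecycle handling (both new in v1.19)
with ToolCallingAgent(tools=[], model=model, managed_agents=[web_agent]) as manager:
    print(manager.run("What's new in smolagents v1.19.0?"))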

šŸ–„ļø UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.

šŸ”„ Streaming Refactor
- Streaming event aggregation moved off the Model class
- āž”ļø Better architecture & maintainability

šŸ“¦ Output Tracking
- CodeAgent outputs are now stored in ActionStep
- āœ… More visibility and structure to agent decisions

šŸ› Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

🔗 Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

💬 Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace
a-r-r-o-w
posted an update 12 days ago
New diffusion model for text-to-image and video-to-world generation: Cosmos Predict-2 👽

Model collection: nvidia/cosmos-predict2-68028efc052239369a0f2959
Diffusers support: https://github.com/huggingface/diffusers/pull/11695
Documentation: https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos

These are results with the 2B param model. Imagine what you could do with the 14B version! Go check it out now!
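
As a rough usage sketch with Diffusers (the checkpoint id and generation arguments here are my assumptions; the linked docs have the canonical example, and some Cosmos checkpoints may additionally require the guardrail/safety-checker dependency):

import torch
from diffusers import DiffusionPipeline

# Checkpoint id assumed from the model collection above; swap in the exact repo you want.
pipe = DiffusionPipeline.from_pretrained(
    "nvidia/Cosmos-Predict2-2B-Text2Image", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = pipe(prompt="A robot exploring a neon-lit alien marketplace at night").images[0]
image.save("cosmos_predict2.png")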
a-r-r-o-w
posted an update 14 days ago
Did you know how simple it is to get started with your own custom compiler backend for torch.compile? What's stopping you from writing your own compiler?

import torch
from torch._functorch.partitioners import draw_graph

def compiler(fx_module: torch.fx.GraphModule, _):
    # Dump the captured FX graph to a Graphviz .dot file, then run the graph eagerly.
    draw_graph(fx_module, "compile.dot")
    return fx_module.forward

def capture(model, *inputs):
    # torch.compile hands every captured FX graph to our custom backend above.
    compiled_model = torch.compile(model, backend=compiler)
    y = compiled_model(*inputs)
    # The backend returns an eager callable, so autograd/backward still works as usual.
    y.sum().backward()

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_1 = torch.nn.Linear(16, 32)
        self.linear_2 = torch.nn.Linear(32, 16)

    def forward(self, x):
        x = self.linear_1(x)
        x = torch.nn.functional.silu(x)
        x = self.linear_2(x)
        return x

if __name__ == '__main__':
    model = MLP()
    model.to("mps")  # Apple Silicon GPU backend; use "cuda" or "cpu" as appropriate
    x = torch.randn(4, 16, device="mps", dtype=torch.float32)

    capture(model, x)


--------------

Part of https://huggingface.co/posts/a-r-r-o-w/231008365980283
a-r-r-o-w
posted an update 14 days ago
Recently, I've been focusing my learning on the following topics:
- PyTorch internals, specifically the inductor system (roughly ~1 month of experience)
- Triton internals (~8 months of experience)
- CUDA (~3 months of experience)
- Understanding fusion patterns in compilers and how to improve them (~1 month of experience)
- Parallelism strategies for large-scale inference optimization (~6-7 months of experience)

I thought it would be nice to document it somewhere, for no particular reason. Maybe someone will find it useful? It's also because I want to get into the habit of writing, but have had no motivation to do so. Maybe writing short informal posts will help build the habit.

Since I don't have a personal site, and don't plan to create one in the near future, I think HF posts are best suited for short and informal documentation to share my little discoveries and learnings. If you're interested, strap in!

The first post in this series will be a basic study of PyTorch's float32 matmuls and their Triton implementation (nothing much, just the tutorial available on the website), plus a short dive into TF32 and a TFLOPS comparison on an A100 machine.
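
For reference, the standard PyTorch knobs for TF32 look like this (nothing specific to the upcoming post, just the documented settings): on Ampere GPUs like the A100, float32 matmuls can be routed through TF32 tensor cores.

import torch

# Allow TF32 tensor cores for float32 matmuls and convolutions (Ampere and newer).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
# Higher-level switch: "high" permits TF32, "highest" keeps full float32 precision.
torch.set_float32_matmul_precision("high")

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # runs on TF32 tensor cores when enabled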
Narsil
posted an update 15 days ago
Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.

Me: *Slacking off with colleagues*
Cursor: Ping.

Me: 🤯

Xenova
posted an update 22 days ago
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

šŸ” Privacy by design (no data leaves your device)
šŸ’° Completely free... forever
šŸ“¦ Zero installation required, just visit a website
āš”ļø Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
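
(For illustration only: the demo itself runs entirely in the browser via Transformers.js, but a rough server-side Python analogue of the middle of that cascade could look like this; the Whisper checkpoint is my own pick, and the VAD/TTS stages are omitted.)

from transformers import pipeline

# speech -> text (Whisper), then text -> reply (SmolLM2); Silero VAD and Kokoro TTS omitted here.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
llm = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

user_text = asr("user_utterance.wav")["text"]
messages = [{"role": "user", "content": user_text}]
reply = llm(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
print(reply)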
danaaubakirova
posted an update 23 days ago
albertvillanova
posted an update 30 days ago
sayakpaul
posted an update about 1 month ago
Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.

Give it a go here:
https://lnkd.in/gf8Pi4-2
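
For example, the bitsandbytes backend can be applied to a pipeline's transformer roughly like this (a sketch based on the Diffusers quantization docs; see the guide above for all backends and their trade-offs):

import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

model_id = "black-forest-labs/FLUX.1-dev"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
# Quantize only the transformer (the memory-heavy component), then plug it into the pipeline.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id, subfolder="transformer", quantization_config=quant_config, torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe("a cup of coffee on a wooden table", num_inference_steps=28).images[0]
image.save("flux_nf4.png")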
joaogante
posted an update about 1 month ago
Let's go! Custom generation code has landed in transformers 🚀

Have you designed a new cool KV cache? Maybe you're comparing new test-time compute ideas you've been researching? Have you found a way to do diffusion with existing models? You can now easily share your findings with the community with custom generation code, using the well-known generate interface 🤓

In a nutshell, we have expanded support for custom modeling code on the Hub with *model-agnostic* custom generation code. Write for one model, reuse with any model -- hopefully, this will democratize access to new generation ideas 🫔

As a creator, you gain the ability to get your ideas into transformers with minimal effort. You'll also have access to all Hub features: a landing page for your creation, discussions, usage metrics, ... 🤓

💎 Resources 💎
- docs: https://huggingface.co/docs/transformers/generation_strategies#custom-decoding-methods
- minimal example: transformers-community/custom_generate_example
- discussion: transformers-community/support#10
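
A rough usage sketch with the minimal example repo listed above (per the custom decoding methods docs; the base model is just a placeholder I picked, any generative model should work):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# custom_generate pulls the decoding loop from the Hub repo instead of the built-in one.
output = model.generate(
    **inputs,
    custom_generate="transformers-community/custom_generate_example",
    trust_remote_code=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))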
sayakpaul
posted an update about 1 month ago
Despite the emergence of architectures that combine an LLM and a DiT for T2I synthesis, this design remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, this design remains underexplored, which is what we set out to improve in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and explore what makes a "good deep fusion" between the two for T2I.
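
To make "deep fusion through layerwise attention" concrete, here is a toy sketch of the general idea (my own illustration, not the paper's actual architecture): at each block, image tokens attend over the concatenation of the LLM's hidden states for that layer and the image tokens themselves.

import torch

class ToyDeepFusionBlock(torch.nn.Module):
    # Toy illustration: image tokens jointly attend over [LLM hidden states, image tokens].
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_attn = torch.nn.LayerNorm(dim)
        self.norm_mlp = torch.nn.LayerNorm(dim)
        self.attn = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, image_tokens, llm_hidden):
        # image_tokens: (B, N, D); llm_hidden: (B, T, D) from the corresponding LLM layer
        context = torch.cat([llm_hidden, image_tokens], dim=1)
        attn_out, _ = self.attn(self.norm_attn(image_tokens), context, context)
        x = image_tokens + attn_out
        return x + self.mlp(self.norm_mlp(x))

block = ToyDeepFusionBlock(dim=64)
out = block(torch.randn(2, 16, 64), torch.randn(2, 8, 64))  # -> (2, 16, 64)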

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives. PixArt-Alpha-style attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on these findings, we arrive at FuseDiT, with the following components on top of the base architecture.

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.

To learn more (code and models are all available), please check out the paper:
https://lnkd.in/gg6qyqZX.
albertvillanova
posted an update about 1 month ago
New in smolagents v1.16.0:
šŸ” Bing support in WebSearchTool
šŸ Custom functions & executor_kwargs in LocalPythonExecutor
šŸ”§ Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
šŸ“š Better docs

šŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.16.0
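
For the local-endpoint option mentioned above, a minimal sketch (assuming an OpenAI-compatible server such as vLLM or Ollama is running locally; the model id and URL are placeholders):

from smolagents import CodeAgent, OpenAIServerModel, WebSearchTool

# Point the agent at a locally hosted OpenAI-compatible endpoint instead of a hosted API.
model = OpenAIServerModel(
    model_id="qwen2.5-7b-instruct",       # whatever your local server serves
    api_base="http://localhost:8000/v1",  # e.g. a local vLLM server
    api_key="not-needed-locally",
)

agent = CodeAgent(tools=[WebSearchTool()], model=model)
print(agent.run("Summarize the smolagents v1.16.0 release notes."))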