
Recent Activity

melmass updated a model about 2 months ago: Lightricks/LTXV-LoRAs
melmass: new activity about 2 months ago in Lightricks/LTXV-LoRAs: Upload 4 files
melmass: new activity about 2 months ago in Lightricks/LTXV-LoRAs: Update README.md

multimodalart posted an update 7 days ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing
a-r-r-o-w posted an update 13 days ago
Did you know how simple it is to get started with your own custom compiler backend for torch.compile? What's stopping you from writing your own compiler?

import torch
from torch._functorch.partitioners import draw_graph

def compiler(fx_module: torch.fx.GraphModule, _):
    # Dump the captured FX graph to a Graphviz .dot file, then fall back to
    # eager execution by returning the module's unmodified forward.
    draw_graph(fx_module, "compile.dot")
    return fx_module.forward

def capture(model, *inputs):
    # torch.compile is lazy: the backend runs on the first call, receiving
    # the Dynamo-traced forward graph; backward() then runs via autograd.
    compiled_model = torch.compile(model, backend=compiler)
    y = compiled_model(*inputs)
    y.sum().backward()
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        
        self.linear_1 = torch.nn.Linear(16, 32)
        self.linear_2 = torch.nn.Linear(32, 16)
    
    def forward(self, x):
        x = self.linear_1(x)
        x = torch.nn.functional.silu(x)
        x = self.linear_2(x)
        return x

if __name__ == '__main__':
    model = MLP()
    model.to("mps")
    x = torch.randn(4, 16, device="mps", dtype=torch.float32)

    capture(model, x)
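The same idea works without Graphviz installed: a sketch of a hypothetical `printing_backend` (my name, not from the post) that dumps the Python code torch.fx generated for the captured graph before handing execution back to eager mode:

```python
import torch

def printing_backend(fx_module: torch.fx.GraphModule, example_inputs):
    # Print the Python source torch.fx generated for the captured graph,
    # then run it unmodified by returning the module's forward.
    print(fx_module.code)
    return fx_module.forward

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
compiled = torch.compile(model, backend=printing_backend)
y = compiled(torch.randn(2, 8))
print(y.shape)  # torch.Size([2, 8])
```

Any callable taking `(GraphModule, example_inputs)` and returning a callable is a valid Dynamo backend, which is what makes experiments like this so cheap.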


--------------

Part of https://huggingface.co/posts/a-r-r-o-w/231008365980283
a-r-r-o-w posted an update 14 days ago
Recently, I've been focusing my learning on the following topics:
- PyTorch internals, specifically the inductor system (~1 month of experience)
- Triton internals (~8 months of experience)
- CUDA (~3 months of experience)
- Understanding fusion patterns in compilers and how to improve them (~1 month of experience)
- Parallelism strategies for large-scale inference optimization (~6-7 months of experience)

I thought it would be nice to document this somewhere. Maybe someone will find it useful? It's also because I want to get into the habit of writing but have had no motivation to do so. Maybe writing short, informal posts will help build the habit.

Since I don't have a personal site, and don't plan to create one in the near future, I think HF posts are best suited for short and informal documentation to share my little discoveries and learnings. If you're interested, strap in!

The first post in this series will be a basic study of PyTorch's float32 matmuls and their Triton implementation (nothing much, just the tutorial available on the website), plus a short dive into TF32 and a TFLOPS comparison on an A100 machine.
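For readers who want to poke at the same comparison themselves, the TF32 knobs in PyTorch's public API look like this (a minimal sketch; the A100 TFLOPS numbers belong to the upcoming post and are not reproduced here):

```python
import torch

# TF32 trades mantissa precision for throughput on Ampere+ GPUs.
# "highest" = true float32 matmuls; "high" allows TF32 for float32 matmuls.
torch.set_float32_matmul_precision("high")

# The older per-backend flags are equivalent knobs:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(512, 512)
b = torch.randn(512, 512)
c = a @ b  # runs in TF32 on CUDA when enabled; plain float32 on CPU
print(c.shape)  # torch.Size([512, 512])
```

Setting the flags is harmless on machines without a CUDA device; they only change which matmul kernels get picked on supported GPUs.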
melmass in Lightricks/LTXV-LoRAs about 2 months ago:

Upload 4 files (#10 opened about 2 months ago by Cseti)
Update README.md (#9 opened about 2 months ago by Cseti)
oumoumad in Lightricks/LTXV-LoRAs about 2 months ago:

fx-loras (#7 opened about 2 months ago by oumoumad)
linoyts posted an update about 2 months ago
FramePack is hands down one of the best open-source releases in video generation 🙇🏻‍♀️🤯
✅ fully open-sourced + amazing quality + reduced memory + improved speed
but even more, it's going to facilitate *soooo* many downstream applications
like this version adapted for landscape rotation 👇 https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape