The LLM by @karpathy is officially in the library, and we wrote a blog post covering how we ported the model, how it differs from the original, and how to run or train it.
Building high-performance, reproducible kernels for AMD ROCm just got a lot easier.
I've put together a guide on building, testing, and sharing ROCm-compatible kernels using the Hugging Face kernel-builder and kernels libraries, so you can focus on optimizing performance rather than on setup.
Learn how to:
- Use Nix for reproducible builds
- Integrate kernels as native PyTorch operators
- Share your kernels on the Hub for anyone to use with kernels.get_kernel()
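To give a rough idea of what the last step looks like from the user's side, here is a minimal sketch of loading a shared kernel with `get_kernel` from the `kernels` library. The repository id and the `load_hub_kernel` helper are hypothetical placeholders for illustration, not names from the guide, and the functions a loaded kernel exposes depend on the kernel repository itself.

```python
from typing import Any, Callable, Optional

# Hypothetical placeholder: replace with the id of a real kernel repository on the Hub.
EXAMPLE_REPO_ID = "your-username/your-rocm-kernel"


def load_hub_kernel(repo_id: str, loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Load a kernel from the Hub by repository id.

    `loader` defaults to `kernels.get_kernel`; it is a parameter only so the
    helper can be exercised without network access or a GPU.
    """
    if loader is None:
        # Imported lazily so this module can be imported without `kernels` installed.
        from kernels import get_kernel

        loader = get_kernel
    return loader(repo_id)
```

Calling `load_hub_kernel(EXAMPLE_REPO_ID)` with a real repository id should return the loaded kernel, assuming `kernels` is installed and the repository provides a build that matches your PyTorch and ROCm setup.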
We use the 🏆 award-winning RadeonFlow GEMM kernel as a practical example.