Building on HF

Joseph [open/acc] Pollack PRO

Tonic

https://discord.gg/qdfnvSPcqP

AI & ML interests

🤖Making robots to help people learn things quicker 👩🏻‍🚀🚀

Recent Activity

liked a Space 27 minutes ago

victor/lance-zerogpu-demo

reacted to salma-remyx's post with 👍 28 minutes ago

liked a Space about 13 hours ago

pragnakalp/OCR-image-to-text

liked a Space 27 minutes ago

Lance Multitask ZeroGPU

🚀

Multitask ZeroGPU demo for ByteDance Lance

reacted to salma-remyx's post with 👍 28 minutes ago

Post

Just trained a 2B coding model to rank candidate AI/ML research ideas against the implicit preferences in a code repository's merge history.

The training data comes from a Gaussian Process fit on the accumulated dispositions in VQASynth, where each PR against a deployed project yields a pairwise comparison between the preferred feature branch and the baseline at main.

The GP scores candidate papers to synthesize preference pairs, and DPO with LoRA bakes the ranking pipeline into the model's weights.
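To make that last step concrete, here is a minimal Python sketch of (a) turning per-candidate scores into preference pairs and (b) the standard DPO loss for one pair. It assumes the GP has already produced one scalar score per candidate paper. `make_preference_pairs`, the `margin` cutoff, and the log-probability inputs are hypothetical illustrations, not the remyxai pipeline, and the LoRA part (which only restricts which weights get updated) is not shown.

```python
import math
from itertools import combinations


def make_preference_pairs(scores, margin=0.1):
    """Turn {candidate: score} into (chosen, rejected) pairs.

    Hypothetical rule: keep only pairs whose score gap exceeds `margin`,
    so near-ties do not become noisy training signal.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        gap = scores[a] - scores[b]
        if abs(gap) > margin:
            pairs.append((a, b) if gap > 0 else (b, a))
    return pairs


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    Implicit reward of a response = beta * (log pi(y|x) - log pi_ref(y|x)).
    Loss = -log sigmoid(reward_chosen - reward_rejected).
    """
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    m = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


if __name__ == "__main__":
    scores = {"paper_a": 0.9, "paper_b": 0.2, "paper_c": 0.85}
    print(make_preference_pairs(scores))
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Here `paper_a` and `paper_c` are nearly tied, so that pair is skipped; the loss drops below log 2 once the policy favors the chosen response more than the reference model does.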

After 1 epoch the model reaches 87.4% reward accuracy on the held-out eval split against 92.3% on training, consistent with learning the task without overfitting.
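Reward accuracy in DPO is usually the share of preference pairs where the implicit reward of the chosen response beats that of the rejected one, with implicit reward beta * (log pi(y|x) - log pi_ref(y|x)). This matches what TRL's `DPOTrainer` logs as `rewards/accuracies`, but the post does not say which trainer was used, so treat this as an assumed definition. A minimal sketch:

```python
def implicit_reward(policy_logp, ref_logp, beta=0.1):
    """DPO implicit reward: how much more likely the policy makes y than the reference."""
    return beta * (policy_logp - ref_logp)


def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response gets the higher implicit reward."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("need one chosen and one rejected reward per pair")
    if not chosen_rewards:
        raise ValueError("no pairs to evaluate")
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)


if __name__ == "__main__":
    print(reward_accuracy([0.5, 0.2, -0.1, 0.3], [0.1, 0.4, -0.3, 0.0]))
```

With these numbers, 3 of 4 pairs rank correctly, giving 0.75. On the post's figures, the train/eval gap is 92.3 - 87.4 = 4.9 percentage points.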

Now, I'm scaling the pipeline to thousands of repos for a generalization test.

Dataset: remyxai/mhpd-dpo-v0
Model: remyxai/mhpd-dpo-qwen3.5-2b-vqasynth
Substack: https://remyxai.substack.com/p/the-ai-pm

liked a Space about 13 hours ago

OCR Image To Text

📸

Extract text from images using OCR technology

liked a Space about 22 hours ago

Carbon

🧬

Generate DNA continuations and predict variant effects with Carbon

replied to their post 4 days ago

Hey dark, actually, if you're keen, it would be nice to get this into mobile apps, but I do need help with the UX.

liked a dataset 4 days ago

fpvlabs/stera-10m

Updated about 3 hours ago • 14.5k • 18

posted an update 7 days ago

Post

🙋🏻‍♂️ Hey there folks,

Turns out: if we predict 🌏 Earth, we can spend more time looking for interesting things and less time looking at things we expect to see.

Sentinel-2 imagery 🛰️ takes a long time to downlink to Earth, so our "near real time" systems are far from real time in practice.

Meanwhile, if we "predict" what we will see based on what we do see, we can send down much less data in a timely way and prioritize 📡 Earth-bound response.
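A toy sketch of the downlink-saving idea: predict the new frame from the previous pass and flag only the tiles whose prediction error is large for transmission. The persistence predictor (new frame = previous frame), the 4-pixel tiles, the 0.1 threshold, and the function name `tiles_to_downlink` are illustrative assumptions; the NuTonic/lspace model is not used here.

```python
import numpy as np


def tiles_to_downlink(previous, current, tile=4, threshold=0.1):
    """Return (row, col) tile indices whose mean prediction error exceeds threshold.

    Predictor (assumption): persistence, i.e. we expect `current` to equal
    `previous`. Only tiles that surprise us would be sent to the ground.
    """
    if previous.shape != current.shape:
        raise ValueError("frames must have the same shape")
    h, w = current.shape
    predicted = previous.astype(float)  # a learned model would replace this line
    error = np.abs(current.astype(float) - predicted)
    selected = []
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            if error[r:r + tile, c:c + tile].mean() > threshold:
                selected.append((r // tile, c // tile))
    return selected


if __name__ == "__main__":
    prev = np.zeros((8, 8))
    cur = prev.copy()
    cur[0:4, 4:8] = 1.0  # something new appears in the top-right tile
    print(tiles_to_downlink(prev, cur))
```

In the example only one of four tiles changed, so only a quarter of the frame would need to go down; a better predictor means fewer surprising tiles and less data to send.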

I'm talking about illegal fishing, logging, mining, or building in nature reserves: the more of that we predict early, the more of it we can stop in time.

At least that's the concept!

Check out the blog: https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth

- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval

reacted to danielhanchen's post with ❤️ 7 days ago

Post

We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth

reacted to unmodeled-tyler's post with 👀 7 days ago

Post

The UFO/UAP Dataset is complete!

unmodeled-tyler/DoW-UFO-UAP-1

The most recent release from the Department of War is up in full and ready for analysis!

The dataset ships with a Hermes Agent Skill so you can start parsing the data right away.

Go chase some anomalies! 🚀

reacted to cesear64's post with 👀 7 days ago

Post

Just published: how we built production Sango (Central African Republic) translation without fine-tuning, a parallel corpus, or training compute.

The method, vocabulary-augmented prompting with a 581-entry native-speaker-verified lexicon, generalizes to any of the ~2,000 African languages at the same data-poverty level. Recipe, dataset, and code template are all included.
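A minimal sketch of what vocabulary-augmented prompting can look like: find lexicon entries whose source word appears in the input and put them in the translation prompt as glossary hints. The lexicon format (lowercase source word to target word), the naive whole-word matching, the prompt wording, and `build_prompt` itself are assumptions; the actual MEYNG recipe and the MEYNG/sango-vocabulary schema may differ, and the target values below are placeholders, not real Sango words.

```python
import re


def build_prompt(text, lexicon, max_entries=20):
    """Build a translation prompt with glossary hints drawn from `lexicon`.

    `lexicon` maps lowercase source words to verified target-language words.
    Matching is naive whole-word lookup (an assumption; a real system might
    also handle inflections or multi-word expressions).
    """
    words = re.findall(r"[\w']+", text.lower())
    hits = []
    for word in dict.fromkeys(words):  # de-duplicate, keep first-seen order
        if word in lexicon:
            hits.append(f"- {word} -> {lexicon[word]}")
        if len(hits) == max_entries:
            break
    glossary = "\n".join(hits) if hits else "- (no matching entries)"
    return (
        "Translate the following text into Sango.\n"
        "Use these verified vocabulary entries where they apply:\n"
        f"{glossary}\n\n"
        f"Text: {text}\n"
        "Sango:"
    )


if __name__ == "__main__":
    # Placeholder targets stand in for real, native-speaker-verified entries.
    lexicon = {"water": "WATER_SG", "market": "MARKET_SG"}
    print(build_prompt("Where is the market? I need water.", lexicon))
```

The resulting string would then go to whichever LLM does the translation; the lexicon supplies the words the model is least likely to know, without any training.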

📄 Blog: https://huggingface.co/blog/MEYNG/sangoai
📦 Dataset: MEYNG/sango-vocabulary

Would especially value feedback from anyone working on other low-resource African languages — Ewondo, Lingala, Wolof next on our roadmap.