All HF Hub posts

SeaWolf-AI
posted an update about 23 hours ago
Chitos: The Security Scanner That Actually Proves It

Most security scanners hand you a suspect list and walk away. That gap between detection and proof is where attackers live, and it's exactly the gap that Chitos was built to close.

Chitos is the successor to Mythos, a static analyzer built for quick code health checks. Mythos was good at pattern matching: spotting dangerous sinks, mapping CWEs, producing readable reports. But static analysis has a structural ceiling. A rule that sees eval(user_input) can tell you that looks dangerous. It cannot tell you whether the input is reachable, whether sanitization three layers up covers this path, or whether there's a live exploit chain for your exact framework version. Chitos was built to answer those questions.

🔍 Phase 1 applies 50 language-agnostic rules across Python, JavaScript, Go, Java, C/C++, Rust, PHP, YAML and more, covering injection sinks, deserialization gadgets, credential leakage, broken crypto, and prototype pollution. Every candidate is re-verified before reaching the report. Findings that can't be substantiated are excluded, not handed to you as noise.
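
To make the pattern-matching idea concrete, here is a minimal sketch of what one sink rule could look like, using Python's standard `ast` module. This is not Chitos or Mythos code: the function name `find_eval_sinks` and the rule logic are assumptions for illustration only.

```python
import ast


def find_eval_sinks(source: str) -> list[tuple[int, str]]:
    """Flag eval()/exec() calls whose first argument is not a literal constant."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
            and node.args
            and not isinstance(node.args[0], ast.Constant)
        ):
            findings.append((node.lineno, node.func.id))
    # Sort so the result does not depend on traversal order.
    return sorted(findings)


sample = "x = eval(user_input)\ny = eval('1 + 1')\n"
print(find_eval_sinks(sample))  # [(1, 'eval')]
```

A rule like this only sees the call site. It says nothing about whether `user_input` is reachable or already sanitized, which is the ceiling the post describes.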

🔬 Phase 2 dispatches an autonomous web-search agent to hunt live CVE databases, exploit advisories, and public PoC repositories. It formulates hypotheses, verifies them, and synthesizes a structured threat narrative. This phase needs a user-supplied Claude API key; Phases 1 and 3 run entirely free.

🎯 Phase 3 is where Chitos diverges from everything else. Against targets you own or are authorized to test, it fires real payloads (XSS, SQLi, path traversal, command injection), mutates on block, captures hard evidence, and connects every proven finding into a kill-chain showing which vulnerabilities to remediate first.

No installation. No account. No code sent to third-party APIs.

Article: https://huggingface.co/blog/FINAL-Bench/chitos

Try it now 👉 https://chitos.vidraft.net
ginigen-ai
posted an update about 20 hours ago
🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action), and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.
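
As a rough sketch of how such a score could be aggregated: average episode outcomes per task, then across tasks, then across seeds. The aggregation order, the `success_rate` helper, and the toy data (two tasks instead of 24) are assumptions for illustration, not the benchmark's official evaluation code.

```python
def success_rate(results: dict[int, dict[str, list[bool]]]) -> float:
    """results[seed][task] holds one True/False outcome per evaluation episode."""
    per_seed = []
    for tasks in results.values():
        # Fraction of successful episodes for each task under this seed.
        per_task = [sum(episodes) / len(episodes) for episodes in tasks.values()]
        per_seed.append(sum(per_task) / len(per_task))
    # Average over seeds so one lucky or unlucky seed does not dominate.
    return sum(per_seed) / len(per_seed)


toy_results = {
    0: {"open_drawer": [True, True, False, False], "press_button": [True, True, True, True]},
    1: {"open_drawer": [True, False, False, False], "press_button": [True, True, False, False]},
}
sr = success_rate(toy_results)
print(f"SR = {sr:.4f}")  # SR = 0.5625
```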

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched): head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols: self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop: a different, humanoid-based variant suite, separated to avoid confusion.
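
The three-table split can be sketched as "group by protocol first, rank only inside each group". The `Entry` fields, the protocol labels, and every number below are placeholders invented for illustration; none of them are published scores or the leaderboard's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    protocol: str        # hypothetical protocol label
    success_rate: float  # placeholder value, not a published score
    source: str          # where the number was reported


def build_tables(entries):
    """Group entries by protocol, then rank within each group only."""
    tables = defaultdict(list)
    for entry in entries:
        tables[entry.protocol].append(entry)
    return {
        protocol: sorted(rows, key=lambda e: e.success_rate, reverse=True)
        for protocol, rows in tables.items()
    }


entries = [
    Entry("model-a", "kitchen-24-matched", 0.50, "paper A"),
    Entry("model-b", "kitchen-24-matched", 0.60, "paper B"),
    Entry("model-c", "other-protocol", 0.70, "paper C"),
]
tables = build_tables(entries)
for protocol, rows in tables.items():
    print(protocol, [row.model for row in rows])
```

Note that `model-c` never appears above `model-b` even though its placeholder number is higher, because the two were measured under different protocols.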

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard
AxionLab-official
posted an update 1 day ago
⚠️ Community Notice

We would like to clarify that SupraLabs has no affiliation, partnership, or connection whatsoever with "SupraLarps" or its members.

Please avoid interacting with their organization, repositories, or Spaces under the assumption that they are associated with us.

We are currently aware of the situation and have already contacted the appropriate channels to address it.

Thank you to everyone who continues to support SupraLabs. ❤️
Banaxi-Tech
posted an update 2 days ago
Hello AI Community! 👋
We have a new AI model in training.
We are training it on 27B tokens and are currently 8% done.
Follow us to be notified when it releases 🚀
Some info:
Parameters: 75M
GPU: RTX Pro 6000
We expect to release it in the coming days.

EDIT: We are now at step 210K
pankajpandey-dev
posted an update 3 days ago
🇮🇳 New in my Hindi LLM Series: Gemma-4 E4B, fine-tuned for Hindi, and it runs on your laptop's CPU.
I fine-tuned Google's new Gemma-4 E4B on ~10k Hindi instruction pairs (AI4Bharat: anudesh + dolly) using Unsloth + LoRA, on a single L4 GPU.
Then I ran an honest side-by-side eval: base Gemma-4 vs my fine-tune, across 25 Hindi prompts. The results were interesting 👇
✅ My fine-tune is more concise: ask for "3 tips" and it gives exactly 3. Base writes a 1,200-character essay.

✅ Pure native Hindi: base keeps slipping into English ("संतुलित आहार (Eat a Balanced Diet)", "तारा (Star)"). My fine-tune stays in clean Hindi.

✅ Tighter instruction-following: ask for a "short message" and it gives one, not a menu of options.
⚖️ And to be honest: base Gemma-4 is more detailed and comprehensive. I didn't build a "smarter" model. I built a focused, Hindi-native, edge-friendly one that runs as a 5GB GGUF (Q4) on CPU.
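
One way to put a number on the "slipping into English" observation is to measure what share of the letters in a response are Latin rather than Devanagari. This is a sketch of such a check, not the eval used in the post; the `latin_ratio` helper is a hypothetical name, and counting Devanagari code points (including vowel signs) is a simplifying assumption.

```python
def latin_ratio(text: str) -> float:
    """Share of script characters that are ASCII Latin letters (0.0 = pure Devanagari)."""
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    # Devanagari Unicode block: U+0900 to U+097F.
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097f")
    total = latin + devanagari
    return latin / total if total else 0.0


mixed = "संतुलित आहार (Eat a Balanced Diet)"
native = "संतुलित आहार"
print(f"{latin_ratio(mixed):.2f}")   # 0.59
print(f"{latin_ratio(native):.2f}")  # 0.00
```

Averaging this ratio over the 25 prompts for each model would give a single comparable figure per model.
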
🔗 Try it:

Live demo (CPU): pankajpandey-dev/gemma-4-e4b-hindi-demo
GGUF (Ollama/llama.cpp): pankajpandey-dev/gemma-4-e4b-hindi-instruct-GGUF
16-bit model: pankajpandey-dev/gemma-4-e4b-hindi-instruct

Built with @unsloth · Data by @ai4bharat 🙏
#Hindi #LLM #Gemma #Unsloth #IndicNLP #GGUF
projectlosangeles
posted an update 2 days ago
🔥 If you love multi-modal art 🎨, please check out the "A Million Little Fibers" project!!! 🔥

https://github.com/asigalov61/A-Million-Little-Fibers-2026

https://soundcloud.com/aleksandr-sigalov-61/sets/a-million-little-fibers

This brand new 2026 edition covers three SOTA models:

zai-org/GLM-5.2
k2-fsa/OmniVoice
HeartMuLa/HeartMuLa-oss-3B-happy-new-year

The project aims to showcase what kind of multi-modal art is now possible to create with these amazing OSS resources!

If you enjoyed the project, please ⭐ or 🔱 the GitHub repo and ❤️ it on SoundCloud and Hugging Face. It really helps!

Most sincerely,

Alex

Project Los Angeles
Tegridy Code 2026

P.S. Don't forget to bring a towel! 😂

@multimodalart
@victor
@John6666
fffiloni
posted an update 4 days ago
A few weeks ago, @victor opened the door: coding agents can now ship Hugging Face Spaces autonomously.

I pulled on that thread.

As someone who builds and ships Gradio demos regularly, I didn't just want to reproduce the loop. I wanted to see what happens when that loop is plugged into the whole Hugging Face stack.

The interesting part is not only that an agent can ship a Space.

It's what happens when Space generation becomes a first-class Hugging Face workflow.

That became Agentic Space Factory.

More soon. 🤗
Bc-AI
posted an update 4 days ago
# 🔥 Nova-1 Beta: Test Our New LLMs!

**Smilyai Labs** is building **Nova-1**: open-source LLMs with novel architectures. Join our beta program!

## 🎯 Available Now:

**Nova-1-Standard (1.2B)**: Phase 2 of pretraining in progress
- PPL 13.5 (beats GPT-2 Large!)
- 48K tok/s on consumer GPUs
- Great for code, reasoning, edge deployment

**Nova-1-Large (3.5B)**: Training live right now
- Current: 30.9 PPL, improving fast, loss at 3.5 right now
- Will finish with ~1.7B tokens today
- Better reasoning & longer context

**Nova-1-XL (10B MoE)**: Coming soon (we don't know when yet, haha)
- Final specs not decided yet
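
For readers comparing the loss and PPL figures above: perplexity is the exponential of the mean cross-entropy loss. The sketch below assumes the loss is a per-token average in nats (natural log), which the post does not state, so treat the comparison as approximate.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity from mean per-token cross-entropy measured in nats."""
    return math.exp(loss_nats)


def loss_from_perplexity(ppl: float) -> float:
    """Inverse mapping: the loss that corresponds to a given perplexity."""
    return math.log(ppl)


print(round(perplexity(3.5), 1))             # 33.1
print(round(loss_from_perplexity(30.9), 2))  # 3.43
print(round(loss_from_perplexity(13.5), 2))  # 2.6
```

Under that assumption, a loss of 3.5 and a PPL of 30.9 are close but not identical readings, which is expected if they were logged at slightly different steps or on different data.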


## What Makes Nova Special?

✨ **Mixture of Depths (MoD)**: Routes tokens dynamically, 30% faster
✨ **Grouped Query Attention**: Efficient like LLaMA 2/3
✨ **Phased Training**: Fresh 1B tokens each phase (no overfitting!)
✨ **RoPE**: Context extendable to 8K+
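
The core idea behind Mixture of Depths routing can be sketched in a few lines: a router scores each token, only the top-scoring tokens up to a fixed capacity go through the expensive block, and the rest pass through unchanged. This is a simplified illustration, not Nova-1's implementation; the function name, the scalar "tokens", and the stand-in `block_fn` are all assumptions.

```python
def mixture_of_depths_block(tokens, router_scores, capacity, block_fn):
    """Apply block_fn to the `capacity` highest-scoring tokens; others skip the block."""
    if len(tokens) != len(router_scores):
        raise ValueError("need one router score per token")
    k = max(0, min(capacity, len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    chosen = set(ranked[:k])
    # Unchosen tokens are returned as-is, which mimics the residual skip path.
    return [block_fn(tok) if i in chosen else tok for i, tok in enumerate(tokens)]


# Scalars stand in for hidden-state vectors; block_fn stands in for attention + MLP.
hidden = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]
out = mixture_of_depths_block(hidden, scores, capacity=2, block_fn=lambda x: x * 10)
print(out)  # [1.0, 20.0, 3.0, 40.0]
```

Any speedup comes from running the block on fewer tokens, so it depends on the capacity chosen per layer.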

## 🤝 Join Beta Testing:

👉 **Smilyai-labs-beta-testers**


Get early access, shape the roadmap, and help build transparent open-source AI!

stas
posted an update about 16 hours ago
After many months of intense work, the Snowflake AI Research team is happy to present its new open source project: Arctic RL

https://snowflake.com/en/blog/engineering/arctic-rl-open-source-backend/

- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved higher accuracy scores (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%
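
A quick arithmetic check on the figures above, as a sketch. It only reuses the numbers quoted in the post; since both durations are given as approximate ("~5 days", "~36 hours"), the derived ratio is approximate too.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Wall-clock speedup factor between two training durations."""
    return before_hours / after_hours


wall_clock = speedup(5 * 24, 36)  # ~5 days vs ~36 hours
bird_gain = 70.35 - 59.92         # BIRD dev accuracy, percentage points
qa_gain = 72.3 - 69.6             # multi-hop QA average, percentage points

print(f"wall-clock speedup: {wall_clock:.2f}x")  # 3.33x
print(f"BIRD dev gain: {bird_gain:.2f} points")  # 10.43 points
print(f"multi-hop QA gain: {qa_gain:.1f} points")  # 2.7 points
```

The rounded durations give about 3.3x, in the same range as the quoted 3.5x end-to-end figure.
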
codelion
posted an update 2 days ago
SPROG-9M: a 9.37M-parameter model trained from scratch to solve GSM8K-style math without using an LLM at inference.

The model, codelion/sprog-9m, predicts symbolic programs over number slots, then a deterministic executor does the arithmetic. With a simple verifier, it reaches ~11.8% on GSM8K test.
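
To illustrate the "predict a program, then execute it deterministically" split, here is a minimal executor sketch. The program format (tuples of an operator and two references, with `n<i>` for number slots and `r<i>` for earlier step results) is a hypothetical one made up for this example, not SPROG-9M's actual representation.

```python
import operator

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def execute(program, numbers):
    """Run each (op, a, b) step in order and return the last result."""
    results = []

    def resolve(ref):
        kind, index = ref[0], int(ref[1:])
        if kind == "n":
            return numbers[index]   # number slot extracted from the problem text
        if kind == "r":
            return results[index]   # result of an earlier step
        raise ValueError(f"unknown reference: {ref}")

    for op, a, b in program:
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]


# "3 boxes of 12 pencils, 5 given away": slots are [3, 12, 5].
program = [("mul", "n0", "n1"), ("sub", "r0", "n2")]
print(execute(program, [3, 12, 5]))  # 31
```

In a design like this the model never has to do arithmetic itself; it only has to pick the right operations and slots.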

We also released the dataset: codelion/gsm8k-synth, 117K validated synthetic GSM8K-style problems.

Tiny model, no pretraining, no LLM at inference, runs on a laptop.