Models

73,845

Active filters: reinforcement-learning

mit-oasys/rlm-qwen3-30b-a3b-v0.1

Text Generation • Updated 4 days ago • 28 • 7

zghhui/OmniNFT

Any-to-Any • Updated 13 days ago • 59 • 34

MBZUAI/MediX-R1-8B

Image-Text-to-Text • 9B • Updated Feb 27 • 370 • 8

Adilbai/stock-trading-rl-agent

Reinforcement Learning • Updated Jan 8 • 376 • 146

PhysicsWallahAI/Aryabhata-2.0

Text Generation • 21B • Updated 4 days ago • 265 • 2

WithinUsAI/GODs.Ghost.Codex.VII

Text Generation • 1B • Updated 3 days ago • 3

rstar2-reproduce/rStar2-Agent-14B

Text Generation • 15B • Updated Sep 1, 2025 • 121 • 28

Salesforce/xRouter

Text Generation • 8B • Updated Nov 4, 2025 • 50 • 16

nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B

Text Generation • 4B • Updated Feb 11 • 63 • 1

zai-org/GLM-TTS

Text-to-Speech • Updated Jan 12 • 373 • 338

nvidia/NitroGen

Reinforcement Learning • Updated Feb 5 • 537

nvidia/GEAR-SONIC

Reinforcement Learning • Updated Apr 11 • 44

MuXodious/HER-32B-absolute-heresy

Text Generation • 33B • Updated Feb 15 • 8 • 9

zosmaai/Qwen3.5-0.8B-GRPO-Math

Text Generation • 0.8B • Updated Mar 10 • 12 • 1

Camais03/camie-crafter

Reinforcement Learning • Updated Mar 29 • 10 • 6

GamageShahan/LunarLander-V2

Reinforcement Learning • Updated Apr 5 • 1

suuley/ppo-LunarLander-v2

Reinforcement Learning • Updated Apr 6 • 1 • 1

rezvan98/trading-agent-rl

Reinforcement Learning • Updated Apr 28 • 228 • 1

intcomp/sub-jepa

Reinforcement Learning • Updated 2 days ago • 3

BruceYuan/MBDPO

Reinforcement Learning • Updated 3 days ago • 1

youngzhong/SOD-GRPO_teacher-4B

Text Generation • 4B • Updated 10 days ago • 115 • 1

6kplus/PhyMotion-CausalForcing-1.3B

Text-to-Video • Updated 16 days ago • 4

Alopezcordero/LunarLander-v2

Reinforcement Learning • Updated 5 days ago • 184 • 1

Musci-research/Musci-ASR-2.4B

Automatic Speech Recognition • 2B • Updated 4 days ago • 86 • 3

Alopezcordero/ppo-huggy

Reinforcement Learning • Updated 13 days ago • 20 • 1

MeiGen-AI/GenEvolve

Image-Text-to-Text • 9B • Updated 10 days ago • 152 • 6

huggermax/TinyResearcher

Text Generation • 4B • Updated 8 days ago • 20 • 1

ganjii1387/alphabypass3

Reinforcement Learning • Updated 8 days ago • 1

Alopezcordero/q-Taxi-v3

Reinforcement Learning • Updated 5 days ago • 1

Alopezcordero/q-FrozenLake-v1

Reinforcement Learning • Updated 5 days ago • 1