Alexey Gorbatovski

Myashka

AI & ML interests

NLP Alignment

Recent Activity

commented on a paper 17 days ago

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

new activity 28 days ago

agentica-org/DeepScaleR-Preview-Dataset:There are no answers for 6 samples

updated a model 2 months ago

Myashka/Qwen2.5-7B-UltraChat200K_EMA_SFT-Lr_3e_6-Alpha_0.01

Organizations

None yet

Commented on a paper 17 days ago

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Paper • 2510.18927 • Published 19 days ago • 82

New activity in agentica-org/DeepScaleR-Preview-Dataset 28 days ago

There are no answers for 6 samples

#4 opened 28 days ago by

New activity in Myashka/CryptoNews_50_50 over 1 year ago

Librarian Bot: Add language metadata for dataset

#2 opened over 1 year ago by