
AlphaSue

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted an article 5 days ago

Article

Open-source DeepResearch – Freeing our search agents

and 4 others • Feb 4 • 1.28k

New activity in tokyotech-llm/swallow-math 7 days ago

Why the data only has answers without questions?

#1 opened 7 days ago by AlphaSue

upvoted a collection about 2 months ago

Whisper

Collection

OpenAI Whisper speech recognition models in MLX format • 48 items • Updated Oct 1, 2024 • 51

upvoted an article 3 months ago

Article

Vision Language Models (Better, Faster, Stronger)

and 4 others • May 12 • 495

upvoted a collection 4 months ago

ProX Refining Models

Collection

Adapted small language models used to generate data refining programs • 5 items • Updated Oct 10, 2024 • 4

New activity in gair-prox/web-chunk-refining-lm 4 months ago

what is the chat template?

#1 opened 4 months ago by AlphaSue

upvoted 3 papers 4 months ago

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients

Paper • 2504.10766 • Published Apr 14 • 40

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

liked 3 models 4 months ago

upvoted an article 4 months ago

Article

Open R1: Update #3

and 9 others • Mar 11 • 295

upvoted a paper 4 months ago

Modifying Large Language Model Post-Training for Diverse Creative Writing

Paper • 2503.17126 • Published Mar 21 • 37

upvoted a paper 5 months ago

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Paper • 2502.10341 • Published Feb 14 • 3

liked a Space 5 months ago

📖 TxT360: Trillion Extracted Text • 116

Create a large-scale deduplicated text dataset for LLM training

liked a model 6 months ago

jinaai/ReaderLM-v2

Text Generation • 2B • Updated Mar 4 • 13.1k • 680

liked a Space 6 months ago

🌌 The Ultra-Scale Playbook • 2.96k

The ultimate guide to training LLM on large GPU Clusters

upvoted an article 6 months ago

Article

Mixture of Experts Explained

and 5 others • Dec 11, 2023 • 800

upvoted a collection 7 months ago

Papers I've read

Collection

16 items • Updated Jan 12 • 6
