Case study

I Rebuilt Qwen3 From Scratch and Pretrained It on a University Supercomputer

I reconstructed Qwen3-0.6B's architecture component by component (751M parameters), built a curated 13B-token data pipeline, and pretrained the model on UNC's Longleaf A100 cluster.

Parameters: 751M
Tokens: 13B
Layers: 28
Hardware: A100s

Act 1 — The Hook

Placeholder spine. The five acts go here. The first time a glossed term appears, wrap it in <Term id="...">term</Term>. For example: models learn by predicting the next token.

Act 2 — Architecture

Component-by-component reconstruction goes here.
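
As a sanity check on the 751M figure, here is a minimal parameter-count sketch. Only the 28 layers and the 751M total come from this write-up. Every other dimension in `AssumedConfig` is taken from the publicly released Qwen3-0.6B configuration and is an assumption about this rebuild, as are the names `AssumedConfig` and `count_params`. Under those assumptions, tying the output head to the token embedding gives roughly 596M parameters, while an untied head gives roughly 751.6M. That would be consistent with the headline number, but whether this rebuild actually unties the embeddings is a guess, not something stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumedConfig:
    # Assumed values from the public Qwen3-0.6B config; only num_layers=28
    # is confirmed by this case study.
    vocab_size: int = 151_936
    hidden_size: int = 1024
    num_layers: int = 28
    num_heads: int = 16          # query heads
    num_kv_heads: int = 8        # grouped-query attention: fewer K/V heads
    head_dim: int = 128
    intermediate_size: int = 3072


def count_params(cfg: AssumedConfig, tie_embeddings: bool) -> int:
    """Count weights for a bias-free decoder-only transformer with
    grouped-query attention, per-head Q/K RMSNorm, and a gated (SwiGLU-style) MLP."""
    embed = cfg.vocab_size * cfg.hidden_size

    q_dim = cfg.num_heads * cfg.head_dim
    kv_dim = cfg.num_kv_heads * cfg.head_dim
    attn = (
        cfg.hidden_size * q_dim        # q_proj
        + 2 * cfg.hidden_size * kv_dim # k_proj and v_proj
        + q_dim * cfg.hidden_size      # o_proj
    )
    qk_norm = 2 * cfg.head_dim         # RMSNorm scales on Q and K
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size  # gate, up, down
    block_norms = 2 * cfg.hidden_size  # pre-attention and pre-MLP RMSNorm

    per_layer = attn + qk_norm + mlp + block_norms
    total = embed + cfg.num_layers * per_layer + cfg.hidden_size  # + final norm

    if not tie_embeddings:
        total += embed                 # separate output projection (LM head)
    return total


cfg = AssumedConfig()
print(f"tied:   {count_params(cfg, tie_embeddings=True):,}")
print(f"untied: {count_params(cfg, tie_embeddings=False):,}")
```

The gap between the two totals is exactly one vocabulary-by-hidden matrix, so at this scale the choice to tie or untie the output head accounts for about a fifth of the model.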

Act 3 — Data

13B-token curated pipeline goes here.

Act 4 — Training

Longleaf A100 run, loss curves, ablations go here.

Act 5 — What I’d Do Differently

Self-critique goes here.