Case study
I Rebuilt Qwen3 From Scratch and Pretrained It on a University Supercomputer
I reconstructed Qwen3-0.6B's architecture component by component (751M parameters), built a curated 13B-token data pipeline, and pretrained the model on UNC's Longleaf A100 cluster.
- Parameters: 751M
- Tokens: 13B
- Layers: 28
- Hardware: A100s
Act 1 — The Hook
Placeholder spine. The five acts go here. The first time a glossed term appears,
wrap it in <Term id="...">term</Term>. For example: models learn by
predicting the next token.
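To make "predicting the next token" concrete, here is a minimal sketch of the objective in plain Python. The toy token ids and the helper names `next_token_pairs` and `cross_entropy` are illustrative assumptions, not code from this project.

```python
import math

def next_token_pairs(token_ids):
    """Shift a sequence by one: position t is trained to predict token t+1."""
    return token_ids[:-1], token_ids[1:]

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

# Hypothetical toy sequence over a 4-token vocabulary.
tokens = [2, 0, 3, 1]
inputs, targets = next_token_pairs(tokens)

# A model that knows nothing emits equal logits, so its loss is log(vocab_size).
uniform_logits = [0.0, 0.0, 0.0, 0.0]
loss = sum(cross_entropy(uniform_logits, t) for t in targets) / len(targets)
print(inputs, targets, round(loss, 4))
```

Pretraining pushes this average loss down from the uniform baseline by raising the probability the model assigns to each observed next token.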
Act 2 — Architecture
Component-by-component reconstruction goes here.
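As a rough sanity check on the 751M figure, the parameter count can be tallied one component at a time. The sketch below assumes the dimensions of the publicly released Qwen3-0.6B configuration (hidden size 1024, 16 query heads, 8 key/value heads, head dimension 128, MLP width 3072, vocabulary 151,936, no bias terms) and assumes untied input and output embeddings. Only the layer count of 28 comes from this write-up, and `Config` and `count_params` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Values assumed from the public Qwen3-0.6B config; not stated in this write-up.
    vocab_size: int = 151_936
    hidden: int = 1024
    layers: int = 28              # the one figure taken from this write-up
    q_heads: int = 16
    kv_heads: int = 8             # grouped-query attention
    head_dim: int = 128
    mlp_hidden: int = 3072
    tie_embeddings: bool = False  # assumption: separate input and output embeddings

def count_params(c: Config) -> int:
    """Count weights per component, assuming no bias terms anywhere."""
    attn = (c.hidden * c.q_heads * c.head_dim          # q_proj
            + 2 * c.hidden * c.kv_heads * c.head_dim   # k_proj and v_proj
            + c.q_heads * c.head_dim * c.hidden)       # o_proj
    qk_norm = 2 * c.head_dim                           # RMSNorm scales on q and k
    mlp = 3 * c.hidden * c.mlp_hidden                  # gate, up, and down projections
    norms = 2 * c.hidden                               # pre-attention and pre-MLP RMSNorm
    per_layer = attn + qk_norm + mlp + norms
    embed = c.vocab_size * c.hidden
    lm_head = 0 if c.tie_embeddings else embed
    final_norm = c.hidden
    return c.layers * per_layer + final_norm + embed + lm_head

untied = count_params(Config())
tied = count_params(Config(tie_embeddings=True))
print(f"untied: {untied:,}  tied: {tied:,}")
```

Under these assumptions the tally is 751,632,384 with untied embeddings and 596,049,920 with tied ones, which would make the 751M headline consistent with a separate output head.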
Act 3 — Data
13B-token curated pipeline goes here.
Act 4 — Training
Longleaf A100 run, loss curves, ablations go here.
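For a sense of scale before the real numbers land here, the common 6 x parameters x tokens rule of thumb gives a back-of-the-envelope compute budget. In the sketch below the parameter and token counts come from the headline figures, the A100 peak is NVIDIA's published dense BF16 number, and the 40% utilization is purely a hypothetical placeholder, not a measurement from this run.

```python
PARAMS = 751e6             # from the headline figures
TOKENS = 13e9              # from the headline figures
A100_PEAK_FLOPS = 312e12   # dense BF16 tensor-core peak for one A100
ASSUMED_MFU = 0.40         # hypothetical utilization, not measured on Longleaf

def training_flops(params: float, tokens: float) -> float:
    """Approximate forward plus backward cost: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens

def gpu_hours(flops: float, peak_flops: float, mfu: float) -> float:
    """Single-GPU hours needed at a given fraction of peak throughput."""
    return flops / (peak_flops * mfu) / 3600.0

flops = training_flops(PARAMS, TOKENS)
hours = gpu_hours(flops, A100_PEAK_FLOPS, ASSUMED_MFU)
tokens_per_param = TOKENS / PARAMS

print(f"{flops:.2e} FLOPs, ~{hours:.0f} A100-hours at {ASSUMED_MFU:.0%} MFU")
print(f"{tokens_per_param:.1f} tokens per parameter")
```

The estimate ignores attention FLOPs and counts embedding parameters, so it should be read as an order-of-magnitude guide rather than a prediction of wall-clock time.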
Act 5 — What I’d Do Differently
Self-critique goes here.