david-thrower/HelixLM-tiny-10k-samples-s1-8942pt-s2-700it-20260428 Text Generation • 19.8M • Updated 4 days ago • 225
david-thrower/HelixLM-tiny-10k-samples-s1-8942pt-s2-700it-20260427 Text Generation • 19.8M • Updated 5 days ago • 13
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12, 2025 • 47
david-thrower/codelion-finemix-pdf-dclm-edu-1024-seq-len-15897-samples Viewer • Updated Jan 19 • 15.9k • 33