How to use Fu01978/TinyLM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Fu01978/TinyLM")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Fu01978/TinyLM", dtype="auto")

How to use Fu01978/TinyLM with vLLM:
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Fu01978/TinyLM"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Fu01978/TinyLM",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
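Because the server speaks the OpenAI-compatible API, it can also be called from Python. Below is a minimal stdlib-only sketch, assuming a `vllm serve` instance is already running on port 8000 (the same payload works against the SGLang server below on port 30000):

```python
import json
from urllib import request

API_URL = "http://localhost:8000/v1/completions"  # default vllm serve port


def build_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Request body for the OpenAI-compatible completions endpoint."""
    return {
        "model": "Fu01978/TinyLM",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str) -> str:
    """POST the prompt to the running server and return the completion text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(API_URL, data=data, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# With a server running: print(complete("Once upon a time,"))
```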
How to use Fu01978/TinyLM with SGLang:

# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Fu01978/TinyLM" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Fu01978/TinyLM",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

# Alternatively, run the SGLang server in Docker:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Fu01978/TinyLM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API), exactly as shown above.

How to use Fu01978/TinyLM with Docker Model Runner:
docker model run hf.co/Fu01978/TinyLM
A 3.4M-parameter causal language model trained from scratch for experimentation.
| Hyperparameter | Value |
|---|---|
| Parameters | 3,403,968 |
| Layers | 4 |
| Hidden size | 64 |
| Attention heads | 4 |
| FFN dim | 192 |
| Embedding rank | 32 |
| Context length | 256 |
| Tokenizer | GPT-2 (50257 vocab) |
Uses a factored (low-rank) embedding to keep the vocab projection from eating the entire parameter budget, with weight tying on the output head.
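To illustrate the factorization (module names and layout here are illustrative, not the model's actual code in modeling_tinylm.py), the full vocab-by-hidden embedding table is replaced by two smaller factors:

```python
import torch.nn as nn

VOCAB, RANK, HIDDEN = 50257, 32, 64

# Factored embedding: token id -> rank-32 vector -> hidden size 64.
factored_embed = nn.Sequential(
    nn.Embedding(VOCAB, RANK),            # 50257 * 32 = 1,608,224 params
    nn.Linear(RANK, HIDDEN, bias=False),  # 32 * 64    =     2,048 params
)

full_params = VOCAB * HIDDEN                    # full table: 3,216,448
factored_params = VOCAB * RANK + RANK * HIDDEN  # factored:   1,610,272
```

Since a full 50257 x 64 table alone would cost roughly 3.2M of the ~3.4M total budget, the rank-32 factorization roughly halves the embedding cost and leaves room for the transformer layers.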
| Training setting | Value |
|---|---|
| Datasets | Skylion007/openwebtext (10k samples), roneneldan/TinyStories (10k samples) |
| Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
| Scheduler | Cosine annealing with warm restarts |
| Mixed precision | fp16 (torch.cuda.amp) |
| Hardware | Nvidia P100 |
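The optimizer and scheduler rows above translate to standard PyTorch calls. A sketch under stated assumptions: the restart period `T_0` is not given in the card and is a placeholder, and the `Linear` module stands in for the actual model:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(64, 64)  # stand-in module for TinyLM

# AdamW with the card's learning rate and weight decay
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=0.01)

# Cosine annealing with warm restarts; T_0 (steps per cycle) is assumed, not from the card
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=1000)

# fp16 mixed precision via torch.cuda.amp, as listed in the card (no-op on CPU)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
```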
Because the architecture is custom, the repository ships its own modeling script. To load the model with it:

from huggingface_hub import snapshot_download
import importlib.util
import torch

# Download the repository files
snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")

# Load the model class via the bundled script
spec = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

model, tokenizer, config = module.load_tinylm("./tinylm")
model.eval()

# Generate
output = module.generate(model, tokenizer, "Once upon a time, ")
print(output)