Quark-50m-Instruct

Quark-50m-Instruct is a small (≈57 M parameter) decoder-only language model fine-tuned for instruction following. It is built on the same architecture as the now‑abandoned “SmolLM” family and was pretrained from scratch on 5 billion tokens from HuggingFaceTB/smollm‑corpus.

  • Model type: Causal Language Model (LLaMA‑style decoder)
  • Architecture: GQA · SwiGLU · RMSNorm · RoPE · Weight‑tying
  • Pretraining tokens: 5 B
  • Fine‑tuning: Instruction‑tuned (details below)
  • Creators: OvercastLab (research & development lab for ML/AI)
  • Release date: 22 April 2026

Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is not competitive with large models on knowledge‑intensive tasks, but it excels at:

  • Simple conversational tasks
  • Code generation and explanation (Python)
  • Short text rewriting and summarisation
  • On‑device / edge inference

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

| Component     | Details                                  |
|---------------|------------------------------------------|
| Vocab size    | 49,152                                   |
| Hidden size   | 384                                      |
| Layers        | 24                                       |
| Attention     | Grouped Query (6 Q heads, 2 KV heads)    |
| FFN           | SwiGLU, intermediate size 1,024          |
| Position      | RoPE (θ = 10,000)                        |
| Normalisation | RMSNorm (pre‑block)                      |
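
In grouped-query attention, several query heads attend using the same shared key/value head, which shrinks the KV cache. A toy pure-Python sketch of the 6-query/2-KV layout (tiny illustrative dimensions, not the model's real weights):

```python
import math, random

# Toy grouped-query attention with the card's head layout:
# 6 query heads share 2 KV heads, so each KV head serves 3 query heads.
n_q, n_kv, d, seq = 6, 2, 4, 3   # tiny illustrative sizes
group = n_q // n_kv              # query heads per KV head = 3

random.seed(0)
def rand_heads(n):
    return [[[random.gauss(0, 1) for _ in range(d)] for _ in range(seq)] for _ in range(n)]

Q, K, V = rand_heads(n_q), rand_heads(n_kv), rand_heads(n_kv)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

out = []
for h in range(n_q):
    kv = h // group  # which shared KV head this query head reads from
    head_out = []
    for qvec in Q[h]:
        scores = [dot(qvec, kvec) / math.sqrt(d) for kvec in K[kv]]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]       # stable softmax
        z = sum(w)
        w = [x / z for x in w]
        head_out.append([sum(w[t] * V[kv][t][j] for t in range(seq))
                         for j in range(d)])
    out.append(head_out)

print(len(out), len(out[0]), len(out[0][0]))  # 6 3 4: heads × seq × head_dim
```

All six query heads produce outputs, but only two K/V projections are stored, which is the memory saving GQA buys during inference.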

Total trainable parameters: ≈57 M (with weight tying).
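
The parameter count can be estimated directly from the table above; a back-of-the-envelope sketch (layer norms are omitted, as they contribute only a few thousand parameters):

```python
# Estimate the parameter count from the architecture table.
vocab, hidden, layers, inter = 49_152, 384, 24, 1_024
q_heads, kv_heads = 6, 2
head_dim = hidden // q_heads                    # 64

embed = vocab * hidden                          # tied with the output head
attn = hidden * hidden                          # Q projection
attn += 2 * hidden * (kv_heads * head_dim)      # K and V projections (GQA)
attn += hidden * hidden                         # output projection
ffn = 3 * hidden * inter                        # gate, up, and down in SwiGLU

total = embed + layers * (attn + ffn)
print(f"{total:,}")  # 56,623,104 ≈ 56.7 M
```

The result matches the 56.7 M figure reported for the safetensors checkpoint; without weight tying a separate output head would add another ~18.9 M parameters.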

Uses

Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat‑formatted input (see example below).

Downstream Use

Because the model ships under the permissive Apache‑2.0 license, you may fine‑tune Quark-50m‑Instruct on your own data for domain‑specific tasks such as a customer‑support bot, a code reviewer, or a story writer.
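
Before fine-tuning, raw examples need to be shaped into the chat format the tokenizer's template expects. A minimal sketch, using hypothetical customer-support data (the field names and system prompt are illustrative, not prescribed by the model):

```python
# Hypothetical example: shaping raw support tickets into chat-format
# records before instruction fine-tuning.
raw_examples = [
    ("How do I reset my password?",
     "Go to Settings > Security and click Reset Password."),
]

def to_chat_example(question, answer):
    """Wrap one Q/A pair in the role-tagged message list used by chat templates."""
    return {"messages": [
        {"role": "system", "content": "You are Quark, a helpful assistant."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

dataset = [to_chat_example(q, a) for q, a in raw_examples]
print(dataset[0]["messages"][2]["role"])  # assistant
```

During training, `tokenizer.apply_chat_template` can then render each record into the token sequence the model was instruction-tuned on.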

Limitations

  • Limited world knowledge (pretraining data cut‑off in mid‑2025).
  • Short context window (2,048 tokens).
  • Small size means it can make more factual mistakes than larger models.
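
Because of the 2,048-token window, long conversations must be truncated before generation. A minimal sketch that keeps only the most recent tokens (plain integers stand in for real token IDs; the helper name is illustrative):

```python
MAX_CONTEXT = 2048  # Quark's context window

def truncate_to_context(token_ids, max_len=MAX_CONTEXT):
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-max_len:] if len(token_ids) > max_len else token_ids

history = list(range(5000))       # stand-in for accumulated token IDs
clipped = truncate_to_context(history)
print(len(clipped), clipped[0])   # 2048 2952
```

A real chat loop would usually also re-prepend the system prompt after truncation so the model keeps its persona.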

How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped query attention in one sentence."},
]

# Render the chat template into token IDs and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
