Instructions to use Fu01978/FuadeAI-50M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Fu01978/FuadeAI-50M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Fu01978/FuadeAI-50M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Fu01978/FuadeAI-50M")
model = AutoModelForCausalLM.from_pretrained("Fu01978/FuadeAI-50M")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Fu01978/FuadeAI-50M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Fu01978/FuadeAI-50M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fu01978/FuadeAI-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Fu01978/FuadeAI-50M

SGLang

How to use Fu01978/FuadeAI-50M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Fu01978/FuadeAI-50M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fu01978/FuadeAI-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Fu01978/FuadeAI-50M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fu01978/FuadeAI-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Fu01978/FuadeAI-50M with Docker Model Runner:
```
docker model run hf.co/Fu01978/FuadeAI-50M
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

FuadeAI-50M

A 50 million parameter causal language model trained for conversational chat, built on a GPT-2 architecture with a custom tokenizer.

Model Details

Property	Value
Parameters	51.5M
Architecture	GPT-2 (custom config)
Hidden size	512
Layers	8
Attention heads	8
Context length	1024 tokens
Tokenizer	GPT-2 + custom special tokens
Training precision	FP16

Special Tokens

Token	Purpose
`<\|startoftext\|>`	Beginning of conversation
`<user>` / `</user>`	Wraps user message
`<assistant>` / `</assistant>`	Wraps assistant response
`<\|endoftext\|>`	End of conversation

Training Data

LucidexAi/VIBE-2K
HuggingFaceTB/instruct-data-basics-smollm-H4
MuskumPillerum/General-Knowledge (4k random rows)
Custom synthetic dataset for identity and conversational grounding

How To Use

Transformers

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("Fu01978/FuadeAI-50M")
model = GPT2LMHeadModel.from_pretrained("Fu01978/FuadeAI-50M")
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Chat function
def chat(prompt, temperature=0.4, top_p=0.9, max_new_tokens=100):
    formatted = (
        f"{tokenizer.bos_token}"
        f"<user>{prompt}</user>"
        f"<assistant>"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    generated = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
print(chat("Hello!"))
print(chat("Who invented the first telephone?"))
print(chat("Who are you?"))

Generation Tips

temperature=0.45 — balanced creativity and coherence (recommended)
temperature=0.2 — more focused and deterministic answers
temperature=0.8 — more creative but less reliable
repetition_penalty=1.2 — keeps responses from looping (recommended)
max_new_tokens=100 — increase for longer responses

Limitations

50M parameters is small — factual recall is imperfect and some answers may be incorrect. Always verify factual claims from this model.
Coverage of topics is limited compared to large-scale models.
Not suitable for factual research, medical/legal/financial advice, or any high-stakes decision making.
Context window — limited to 1024 tokens total (prompt + response).

Intended Use

Learning and experimentation with small language models
Lightweight conversational agent for low-stakes applications
Fine-tuning base for domain-specific chat applications

Downloads last month: 10

Safetensors

Model size

51.5M params

Tensor type

F32

Model tree for Fu01978/FuadeAI-50M

Quantizations

1 model

Datasets used to train Fu01978/FuadeAI-50M

Collection including Fu01978/FuadeAI-50M

Small Models

Collection

A list of all small models (=<1B) that I have published. • 9 items • Updated Mar 2