Instructions for using QueryloopAI/AlphaMonarch-dora with libraries and local apps.

Libraries

How to use QueryloopAI/AlphaMonarch-dora with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QueryloopAI/AlphaMonarch-dora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("QueryloopAI/AlphaMonarch-dora")
# AutoModelForCausalLM (not AutoModel) loads the language-modeling head that generate() needs
model = AutoModelForCausalLM.from_pretrained("QueryloopAI/AlphaMonarch-dora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template, tokenize it and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
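
`apply_chat_template` renders `messages` through the chat template stored with the tokenizer. The card tags this model `chatml`, so the rendered prompt presumably follows the ChatML layout. The sketch below reproduces that layout in plain Python. It assumes a standard ChatML template with `<|im_start|>`/`<|im_end|>` markers and does not read the template actually shipped with the tokenizer. The helper `render_chatml` is hypothetical.

```python
# Hypothetical helper: renders chat messages in the ChatML layout suggested by
# the card's `chatml` tag. This is an assumption, not the tokenizer's own template.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    for message in messages:
        # Each turn: start marker and role, newline, content, end marker, newline.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Who are you?"}]
print(render_chatml(messages))
```

To check the assumption against the real template, compare the output with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)`.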

Local Apps

vLLM

How to use QueryloopAI/AlphaMonarch-dora with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QueryloopAI/AlphaMonarch-dora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "QueryloopAI/AlphaMonarch-dora",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
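
The same OpenAI-compatible request can be sent from Python without extra dependencies. This is a minimal standard-library sketch. It assumes the server returns the usual OpenAI chat-completions response shape (`choices[0].message.content`). The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request


# Builds a POST request for an OpenAI-compatible /v1/chat/completions endpoint,
# mirroring the curl call above. base_url defaults to the vLLM server's port.
def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="QueryloopAI/AlphaMonarch-dora"):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sends the request and returns the first choice's message text.
# Requires a running server; assumes the standard OpenAI response shape.
def chat(prompt, base_url="http://localhost:8000"):
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("What is the capital of France?")` returns the reply text. Passing `base_url="http://localhost:30000"` targets the SGLang server instead, since both expose the same endpoint.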

Use Docker

```
docker model run hf.co/QueryloopAI/AlphaMonarch-dora
```

SGLang

How to use QueryloopAI/AlphaMonarch-dora with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QueryloopAI/AlphaMonarch-dora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "QueryloopAI/AlphaMonarch-dora",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QueryloopAI/AlphaMonarch-dora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "QueryloopAI/AlphaMonarch-dora",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use QueryloopAI/AlphaMonarch-dora with Docker Model Runner:
```
docker model run hf.co/QueryloopAI/AlphaMonarch-dora
```

AlphaMonarch-dora / README.md (last commit 816daf6, "Update README.md", by abideen, about 2 years ago)


---
license: cc-by-nc-4.0
base_model: mlabonne/NeuralMonarch-7B
tags:
- generated_from_trainer
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: AlphaMonarch-dora
  results: []
datasets:
- argilla/OpenHermes2.5-dpo-binarized-alpha
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# AlphaMonarch-dora

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64fc6d81d75293f417fee1d1/7xlnpalOC4qtu-VABsib4.jpeg)

AlphaMonarch-dora is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/). It was trained on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset using DoRA (Weight-Decomposed Low-Rank Adaptation). This model scores slightly lower on the Nous and OpenLLM benchmarks than the original [AlphaMonarch](https://huggingface.co/mlabonne/AlphaMonarch-7B) and [AlphaMonarch-laser](https://huggingface.co/abideen/AlphaMonarch-laser). I trained this model for 1080 steps, and all hyperparameters were kept consistent across these experiments.
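
The description combines two techniques. DPO trains directly on (chosen, rejected) response pairs against a frozen reference model. DoRA splits each adapted weight into a magnitude vector and a direction, and applies a LoRA-style low-rank update to the direction. The NumPy sketch below illustrates both and makes the following assumptions:

- The DPO loss follows the DPO paper's formulation for one pair of summed sequence log-probabilities.
- `beta = 0.1` is chosen for illustration only, because the card does not list it.
- The DoRA merge follows the DoRA paper's decomposition.

Neither function is the training code used for this model. In PEFT, DoRA is enabled through `LoraConfig(use_dora=True)`.

```python
import numpy as np


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities.

    beta=0.1 is an illustrative assumption; the card does not report the value used.
    """
    # Implicit reward margin: how much more the policy, relative to the frozen
    # reference model, prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin)), computed stably.
    return float(np.logaddexp(0.0, -beta * margin))


def dora_merge(w0, a, b, magnitude):
    """Merged DoRA weight: magnitude * (w0 + b @ a) / column_norm(w0 + b @ a).

    Shapes follow the DoRA paper's notation: w0 is (d, k), b is (d, r),
    a is (r, k) and magnitude is (k,). Only a, b and magnitude are trained.
    """
    directional = w0 + b @ a                          # frozen weight plus low-rank update
    col_norm = np.linalg.norm(directional, axis=0)    # one norm per column, shape (k,)
    return magnitude * directional / col_norm         # rescale each unit-norm column


# At initialisation B is zero and the magnitude equals the column norms of W0,
# so the adapted layer starts out identical to the pretrained one.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(6, 4))
a = rng.normal(size=(2, 4))
b = np.zeros((6, 2))
m = np.linalg.norm(w0, axis=0)
assert np.allclose(dora_merge(w0, a, b, m), w0)

# When the policy and reference assign identical log-probabilities, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```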


## 🏆 Evaluation results

## OpenLLM Benchmark

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/mVwB5NB0XcUwqharYhDGr.png)

## Nous Benchmark

### AGIEVAL

| Task | Version | Accuracy | Accuracy StdErr | Normalized Accuracy | Normalized Accuracy StdErr |
|--------------------------------|---------|----------|-----------------|---------------------|-----------------------------|
| agieval_aqua_rat | 0 | 28.35% | 2.83% | 26.38% | 2.77% |
| agieval_logiqa_en | 0 | 38.71% | 1.91% | 38.25% | 1.90% |
| agieval_lsat_ar | 0 | 23.91% | 2.82% | 23.48% | 2.80% |
| agieval_lsat_lr | 0 | 52.55% | 2.21% | 53.73% | 2.21% |
| agieval_lsat_rc | 0 | 66.91% | 2.87% | 66.54% | 2.88% |
| agieval_sat_en | 0 | 78.64% | 2.86% | 78.64% | 2.86% |
| agieval_sat_en_without_passage | 0 | 45.15% | 3.48% | 44.17% | 3.47% |
| agieval_sat_math | 0 | 33.64% | 3.19% | 31.82% | 3.15% |

AVG = 45.976

### GPT4ALL

| Task | Version | Accuracy | Accuracy StdErr | Normalized Accuracy | Normalized Accuracy StdErr |
|--------------|---------|----------|-----------------|---------------------|-----------------------------|
| arc_challenge | 0 | 65.87% | 1.39% | 67.92% | 1.36% |
| arc_easy | 0 | 86.49% | 0.70% | 80.64% | 0.81% |
| boolq | 1 | 87.16% | 0.59% | - | - |
| hellaswag | 0 | 69.86% | 0.46% | 87.51% | 0.33% |
| openbookqa | 0 | 39.00% | 2.18% | 49.20% | 2.24% |
| piqa | 0 | 83.03% | 0.88% | 84.82% | 0.84% |
| winogrande | 0 | 80.98% | 1.10% | - | - |

AVG = 73.18

### TRUTHFUL-QA

| Task | Version | MC1 Accuracy | MC1 Accuracy StdErr | MC2 Accuracy | MC2 Accuracy StdErr |
|---------------|---------|--------------|---------------------|--------------|---------------------|
| truthfulqa_mc | 1 | 62.91% | 1.69% | 78.48% | 1.37% |

AVG = 70.69

## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-7
- train_batch_size: 2
- eval_batch_size: Not specified
- seed: Not specified
- gradient_accumulation_steps: 8
- total_train_batch_size: Not specified
- optimizer: PagedAdamW with 32-bit precision
- lr_scheduler_type: Cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1080
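
The listed values fix the per-device batch size and the shape of the learning-rate curve. The pure-Python sketch below rests on two assumptions:

- Training ran on a single device, because the card leaves `total_train_batch_size` unspecified.
- The cosine schedule behaves like Transformers' `get_cosine_schedule_with_warmup`. That means a linear warmup from 0 to the peak rate over 100 steps, then a half-cosine decay to 0 at step 1080.

`NUM_DEVICES` and the helper names are hypothetical.

```python
import math

LEARNING_RATE = 5e-7
TRAIN_BATCH_SIZE = 2          # per device
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100
TRAINING_STEPS = 1080
NUM_DEVICES = 1               # assumption: not reported in the card


def effective_batch_size(num_devices=NUM_DEVICES):
    # Preference pairs contributing to each optimizer step.
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def cosine_lr(step):
    # Linear warmup, then cosine decay from the peak to zero
    # (mirrors get_cosine_schedule_with_warmup with its default num_cycles=0.5).
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size())                     # 16 pairs per optimizer step
print(effective_batch_size() * TRAINING_STEPS)    # 17280 pairs over training
for step in (0, 50, 100, 590, 1080):
    print(step, cosine_lr(step))
```

On more than one device, the effective batch size, and therefore the number of pairs seen, scales linearly with `num_devices`.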

## Framework versions
- Transformers 4.39.0.dev0
- Peft 0.9.1.dev0
- Datasets 2.18.0
- torch 2.2.0
- accelerate 0.27.2