---
license: cc-by-nc-4.0
base_model: mlabonne/NeuralMonarch-7B
tags:
- generated_from_trainer
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: AlphaMonarch-dora
  results: []
datasets:
- argilla/OpenHermes2.5-dpo-binarized-alpha
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# AlphaMonarch-dora

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/7xlnpalOC4qtu-VABsib4.jpeg)

AlphaMonarch-dora is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/), trained with DoRA on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset. It scores slightly lower on the Nous and Open LLM leaderboards than the base [AlphaMonarch](https://huggingface.co/mlabonne/AlphaMonarch-7B) and [AlphaMonarch-laser](https://huggingface.co/abideen/AlphaMonarch-laser). I trained this model for 1080 steps. All hyperparameters were kept consistent across these experiments.
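A minimal inference sketch with Transformers. It assumes the repository hosts full merged weights that load as a causal language model and that its tokenizer ships a chat template; this card confirms neither. `build_messages` and `generate_reply` are hypothetical helper names, and `float16`, `device_map="auto"`, and `max_new_tokens=128` are arbitrary choices.

```python
MODEL_ID = "QueryloopAI/AlphaMonarch-dora"


def build_messages(user_prompt):
    """Wrap a single user turn in the role/content format chat templates expect."""
    return [{"role": "user", "content": user_prompt}]


def generate_reply(user_prompt, max_new_tokens=128):
    """Load the model and generate one reply (downloads the full weights)."""
    # Imported lazily so build_messages works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # AutoModelForCausalLM attaches the language-modeling head that generate() needs.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # assumption: half precision is acceptable
        device_map="auto",          # requires the accelerate package
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` returns the decoded reply as a string. Nothing is downloaded until the function is called.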
## 🏆 Evaluation results

# OpenLLM Benchmark

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/Mt3yY9wvTA9X4sOcUoVSN.png)

# Nous Benchmark
### AGIEVAL

| Task                           | Version | Accuracy | Accuracy StdErr | Normalized Accuracy | Normalized Accuracy StdErr |
|--------------------------------|---------|----------|-----------------|---------------------|----------------------------|
| agieval_aqua_rat               | 0       | 28.35%   | 2.83%           | 26.38%              | 2.77%                      |
| agieval_logiqa_en              | 0       | 38.71%   | 1.91%           | 38.25%              | 1.90%                      |
| agieval_lsat_ar                | 0       | 23.91%   | 2.82%           | 23.48%              | 2.80%                      |
| agieval_lsat_lr                | 0       | 52.55%   | 2.21%           | 53.73%              | 2.21%                      |
| agieval_lsat_rc                | 0       | 66.91%   | 2.87%           | 66.54%              | 2.88%                      |
| agieval_sat_en                 | 0       | 78.64%   | 2.86%           | 78.64%              | 2.86%                      |
| agieval_sat_en_without_passage | 0       | 45.15%   | 3.48%           | 44.17%              | 3.47%                      |
| agieval_sat_math               | 0       | 33.64%   | 3.19%           | 31.82%              | 3.15%                      |

AVG = 45.976
### GPT4ALL

| Task          | Version | Accuracy | Accuracy StdErr | Normalized Accuracy | Normalized Accuracy StdErr |
|---------------|---------|----------|-----------------|---------------------|----------------------------|
| arc_challenge | 0       | 65.87%   | 1.39%           | 67.92%              | 1.36%                      |
| arc_easy      | 0       | 86.49%   | 0.70%           | 80.64%              | 0.81%                      |
| boolq         | 1       | 87.16%   | 0.59%           | -                   | -                          |
| hellaswag     | 0       | 69.86%   | 0.46%           | 87.51%              | 0.33%                      |
| openbookqa    | 0       | 39.00%   | 2.18%           | 49.20%              | 2.24%                      |
| piqa          | 0       | 83.03%   | 0.88%           | 84.82%              | 0.84%                      |
| winogrande    | 0       | 80.98%   | 1.10%           | -                   | -                          |

AVG = 73.18
### TRUTHFUL-QA

| Task          | Version | MC1 Accuracy | MC1 Accuracy StdErr | MC2 Accuracy | MC2 Accuracy StdErr |
|---------------|---------|--------------|---------------------|--------------|---------------------|
| truthfulqa_mc | 1       | 62.91%       | 1.69%               | 78.48%       | 1.37%               |

AVG = 70.69
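The card does not say how the AVG lines are derived. They sit close to the plain mean of each table's Accuracy column (MC1 and MC2 for TruthfulQA). The small gaps would be consistent with averaging unrounded scores, but that is an inference, not something the card states. The sketch below recomputes the means from the rounded table values; `suite_means` is a hypothetical helper name.

```python
from statistics import mean

# Accuracy column (percent), copied from the three Nous tables above.
ACCURACY = {
    "AGIEVAL": [28.35, 38.71, 23.91, 52.55, 66.91, 78.64, 45.15, 33.64],
    "GPT4ALL": [65.87, 86.49, 87.16, 69.86, 39.00, 83.03, 80.98],
    # For TruthfulQA the two values are MC1 and MC2 accuracy.
    "TRUTHFUL-QA": [62.91, 78.48],
}

# AVG values exactly as reported in the card.
REPORTED = {"AGIEVAL": 45.976, "GPT4ALL": 73.18, "TRUTHFUL-QA": 70.69}


def suite_means(scores=ACCURACY):
    """Return the unweighted mean of the listed scores for each suite."""
    return {suite: mean(values) for suite, values in scores.items()}


for suite, value in suite_means().items():
    gap = value - REPORTED[suite]
    print(f"{suite}: recomputed {value:.3f}, reported {REPORTED[suite]}, gap {gap:+.3f}")
```

Using the Normalized Accuracy column instead gives noticeably different means, which is why the Accuracy column is assumed here.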
### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-7
- train_batch_size: 2
- eval_batch_size: Not specified
- seed: Not specified
- gradient_accumulation_steps: 8
- total_train_batch_size: Not specified
- optimizer: PagedAdamW with 32-bit precision
- lr_scheduler_type: Cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1080
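To make the schedule concrete, here is a sketch of the learning rate implied by the values above. It assumes linear warmup from zero followed by a single half-cosine decay to zero, which is the usual shape of a cosine-with-warmup schedule; the card only says "Cosine". It also assumes `train_batch_size` is a per-device value. Since the number of devices is not reported, the total batch size stays unspecified. `lr_at` is a hypothetical helper name.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-7
WARMUP_STEPS = 100
TRAINING_STEPS = 1080
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8


def lr_at(step):
    """Learning rate at an optimizer step, under the assumed schedule shape."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    # Assumed half-cosine decay from the peak down to 0.
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences contributing to each optimizer step on one device.
sequences_per_step = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

for step in (0, 50, 100, 590, 1080):
    print(f"step {step:4d}: lr = {lr_at(step):.3e}")
print(f"sequences per optimizer step (per device): {sequences_per_step}")
```

Under these assumptions the rate peaks at step 100, falls to half the peak at step 590, and reaches zero at step 1080.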
### Framework versions

- Transformers 4.39.0.dev0
- Peft 0.9.1.dev0
- Datasets 2.18.0
- torch 2.2.0
- accelerate 0.27.2
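With the library versions listed above, a DoRA setup can be expressed through PEFT's `LoraConfig` with `use_dora=True`. The sketch below is not the author's training script. The adapter rank, alpha, dropout, and target modules are placeholders because the card does not report them, and mapping "PagedAdamW with 32-bit precision" to `optim="paged_adamw_32bit"` is an assumption. `build_configs` and the output directory name are hypothetical.

```python
# Values taken from the card's hyperparameter list.
CARD_HYPERPARAMS = {
    "learning_rate": 5e-7,
    "per_device_train_batch_size": 2,  # assumes train_batch_size is per device
    "gradient_accumulation_steps": 8,
    "optim": "paged_adamw_32bit",      # assumed name for "PagedAdamW, 32-bit"
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "max_steps": 1080,
}

# Hypothetical adapter settings: none of these are reported in the card.
ASSUMED_ADAPTER = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_configs(output_dir="alphamonarch-dora-out"):
    """Build the PEFT and Trainer configs (needs peft and transformers installed)."""
    # Imported lazily so the dictionaries above can be inspected without the libraries.
    from peft import LoraConfig
    from transformers import TrainingArguments

    # use_dora=True switches the LoRA adapter to weight-decomposed (DoRA) updates.
    peft_config = LoraConfig(use_dora=True, task_type="CAUSAL_LM", **ASSUMED_ADAPTER)
    training_args = TrainingArguments(output_dir=output_dir, **CARD_HYPERPARAMS)
    return peft_config, training_args
```

The two objects returned by `build_configs()` would then be handed to a preference-optimization trainer together with the base model and the dataset. The card does not say which trainer implementation was used.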