Perfect parameters for coding !!! SHARE YOURS

#14
by Ukro - opened

Hey everybody,
Please share your parameters for webdev/coding use. I use opencode and it seems fine, though it sometimes crashes; I'm not sure whether the cause is the oh-my-openagent plugin or this gguf, and I don't want to dig into it further. If anybody has a config worth sharing, please do. Here is mine:

c:\0_llama_server\llama-server ^
    -m a:\0_LM_Studio\Jackrong\Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-27B.Q6_K.gguf ^
    -ngl 999 --threads 22 --ctx-size 196610 --alias qwen3.5:27b ^
    --flash-attn on ^
    --batch-size 1024 ^
    --ubatch-size 256 ^
    --host 0.0.0.0 ^
    --repeat-penalty 1.1 ^
    --presence-penalty 0.0 ^
    --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 ^
    --port 8080 --no-context-shift  ^
    --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 ^
    --jinja ^
    --seed 3407 --reasoning-format deepseek --ctx-checkpoints 128 --reasoning 1
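For quickly sanity-checking a setup like this, llama-server exposes an OpenAI-compatible chat endpoint. Below is a minimal sketch (assuming the default host/port from the flags above; the helper names are mine) that sends the same sampling settings per request, since llama-server also accepts its own extra sampler fields like top_k, min_p, and repeat_penalty in the request body:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    # Mirror the sampling flags from the llama-server command above;
    # per-request values override the server-side defaults.
    return {
        "model": "qwen3.5:27b",  # matches the --alias flag
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "min_p": 0.0,
        "repeat_penalty": 1.1,
    }

def ask(prompt: str, host: str = "http://localhost:8080") -> str:
    # POST to the OpenAI-compatible endpoint and return the reply text.
    req = urllib.request.Request(
        host + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Handy for checking whether crashes happen with a bare client too, which would point at the gguf/server rather than the opencode plugin.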

Deployed on a single DGX Spark, with a Mac connected via LM Studio link + CC-Switch, VSCode terminal, and a single Claude session: it sometimes crashes mid-generation. However, when I have 4 agents working together, it looks fine, with prompts going in and out at 200-500 tokens.

LM_Studio\Jackrong\Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-27B.Q4_K_M.gguf
Context Length 262144, GPU Offload 64, CPU Thread Pool Size 15, Eval Batch Size 512, Max Concurrent Predictions 4,
Temp 0.5, Context Overflow set to Rolling Window