could you make an uncensored version of DeepSeek v4 Flash?
I absolutely would if I could. The issue is that to run Heretic you basically need the whole model in VRAM, and DeepSeek-V4-Flash is a 160GB model; I don't have that much VRAM. The only way would be to add an RTX PRO 6000 Blackwell Workstation Edition (or one of the more expensive datacenter cards like a B300), and that's not going to happen unless someone donates 11,000+ Euros.
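As a rough illustration of the VRAM problem, here is a back-of-envelope fit check. The model size and the H200/B200/B300 capacities are the figures discussed in this thread; the 96GB for the RTX PRO 6000 Blackwell is its published spec, and the 25% headroom factor is my own assumption, since Heretic needs working memory on top of the weights.

```python
# Rough single-card VRAM fit check: weights plus an assumed headroom factor,
# ignoring activations, KV cache, and Heretic's exact buffer requirements.

MODEL_GB = 160       # DeepSeek-V4-Flash weights, as stated in the thread
HEADROOM = 1.25      # assumed 25% extra working memory (illustration only)

CARDS_GB = {
    "RTX PRO 6000 Blackwell": 96,
    "H200": 141,
    "B200": 192,
    "B300": 288,
}

needed = MODEL_GB * HEADROOM  # 200 GB with these assumptions
for card, vram in CARDS_GB.items():
    verdict = "fits" if vram >= needed else "does not fit"
    print(f"{card}: {vram} GB -> {verdict} ({needed:.0f} GB needed)")
```

With these assumptions, only the B300 fits on a single card, which matches the point made later in the thread that even a B200 leaves no comfortable headroom.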
I see, thanks for explaining. Would it be possible to consider renting GPUs? On a platform I’m familiar with, an H200 costs about €0.5/hour. Do you have an estimate of how long Heretic would take for a model of this size?
I guess it needs multiple rounds of work to get the best result, so a time estimate is not reliable.
I see, thanks for explaining. Would it be possible to consider renting GPUs?
Yes, it's possible. The problem is that there is a lot of trial and error involved in the process: many experiments and tests to run, all of which take a lot of time. Many models have their own quirks, and you have to find what "makes them tick", which means testing and re-running trials until you get it right. Every model needs to be tweaked and tested multiple times, and each one takes its own time; typically, the more parameters a model has, the longer Heretication takes. You also don't know how many trials you should run: some models give you the best result at Trial 150, others at Trial 248, others at Trial 600. It's impossible to know in advance.
A great example is this:
https://huggingface.co/RadicalNotionAI/Qwen3.5-397B-A17B-heretic
This is the only person to ever Hereticate this model (the rest are just cloned repos of this release), and the model is genuinely Hereticated, as it now has low refusals. But look at the cost:
You can already see the bad sign: a KL divergence of 0.3823. Then you look at the UGI Leaderboard:
Notice the large amount of quality loss caused by that 0.3823 KL divergence. The issue is that running Heretic on this model costs a lot of money very fast; you can't just retry until you get a great result, because every hour of multi-GPU renting costs real money. As far as I can tell, nobody else has run Heretic on this model, for obvious reasons.
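For readers unfamiliar with the metric: KL divergence here measures how far the abliterated model's next-token distribution has drifted from the original model's, so a higher value means more behavioral damage. A toy sketch of the calculation, where only the 0.3823 figure comes from the model card and the probability distributions are made up for illustration:

```python
# Toy KL divergence demo: KL(P || Q) over a 3-token vocabulary.
# P is the base model's (made-up) next-token distribution; the two Q's
# show mild vs. heavy drift. Real evaluations average over many tokens.

import math

def kl(p, q):
    """KL(P || Q) in nats, skipping zero-probability terms of P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p       = [0.70, 0.20, 0.10]  # base model (hypothetical)
q_mild  = [0.65, 0.23, 0.12]  # small drift -> tiny KL
q_heavy = [0.30, 0.25, 0.45]  # large drift -> KL near the 0.38-0.40 range

print(f"mild drift:  KL = {kl(p, q_mild):.4f}")
print(f"heavy drift: KL = {kl(p, q_heavy):.4f}")
```

The point of the sketch: a KL around 0.38 corresponds to the model putting substantially different probability mass on its top tokens than the base model did, which is exactly the kind of drift that shows up as quality loss on a leaderboard.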
When I am at my own PC, I can make sure I release the best models and re-run trials with no issue. When I am renting GPUs, I have to operate everything through an interface every time instead of having all my own tools on my own machine, and I am limited in how much time I can use because money drains away every hour, so I cannot afford trial and error, testing, experiments, etc. As you can guess, that is a very different workflow.
On a platform I’m familiar with, an H200 costs about €0.5/hour. Do you have an estimate of how long Heretic would take for a model of this size?
The H200 is old and outdated; it "only" has 141GB of VRAM. The H200 was supplanted by the B200 (192GB of VRAM), and the B200 has now been supplanted by the B300 (288GB of VRAM). Runpod doesn't mention offering the B300 on their website, and on Runpod a B200 costs $5.49 per hour. Also, a B200 doesn't have enough VRAM to Hereticate DeepSeek-V4-Flash comfortably, as everything needs a good amount of headroom.
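To make the cost concern concrete, here is a back-of-envelope estimate using the $5.49/hr B200 price quoted above. The GPU count and hours-per-trial are my own assumptions for illustration; the trial counts are the examples mentioned earlier in this thread (best result landing anywhere from Trial 150 to Trial 600).

```python
# Back-of-envelope rental cost for a full Heretic run.
# Only the $5.49/hr figure comes from the thread; everything else
# (GPU count, time per trial) is an assumption for illustration.

RATE_USD_PER_HR = 5.49   # Runpod B200 price quoted above
N_GPUS = 2               # assumed: two B200s for headroom on a 160GB model
HOURS_PER_TRIAL = 0.25   # assumed average; varies heavily by model

for trials in (150, 248, 600):
    cost = RATE_USD_PER_HR * N_GPUS * HOURS_PER_TRIAL * trials
    print(f"{trials} trials: ~${cost:,.0f}")
```

Even under these optimistic assumptions, not knowing in advance whether the best trial comes at 150 or 600 means the budget can easily vary by a factor of four.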
Now, I am not saying I cannot do it; I definitely can. The issue is that I would also need a good amount of money to rent cloud GPUs, and I have no idea whether I would be able to release a high-quality version, due to all the issues and constraints mentioned in this post.
I guess it needs multiple rounds of work to get the best result, so a time estimate is not reliable.
Yes exactly, see my post above.

