VibeThinker-3B — Bug-Bounty Triage (LoRA adapter)

A LoRA fine-tune of WeiboAI/VibeThinker-3B that triages bug-bounty / vulnerability-disclosure submissions into a structured verdict — disposition, severity, confidence, and a rationale — and is hardened against prompt-injection and AI-generated "slop" reports.

Project name: VibeBounty. This repo hosts the trained LoRA adapter (mlx-lm format); fuse it onto the base model to get a standalone model.

What it does

Given a report (title, asset, description, steps, impact), it emits a JSON verdict over a 9-class disposition taxonomy:

valid_impactful · valid_low · corroborated_surge · likely_duplicate · out_of_scope · theoretical_no_poc · self_inflicted · accepted_risk · slop

plus a severity estimate, a confidence gated by claim-reliability, and questions for the researcher.

Files

file	purpose
`adapters/adapters.safetensors`	final LoRA adapter (iter 2000, mlx-lm)
`adapters/adapter_config.json`	adapter / training config
`lora_config.yaml`	full mlx-lm LoRA recipe

Usage (Apple Silicon / MLX)

pip install mlx-lm huggingface_hub
hf download macmacmacmac/vibebounty --local-dir vibebounty

# fuse adapter -> standalone model
mlx_lm.fuse --model WeiboAI/VibeThinker-3B \
  --adapter-path vibebounty/adapters --save-path vibethinker-bbtriage

# generate
mlx_lm.generate --model vibethinker-bbtriage \
  --prompt "Triage this report: IDOR in invoice download ..."

Or load the base + adapter directly with mlx-lm without fusing (--adapter-path vibebounty/adapters).

Training

Base: WeiboAI/VibeThinker-3B (Qwen2.5-3B lineage)
Method: LoRA (rank 16, scale 20, all 36 layers; q/k/v/o + MLP proj), mask_prompt
Iters: 2000, batch 8, seq 2048, lr 1e-4, AdamW
Data: ~18k bug-bounty reports labeled from real disclosure outcomes (substate / severity / bounty / CVE), rendered as chat with reasoning targets
Train loss 3.4 → <0.7; val loss ~1.06

Sample verdicts

// IDOR: GET /api/v2/invoices/{id} returns other tenants' invoices
{"disposition": "valid_impactful", "severity_estimate": "high",
 "reasoning": "IDOR / broken-authz against an authenticated API; incrementing id
 walks the table -> crosses a real trust boundary with demonstrated impact.",
 "confidence": 0.9}

// Log4Shell report with an EXTERNAL CORROBORATION block (CVE-2021-44228, CISA KEV)
{"disposition": "corroborated_surge", "severity_estimate": "critical",
 "reasoning": "Maps to a publicly disclosed advisory confirmed by the live feed
 (actively exploited) -> corroborated, not spam.",
 "used_external_corroboration": true, "confidence": 0.9}

Evaluation (held-out 300 reports, offline)

metric	heuristic + defense baseline
accept / reject accuracy	97.3%
disposition accuracy (9-class)	56.3%
macro-F1	0.191
severity within-1	71.0%
adversarial defense suite	6 / 6 pass

Defense layer (model-independent)

Verdicts are guarded by ground-truth checks the model can't talk past: prompt-injection isolation, claim-level verification (fabricated code symbols → slop), and threat-intel corroboration (CVE/KEV/OSV → corroborated_surge, never spam). Offline adversarial suite: 6/6.

Training data & provenance

~18k bug-bounty / vulnerability-disclosure reports compiled from publicly disclosed sources — primarily disclosed HackerOne reports plus additional public bug-bounty and Web3 disclosure corpora. Every example's label is derived from the real adjudicated outcome recorded in the data (HackerOne substate, severity, bounty amount, vote count, and any associated CVE) and mapped onto the 9-class disposition taxonomy — the labels are not synthetic. Each report is rendered as chat (system + user report → assistant reasoning + verdict JSON); when a CVE is present, a live threat-intel corroboration block is rendered exactly as the inference pipeline emits it. ~300 reports are held out as a test split for evaluation.

Academic grounding

The triage flow and its defenses are grounded in recent literature:

VibeThinker (arXiv:2606.16140) — small-model verifiable reasoning; the base model + the claim-level-reliability idea behind confidence gating.
From Reviewers' Lens: Bug Bounty Invalid Reasons with LLMs (arXiv:2511.18608) — predicting why a report is invalid; informs the disposition taxonomy + rationale output.
Triage in SE: A Systematic Review (arXiv:2511.08607) — metadata + retrieval beats text-only → we blend report metadata and threat-intel corroboration.
CaSey: Streamlining Vulnerability Triage with LLMs (arXiv:2501.18908) — realistic LLM CWE/severity accuracy; keeps expectations honest.
JudgeDeceiver (arXiv:2403.17710), Adversarial Attacks on LLM-as-a-Judge (arXiv:2504.18333), CUA/JMA (arXiv:2505.13348), RobustJudge (arXiv:2506.09443) — LLM judges (incl. 3B) are injectable → the prompt-injection guard + model-independent ground-truth overrides.
Stumbling Blocks (arXiv:2402.11638) + paraphrase-attack results (Krishna et al. 2023; Sadasivan et al.) — AI-text detectors collapse under paraphrase → we ground via retrieval / claim verification (fabricated code symbols → slop), not detection.

Intended use & limitations

Decision-support "sidecar" for analysts, not an autonomous adjudicator. It reflects the biases of the disclosure outcomes it was trained on; always keep a human in the loop for accept/reject and severity. License inherits from the base model — verify before redistribution.