| | Verdict | Votes |
|---|---|---|
| 🪶 | ux-writing-1 (fine-tune): went to training camp | 3 |
| 🌲 | Qwen3.6-27B (base): the wild stock model | 1 |
| 🤝 | Tie: both copies would ship | 4 |
| 🪵 | Both need work: back to the drawing board | 0 |
8 campfire votes so far. Before launch, the author's own blinded review preferred the fine-tune over the base model in 83% of decisive comparisons (65/78).
A blind taste test for UX writing
Each battle sends your copy to two UX writers: 🌲 Qwen3.6-27B (base) (Apache-2.0, ≈27.8B parameters) and 🪶 ux-writing-1 (fine-tune), which is the same model after a QLoRA fine-tune on a hand-built UX writing dataset. Sides are shuffled every round, and you vote before the reveal. Your vote feeds v2: votes are stored privately as preference data for the next training run.
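A minimal sketch of how one blind round could work, assuming sides are shuffled with a seeded RNG and decisive votes are stored as `prompt`/`chosen`/`rejected` records (a common layout for preference-tuning data). The writer labels, the `Battle` type, `new_battle`, `record_vote`, and the vote strings are all hypothetical; the Space's real storage schema isn't described here.

```python
import random
from dataclasses import dataclass

# Hypothetical writer labels; the Space's real identifiers may differ.
WRITERS = ("base", "fine-tune")


@dataclass
class Battle:
    prompt: str
    left: str   # writer shown on the left this round
    right: str  # writer shown on the right this round


def new_battle(prompt: str, rng: random.Random) -> Battle:
    """Randomly decide which writer appears on which side for this round."""
    left, right = rng.sample(WRITERS, k=2)
    return Battle(prompt, left, right)


def record_vote(battle: Battle, outputs: dict, choice: str):
    """Map a blind vote back to the hidden writers.

    choice is one of 'left', 'right', 'tie', 'both_bad' (hypothetical names
    for the four leaderboard outcomes). Only decisive votes yield a
    chosen/rejected pair; ties and 'both need work' return None here.
    """
    if choice not in ("left", "right"):
        return None
    winner = battle.left if choice == "left" else battle.right
    loser = battle.right if choice == "left" else battle.left
    return {
        "prompt": battle.prompt,
        "chosen": outputs[winner],
        "rejected": outputs[loser],
    }
```

Keeping the side assignment inside the `Battle` record is what lets the vote be cast blind and unblinded afterwards: the UI only ever shows "left" and "right".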
🎯 Why trust the matchup?
Before launch, the author blind-reviewed all 90 held-out benchmark items the same way (options anonymized, judged, then unblinded): the fine-tune was preferred in 65 of 78 decisive comparisons (83%), with the remaining 12 items not decisive. Methodology, eval code, and the training pipeline are open.
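The headline number is simply wins divided by decisive comparisons, with non-decisive items left out of the denominator. A minimal sketch of that arithmetic, plus a Wilson score interval to show how much uncertainty 78 decisive items still leave; the interval is an illustration added here, not something the author reports.

```python
import math


def decisive_win_rate(wins: int, decisive: int) -> float:
    """Share of decisive comparisons won; non-decisive items are excluded."""
    return wins / decisive


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


wins, decisive, total = 65, 78, 90
print(f"{decisive_win_rate(wins, decisive):.0%}")      # 83%
print(f"{total - decisive} items were not decisive")   # 12 items were not decisive
low, high = wilson_interval(wins, decisive)
print(f"95% Wilson interval: {low:.2f} to {high:.2f}")
```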
🔦 Fair-fight settings
Both writers get the identical prompt, greedy decoding, and the same token budget (1536 tokens with lantern mode, 256 without). One of them reasons at much greater length, and anything that fingerprints a writer would bias your vote, so explanations, token counts, and timings stay tucked into identical "field notes" until after you vote. Judge the copy; the forensics come with the reveal.
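A minimal sketch of the fair-fight rules, assuming one shared settings object is passed to both writers and that per-writer metadata is withheld from the view until a vote is recorded. The field names `do_sample` and `max_new_tokens` mirror Hugging Face `generate()` arguments, but `DecodingSettings`, `settings_for`, `WriterResult`, and `visible_view` are hypothetical; the Space's actual serving code isn't shown here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DecodingSettings:
    """One settings object shared by both writers. Greedy decoding = no sampling."""
    do_sample: bool = False
    max_new_tokens: int = 256


def settings_for(lantern_mode: bool) -> DecodingSettings:
    # Budgets from the text: 1536 tokens with lantern mode, 256 without.
    return DecodingSettings(max_new_tokens=1536 if lantern_mode else 256)


@dataclass
class WriterResult:
    copy: str
    # Explanation, token count, timing: anything that could fingerprint a writer.
    field_notes: dict = field(default_factory=dict)


def visible_view(result: WriterResult, voted: bool) -> dict:
    """Before the vote, expose only the copy; field notes wait for the reveal."""
    view = {"copy": result.copy}
    if voted:
        view["field_notes"] = result.field_notes
    return view
```

Freezing the settings dataclass makes it harder for one side to quietly receive a different budget, and gating the notes on `voted` keeps output length and latency from leaking which writer is which.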
🏕️ Take it home
The fine-tune isn't stuck in this Space: scan a whole codebase for copy issues with the CLI, run it on your laptop via GGUF, or teach it your own style guide in an afternoon. Built on ≈$40 of compute for the HF Build Small hackathon (small models, big adventure).
⛺ model: gr33r/ux-writing-1 · code: github.com/content-designer/ux-writing-1