How we train brand tone in 4 hours, not 4 weeks
Why most LLM tone-tuning approaches fail, and the per-customer style-fingerprint pipeline we built instead.
Maya Okafor
Founding Engineer · Apr 28, 2026 · 7 min read
For the first year of QuickPly, every customer onboarding involved the same conversation: "How long until it sounds like us?" The honest answer used to be four to six weeks. Today it's four hours. This post is about what changed.
The two-axis problem
Brand tone isn't a single dial. There's the macro level — luxury versus playful, formal versus casual — and the micro level: do you use em dashes? Oxford commas? Do you sign off with the agent's name or the brand's? A model that gets the first right and the second wrong reads worse than one that nails neither, because the gap is uncanny instead of generic.
Most teams handle this with a giant system prompt. That works for the macro axis. It fails completely on the micro axis, because micro decisions are statistical: a brand uses contractions 84% of the time, not always or never. You can't write that as a rule.
What we tried first (and why it didn't ship)
- Full fine-tunes per workspace — too expensive, too slow, and brittle when style evolves.
- RAG over historical replies — the model still defaulted to its base distribution; retrieved examples felt pasted in.
- LoRA adapters — closer, but training 1,000+ adapters on Groq's stack was operationally painful.
What actually worked
The shipping pipeline is unglamorous. We sample 200–500 of the customer's best historical replies, score each on twelve style dimensions with a separate classifier, and aggregate those scores into a compact style fingerprint: about 1.2 KB of structured JSON. That fingerprint goes into the prompt on every generation. The model isn't fine-tuned; it's primed.
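A minimal sketch of that flow, not the production pipeline: it assumes each sampled reply has already been scored by the classifier into a dict of per-dimension numbers, and that aggregation is a plain average. The function names, the dimension names, and the prompt wording are all hypothetical and only illustrate the shape of the idea.

```python
import json
from statistics import mean


def build_fingerprint(scored_replies: list[dict[str, float]]) -> dict[str, float]:
    """Average each style dimension across the sampled replies.

    Assumption: every reply carries the same set of dimensions.
    """
    if not scored_replies:
        raise ValueError("need at least one scored reply")
    dimensions = sorted(scored_replies[0])
    return {
        dim: round(mean(reply[dim] for reply in scored_replies), 3)
        for dim in dimensions
    }


def build_prompt(fingerprint: dict[str, float], customer_message: str) -> str:
    """Prepend the fingerprint as compact JSON to the generation prompt."""
    style_json = json.dumps(fingerprint, separators=(",", ":"), sort_keys=True)
    return (
        "Match this brand style fingerprint when drafting the reply.\n"
        f"STYLE_FINGERPRINT={style_json}\n\n"
        f"Customer message:\n{customer_message}\n"
    )


# Two scored replies with two (of the twelve) dimensions, for illustration.
scored = [
    {"contraction_rate": 0.9, "avg_sentence_words": 12.0},
    {"contraction_rate": 0.8, "avg_sentence_words": 14.0},
]
fingerprint = build_fingerprint(scored)
prompt = build_prompt(fingerprint, "Where is my order?")
print(prompt)
```

Because the fingerprint is just data in the prompt, replacing it is a string swap rather than a training run.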
The fingerprint captures the things rules can't: ratio of questions to statements, average sentence length, contraction frequency, emoji usage by sentiment, opening and closing patterns. Twelve numbers do most of the work.
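To make a few of those dimensions concrete, here is a rough sketch that measures three of them with simple text heuristics. This is an illustration only: the pipeline described above uses a separate classifier, and the tiny contraction list, the sentence splitter, and the metric names below are assumptions made for the example.

```python
import re

# Hypothetical, deliberately small list of contraction pairs.
# A real scorer would need far broader coverage.
CONTRACTION_PAIRS = {
    "don't": "do not",
    "it's": "it is",
    "we're": "we are",
    "can't": "cannot",
    "you'll": "you will",
}


def split_sentences(text: str) -> list[str]:
    """Naive split on whitespace that follows ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def contraction_rate(text: str) -> float:
    """Share of contractible phrases that were actually contracted."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    contracted = sum(
        len(re.findall(rf"\b{re.escape(c)}\b", lowered)) for c in CONTRACTION_PAIRS
    )
    expanded = sum(
        len(re.findall(rf"\b{re.escape(e)}\b", lowered))
        for e in CONTRACTION_PAIRS.values()
    )
    total = contracted + expanded
    return contracted / total if total else 0.0


def style_metrics(reply: str) -> dict[str, float]:
    """Compute three illustrative style dimensions for one reply."""
    sentences = split_sentences(reply)
    n = len(sentences)
    questions = sum(1 for s in sentences if s.endswith("?"))
    words = sum(len(s.split()) for s in sentences)
    return {
        # Share of sentences that are questions (a proxy for the
        # question-to-statement ratio).
        "question_share": questions / n if n else 0.0,
        "avg_sentence_words": words / n if n else 0.0,
        "contraction_rate": contraction_rate(reply),
    }


metrics = style_metrics(
    "We're on it. It is already shipped. Don't worry! Can you confirm the address?"
)
print(metrics)
```

Each metric is a frequency rather than a yes/no rule, which is the property that lets a fingerprint express "contractions most of the time" instead of "always" or "never".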
Why this beat fine-tuning
Updates take a day, not a week. A customer changes their voice guide on Tuesday, we re-score on Wednesday morning, and Wednesday's drafts already reflect it. With fine-tuning, that loop took us a week.
What's still hard
Mixed-tone brands — where customer support sounds different from marketing — break the fingerprint, because we end up averaging across two voices. Our current workaround is per-channel fingerprints (Gmail vs. Zendesk vs. Intercom), but it's a patch, not a fix. The right answer is probably learned channel embeddings. That's next quarter's project.
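The averaging problem and the per-channel workaround are easy to see in a toy example. The sketch below assumes each scored reply is tagged with the channel it came from; the record layout, function names, and numbers are hypothetical and chosen only to show the effect.

```python
from collections import defaultdict
from statistics import mean


def average_fingerprint(replies: list[dict]) -> dict[str, float]:
    """Average every style dimension over the given replies."""
    dims = sorted(replies[0]["scores"])
    return {d: round(mean(r["scores"][d] for r in replies), 3) for d in dims}


def per_channel_fingerprints(replies: list[dict]) -> dict[str, dict[str, float]]:
    """Group replies by channel, then fingerprint each group separately."""
    by_channel: dict[str, list[dict]] = defaultdict(list)
    for reply in replies:
        by_channel[reply["channel"]].append(reply)
    return {ch: average_fingerprint(group) for ch, group in by_channel.items()}


# Made-up scores: a formal support voice and a casual email voice.
replies = [
    {"channel": "zendesk", "scores": {"contraction_rate": 0.2}},
    {"channel": "zendesk", "scores": {"contraction_rate": 0.3}},
    {"channel": "gmail", "scores": {"contraction_rate": 0.9}},
    {"channel": "gmail", "scores": {"contraction_rate": 0.8}},
]

blended = average_fingerprint(replies)           # one voice that matches neither channel
by_channel = per_channel_fingerprints(replies)   # one fingerprint per channel
print(blended)
print(by_channel)
```

The blended value sits between the two voices and describes neither. Splitting by channel recovers both, but only when the voice boundary happens to line up with the channel boundary, which is why it is a patch.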
If you want to see your own fingerprint, connect your inbox and check the Tone tab in the dashboard. Sampling and scoring finish in about four hours for most workspaces, fast enough that we can do it during the free trial.