Cite-or-discard as a product primitive: how the MessageVerifier earns its keep

The single piece of the agentic stack that does the most work for trust isn’t the executor and isn’t the advisor. It’s a small, opinionated verifier that reads every draft the agent produces, checks every factual claim against the lead’s actual context, and throws the draft away if any claim is uncited. Not a quality score. A binary. Here’s why “cite-or-discard” turned out to be the right shape, and what shipping it that way costs us.

The failure mode worth designing around

The category-defining failure mode for an AI sales agent isn’t a low reply rate. It’s one email, just one, that leaves the user’s mailbox containing a fabricated claim. “I saw your team just raised a Series B” when there’s no Series B. “Congrats on launching the Berlin office” when the Berlin office is a line in an old footer nobody updated. “Your competitor X just signed with us” when X did not. The lead replies “where did you get this?” or, more often, doesn’t reply at all and flags the email as spam, because that’s what humans do when an obvious lie shows up in their inbox.

Our buyer, the B2B SaaS founder doing their own outbound, will not tolerate this even once. The emails go out from their company’s domain, with their name in the signature. They will pause the campaign and they will pause us, and they’re right to. So the question isn’t “how do we minimize hallucinations.” It’s “how do we make hallucinations impossible to ship,” accepting that a bar that strict imposes costs we’ll have to absorb.

Why not a quality score

The first instinct, and the instinct most LLM-eval tooling encourages, is to score the draft. Give it a 0-to-1 hallucination probability. Set a threshold. Send anything below it. The trouble with a continuous score is that a threshold is always wrong somewhere. Set it tight and you throw away drafts that were fine. Set it loose and you ship drafts that weren’t. The threshold becomes a knob and the knob becomes a thing you tune, and tuning it means measuring the failures, which means letting some through to measure.

The cite-or-discard reframe sidesteps the knob entirely. Every claim in the draft either points at a specific span of the lead’s context (a website paragraph, a job-posting line, a discovery-probe finding) or it doesn’t. If it does, fine. If it doesn’t, the draft is broken: throw it away and ask the executor to draft again with the verifier’s reasons attached. Re-drafting once or twice usually closes the gap. If the gap is still open after a configurable retry cap, the agent gives up on that decision and escalates, and not silently: it writes a decision-log entry the user can see.
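A minimal sketch of that loop in Python, assuming the drafter and verifier are plain callables. The names `draft_with_gate` and `Verdict`, the string log entries, and the reading of the retry cap as “re-drafts allowed after the first attempt” are all hypothetical, not our production API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool                                      # binary: verified or not
    reasons: list[str] = field(default_factory=list)  # why it failed, if it did


@dataclass
class Outcome:
    draft: Optional[str]  # the verified draft, or None if we escalated
    attempts: int
    log: list[str]        # hypothetical decision-log entries, user-visible


def draft_with_gate(
    draft: Callable[[list[str]], str],   # drafter: takes prior rejection reasons
    verify: Callable[[str], Verdict],    # verifier: pass/fail plus reasons
    retry_cap: int = 2,
) -> Outcome:
    """Draft, verify, and re-draft with the verifier's reasons attached until a
    draft passes or the retry cap is exhausted. Never returns an unverified draft."""
    reasons: list[str] = []
    log: list[str] = []
    for attempt in range(1, retry_cap + 2):  # first attempt + retry_cap retries
        text = draft(reasons)
        verdict = verify(text)
        if verdict.passed:
            log.append(f"attempt {attempt}: verified")
            return Outcome(text, attempt, log)
        reasons = verdict.reasons  # feed the rejection back to the drafter
        log.append(f"attempt {attempt}: rejected: {'; '.join(reasons)}")
    log.append("retry cap exhausted: escalated to user")
    return Outcome(None, retry_cap + 1, log)
```

The property that matters is that the loop has exactly two exits: a verified draft, or an escalation with the full rejection history attached.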

What “claim” means in practice

The verifier prompt is unglamorous and deliberately so. It reads the draft, lists every concrete assertion about the lead — their company, their product, their team, their market, their recent activity — and for each one asks: where in the context does this come from? The acceptable answers are spans of source text the executor was given. Anything else is uncited.

Generic copy is exempt by construction. “I’m reaching out because we help teams like yours” makes no factual claim beyond “we exist and we help teams,” which is a self-claim rather than a lead-claim. “I saw on your careers page that you’re hiring three SDRs” is a lead-claim, and the careers page text either supports it or doesn’t. The verifier doesn’t police style. It polices facts.
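Listing the claims is an LLM prompt, but the check that follows can be mechanical. A sketch, assuming the verifier returns each claim tagged as a lead-claim or a self-claim, together with the source it cites and the exact quoted span. The `Claim` shape and the `uncited_claims` function are hypothetical. Exact substring matching is an assumption chosen for strictness: the claim may paraphrase, but the quoted span must appear verbatim in the context the executor was given.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Claim:
    text: str                        # the assertion as it appears in the draft
    about: Literal["lead", "self"]   # lead-claim vs self-claim
    source_id: Optional[str] = None  # which context document it cites
    quote: Optional[str] = None      # the exact span it cites


def uncited_claims(claims: list[Claim], context: dict[str, str]) -> list[str]:
    """Return a rejection reason for every lead-claim whose cited span is
    missing or does not appear verbatim in the named context document."""
    reasons = []
    for c in claims:
        if c.about == "self":
            continue  # self-claims are exempt by construction
        source = context.get(c.source_id or "")
        # Short-circuit order matters: never test `in` against a missing source.
        if not c.quote or source is None or c.quote not in source:
            reasons.append(f"uncited: {c.text!r}")
    return reasons
```

A non-empty result means the draft is discarded and the reasons go back to the executor; an empty one means it passes.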

The interesting failure mode the verifier catches in practice isn’t the obvious one. It’s the plausible-sounding inference. The executor sees a lead in the events industry and writes “I imagine you’re juggling a few summer festivals right now.” The lead’s context says nothing about summer festivals. The claim is plausible (events industry, May), but it’s uncited, and the verifier throws it out. Good. Plausible-but-uncited is exactly the class of claim that erodes trust slowly, because one in ten of them will be wrong in a way the user can’t predict.

What it costs us

Three real costs, none of them theoretical:

Lower send rate per decision. For some leads, no draft can be made fully citable because the lead’s context is thin (a website with one paragraph and no signal). The executor tries, the verifier rejects, the executor tries again with a tighter scope, and sometimes, once the retry cap is hit, there’s no draft worth shipping. The decision becomes a WAIT or, in non-email channels, a manual-outreach task. The user sees fewer sends per campaign than a tool that doesn’t gate this hard would produce. That’s the trade.

Extra LLM cost per decision. A drafter pass plus a verifier pass roughly doubles the token spend per accepted draft compared with drafting alone, and every rejected draft that gets retried adds to it. We budget for this in the per-lead cost ceiling, and a campaign auto-pauses when it starts burning cost without producing accepted drafts. The line item is worth it.
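A sketch of the bookkeeping behind that, assuming the cost of each drafter and verifier pass is known once the pass completes. `CampaignSpend` and its pause rule (spend since the last accepted draft exceeding a fixed amount) are hypothetical stand-ins for the per-lead ceiling and auto-pause described above, not our actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class CampaignSpend:
    pause_after_unaccepted_cost: float  # hypothetical auto-pause threshold
    total_cost: float = 0.0
    accepted: int = 0
    cost_since_last_accept: float = 0.0

    def record_attempt(self, drafter_cost: float, verifier_cost: float, passed: bool) -> None:
        """Charge one drafter-plus-verifier attempt, accepted or not."""
        cost = drafter_cost + verifier_cost
        self.total_cost += cost
        self.cost_since_last_accept += cost
        if passed:
            self.accepted += 1
            self.cost_since_last_accept = 0.0

    def cost_per_accepted_draft(self) -> float:
        # Rejected attempts are included in total_cost, so this is the honest figure.
        return self.total_cost / self.accepted if self.accepted else float("inf")

    def should_pause(self) -> bool:
        """True once the campaign has burned too much without an accepted draft."""
        return self.cost_since_last_accept > self.pause_after_unaccepted_cost
```

Charging rejected attempts to the total is what keeps the per-draft figure honest: two rejections before an acceptance means that one accepted draft carried three drafter-plus-verifier passes.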

A class of clever drafts we can’t ship. If the cleverest possible opener for this lead requires an inference the context doesn’t support, we don’t ship it. A human writing for hours might land that inference because they know things the agent doesn’t. The agent can’t. The ceiling on draft cleverness is “what the context will support.” We’ve decided that’s the right ceiling.

Why “verifier” and not “evaluator”

The naming was deliberate. An evaluator implies a score and a threshold. A verifier implies pass-or-fail. The team and the prompt and the API all use “verify.” A draft is either verified or it isn’t. When it isn’t, the reasons go straight back to the executor, the user sees them in the AI Inbox when verification fails repeatedly, and the campaign records the failure in its activity feed.

One of the AI Inbox task types is, literally, MESSAGE_VERIFICATION_FAILED. When a draft is still failing verification after the retry cap, the user sees the draft, the verifier’s reasons, and the lead’s context side by side. They can manually approve it (if the verifier is being too strict, which is rare but real), edit it, or discard it. The escape hatch is exposed because the alternative, silently dropping the lead, felt worse than asking the user one question.
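A sketch of the task’s shape and its three resolutions, assuming a simple in-memory record. `VerificationFailedTask` and `Resolution` are hypothetical names; only the `MESSAGE_VERIFICATION_FAILED` task type comes from the product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import ClassVar, Optional


class Resolution(Enum):
    APPROVE = "approve"  # user overrides an over-strict verifier
    EDIT = "edit"        # user rewrites the draft
    DISCARD = "discard"  # nothing goes out for this decision


@dataclass
class VerificationFailedTask:
    TASK_TYPE: ClassVar[str] = "MESSAGE_VERIFICATION_FAILED"
    draft: str                    # the last rejected draft
    verifier_reasons: list[str]   # shown next to the draft
    lead_context: dict[str, str]  # shown next to the draft

    def resolve(self, action: Resolution, edited_draft: Optional[str] = None) -> Optional[str]:
        """Return the text to send, or None if nothing should be sent."""
        if action is Resolution.APPROVE:
            return self.draft
        if action is Resolution.EDIT:
            if not edited_draft:
                raise ValueError("EDIT requires the edited draft text")
            return edited_draft
        return None
```

Whether an edited draft goes back through the verifier before sending isn’t covered here; this sketch returns the user’s text as-is.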

Cite-or-discard composes

The piece that makes the primitive useful beyond a single drafter is that it composes with every other generator we ship. Reply drafts go through it. Manual-outreach scripts for non-email channels go through it. The brand-voice extractor, when it proposes a brand-voice sample from the user’s sent folder, goes through a related cite-check against the source email. Anything the agent produces that will end up in front of a human runs the same gate.

We didn’t plan that as a feature. We started with the drafter, built the verifier for the drafter, and then every subsequent surface that produced text wanted the same gate. That’s the test of a good primitive — when the next thing you build reaches for it without you having to remember it.

What we still get wrong

The verifier isn’t perfect. Two real classes of miss:

Overclaim by association. The lead’s context mentions “we serve enterprise customers” and the draft says “I saw you’ve signed three Fortune 500 logos.” The underlying fact is cited; the count and the Fortune 500 framing are invented specificity. The verifier catches most of these, but it misses some when the added specificity is subtle. We watch the failure log and tighten the prompt when patterns emerge.

Stale context. The lead’s website said something true six months ago that’s no longer true. The verifier can’t know that. The draft cites the website and ships. The lead replies “we don’t do that anymore.” That’s not a verifier bug — the citation was real. It’s a freshness bug in the discovery layer, and we treat it as one: re-scrape on a TTL, store fetched-at timestamps, mark a draft as low-confidence when the citation is from context older than 60 days.
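A sketch of the freshness rules, assuming each cited span carries the fetched-at timestamp of its source. The 60-day threshold is the one above; the re-scrape TTL is left as a parameter because no value is given, and marking the whole draft low-confidence when any one of its citations is stale is an assumption.

```python
from datetime import datetime, timedelta, timezone


def needs_rescrape(fetched_at: datetime, now: datetime, ttl: timedelta) -> bool:
    """True once a source's fetched-at timestamp is at least one TTL old."""
    return now - fetched_at >= ttl


def draft_confidence(
    cited_fetched_at: list[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=60),
) -> str:
    """'low' if any cited context is older than max_age, else 'normal'."""
    return "low" if any(now - t > max_age for t in cited_fetched_at) else "normal"
```

Timestamps should be timezone-aware (for example `datetime.now(timezone.utc)`) so that sources fetched from different regions compare correctly.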

If you’ve been burned by a fabricated email

You’re the user who tried one of the other tools in the category, and the first batch went out with two emails that confidently named one of your competitors as though the lead were already its customer. You paused the tool and you no longer trust the category. The reflex is to demand fewer sends, smaller scope, more manual review. That works, but then the tool stops being a tool.

The cite-or-discard primitive is what makes the tool autonomous and trustworthy at the same time. The agent can run on its own because the gate makes “ship an uncited claim” impossible by construction. The user doesn’t have to read every draft. They have to read the AI Inbox when verification fails or when something else needs them. The agent handles the rest.

If you want to see the verifier in action against a real lead — including watching a draft fail and the agent re-draft — the 14-day trial exposes the per-lead decision log under each lead in a campaign. You can read every drafter pass, every verifier pass, every rejection reason. The agent doesn’t hide its work.

— Tobias Duelli, founder · tobias@overwise.ai