A table in a Kalceo technical sheet, clean and well-formatted. Row: "exceptional apprenticeship grant, levels 5 to 7: €5,000". Citation: service-public.fr. A contractor reads this sheet, builds a profitability model, evaluates whether hiring an apprentice is viable, makes a decision.

Except the real rate, verified on March 9, 2026 against form F23556 on service-public.fr, is:

  • €4,500 for level 5 (BTS, DUT, BUT)
  • €2,000 for levels 6 and 7 (Bachelor, Master)

Not €5,000. The gap can be the difference between "I'll hire a Master's-level apprentice" and "I won't hire anyone." The sheet was corrected before publication. But it should never have contained that error.

This is not an isolated case. It is the structural problem with LLM agents: they produce authoritative-looking text on subjects they only partially grasp. They don't improvise, they synthesize. And in that synthesis, they interpolate, round off, confuse, with a precision that disarms suspicion.

The Problem Is Not Just the Model

Built-in safeguards are improving. Models too. But in 2026, that is not yet enough. Quality varies from one run to the next, from one subject to the next, for no apparent reason. A model that produces a flawless article on social economy can, the next day, generate an approximate tax statistic on the same topic.

"Every output from an LLM is subject to hallucinations: even if models improve and agentic systems incorporate built-in safeguards, today that is not yet sufficient. Some models cut corners. The reason is sometimes obscure — from one day to the next or from one subject to another, the model is sometimes more on point."

The response is not to wait for better models. It is an architectural response.

"This is why you need to decouple responsibilities and create adversarial agents that run separately from content production itself. Each agent has its own skills, objective, memory, context... that is how you create healthy and reliable control loops."

One agent produces. Another verifies, separately, with a distinct execution context and an opposing objective: actively searching for what does not hold up. This is the foundation of the adversarial pattern. Not quality control added as an afterthought: an architectural decision.
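
The separation can be sketched in a few lines. This is a minimal illustration, not the actual agent code: `Claim`, `Verdict`, `produce`, and `verify` are hypothetical names, and the "evidence" dictionary stands in for whatever the verifier reads at the source on its own. The point it shows is structural: the verifier receives the claims and its own evidence, never the producer's context.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str        # the sentence as drafted
    value: str       # the figure or fact asserted
    source_url: str  # where the producer says it comes from


@dataclass
class Verdict:
    status: str                                  # "PASS" or "BLOCK"
    issues: list[str] = field(default_factory=list)


def produce(topic: str) -> list[Claim]:
    """Hypothetical stand-in for the producing agent: returns drafted claims."""
    return [Claim(
        text="exceptional apprenticeship grant, levels 5 to 7: €5,000",
        value="€5,000",
        source_url="service-public.fr",
    )]


def verify(claims: list[Claim], evidence: dict[str, set[str]]) -> Verdict:
    """Hypothetical stand-in for the verifying agent.

    It sees only the claims and its own evidence (values it read at the
    source itself). Its objective is to find what does not hold up.
    """
    issues = []
    for claim in claims:
        found = evidence.get(claim.source_url, set())
        if claim.value not in found:
            issues.append(f"'{claim.value}' not found at {claim.source_url}")
    return Verdict("BLOCK" if issues else "PASS", issues)


# Values from the opening example, as read at the source by the verifier.
evidence = {"service-public.fr": {"€4,500", "€2,000"}}
verdict = verify(produce("apprenticeship grants"), evidence)
```

Here the verdict is a block with one issue, because "€5,000" is not among the values found at the source.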

Act 1: The Pivot (May 17, 2026, Bloomii)

Bloomii is a media outlet covering social and environmental alternatives. Every published figure puts the project's credibility on the line: if an article claims that the French social economy represents 10% of employment when it actually represents 10% of GDP, you lose the trust of readers who know the subject.

The first fact-checker report, dated May 2, 2026, blocks an article on regenerative agriculture. The cause: a CIAT statistic claiming "78% higher profitability and an average ROI of 176% across 4 farms." The cited page exists, returns HTTP 200, does discuss regenerative agriculture. But the exact figure appears nowhere in the text. The fact-checker cannot validate. It blocks.
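
The rule at work here, "a reachable page is not a confirmed figure", can be sketched as follows. This is an assumption-laden illustration, not the fact-checker's real logic: the function names are hypothetical, the page text is passed in rather than fetched, and the sample text below is invented placeholder copy, not the content of the CIAT page.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and remove spaces before '%'."""
    text = text.replace("\u00a0", " ").lower()
    text = re.sub(r"\s+", " ", text)
    return re.sub(r"\s+%", "%", text)


def figure_in_text(figure: str, text: str) -> bool:
    """True if the figure appears literally, not as part of a longer number."""
    pattern = r"(?<![\d.,])" + re.escape(normalize(figure)) + r"(?!\d)"
    return re.search(pattern, normalize(text)) is not None


def check_figures(http_status: int, page_text: str, figures: list[str]) -> dict:
    """PASS only if the page is reachable AND every figure is in its text."""
    if http_status != 200:
        return {"verdict": "BLOCK", "missing": list(figures),
                "reason": f"source returned HTTP {http_status}"}
    missing = [f for f in figures if not figure_in_text(f, page_text)]
    if missing:
        return {"verdict": "BLOCK", "missing": missing,
                "reason": "figure not found in source text"}
    return {"verdict": "PASS", "missing": [], "reason": ""}


# Placeholder page text (invented for the example): on topic, HTTP 200,
# but neither figure appears in it.
page = "Regenerative agriculture can improve soil health and farm income."
report = check_figures(200, page, ["78%", "176%"])
```

The page is reachable and on topic, yet the report blocks with both figures listed as missing. A literal match is deliberately strict: a real checker would also need to handle paraphrased or differently formatted numbers.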

At that point, fact-checking is still ad hoc. Two weeks later, everything changes.

On May 17, in the Bloomii repository:

feat(agents): enforce fact-check on all channels — brèves, X threads, newsletter

Before: fact-checking on request. After: mandatory pass on every outbound channel. Daily news briefs, X threads, newsletter, long articles, initiative atlas. No content can move to Review without passing through the fact-checker.
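
In workflow terms, the commit turns fact-checking from an optional action into a guard on a state transition. A minimal sketch, assuming a ticket object with a status and a fact-check result: the `Ticket` class, the channel identifiers, the "Draft" status, and the exception name are all hypothetical. Only the "Review" stage and the PASS verdict come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the outbound channels listed above.
CHANNELS = {"brief", "x_thread", "newsletter", "article", "atlas"}


@dataclass
class Ticket:
    channel: str
    status: str = "Draft"
    fact_check: Optional[str] = None  # None until a report exists, then "PASS" or "BLOCK"


class FactCheckRequired(Exception):
    """Raised when a ticket tries to reach Review without a PASS report."""


def move_to_review(ticket: Ticket) -> Ticket:
    """Move a ticket to Review only if its fact-check report says PASS."""
    if ticket.channel not in CHANNELS:
        raise ValueError(f"unknown channel: {ticket.channel}")
    if ticket.fact_check != "PASS":
        raise FactCheckRequired(
            f"{ticket.channel} ticket has fact_check={ticket.fact_check!r}"
        )
    ticket.status = "Review"
    return ticket
```

The guard makes no exception per channel: a short X thread is held to the same transition rule as a long article.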

What triggered the decision:

"Proofreading articles revealed generation errors that were not acceptable for a serious news outlet. Online, you stake your credibility, and it erodes quickly. It is a matter of rigor, and it is a differentiating argument against other approximate outlets that relay unverified information."

Results are immediate. Concrete example: the script for a video on the French social and solidarity economy (ESS) states that the sector represents "ten percent of French employment". The Direction générale du Trésor says otherwise: 10% of GDP, and 13.7% of private sector employment. Not the same thing. Script corrected before publication.

On the X threads covering legislative articles 7 through 11, the same pattern: participation figures, session dates, vote results. Everything passes through the fact-checker. The Irish Citizens' Assembly on Biodiversity (2022-2023): 99 randomly selected members, 83% in favor of a constitutional referendum, report submitted to Parliament on April 5, 2023. Verified against citizensassembly.ie.

Across the 48 Bloomii reports accumulated between May 2 and June 11, 2026, the majority conclude with PASS after corrections applied. The volume says more than the detail: one topic per report, one to three corrections per report on average, across subjects ranging from the Mondragon cooperative to Porto Alegre participatory budgets and local initiative atlas entries.

Act 2: The Source Registry (May 30, 2026)

Two weeks after the pivot, an efficiency problem surfaces. The fact-checker and the source-researcher are often working on the same sources. One finds and validates URLs, the other opens them and confirms claims. Two agents, often the same domains, double token spend.

On May 30, in the Bloomii repository:

chore: add shared source-registry and wire it into fact-checker/source-researcher

The source registry is a file shared between both agents: .agents/knowledge/source-registry.md. It references known domains, their access status (HTTP 200, 403 anti-bot, timeout), validated fallbacks, and already-completed verifications. For example, certain scientific publishers systematically block automated requests. The registry documents the validated alternative source. No agent attempts the fetch, discovers the block, or searches for alternatives by trial and error. It consults the registry and applies the substitute source directly.
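
The lookup logic can be sketched like this. The real registry is a Markdown file; the sketch assumes it has been loaded into a dictionary keyed by domain, which is an assumption about the format, not a description of it. `RegistryEntry`, `resolve`, the action names, and the `.example` domains are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class RegistryEntry:
    access: str                      # "200", "403 anti-bot", or "timeout"
    fallback: Optional[str] = None   # validated substitute source, if any
    verified: dict[str, str] = field(default_factory=dict)  # claim -> date checked


def resolve(url: str, registry: dict[str, RegistryEntry]) -> tuple[str, str]:
    """Decide what to do with a URL without spending a fetch to find out."""
    domain = urlparse(url).netloc.removeprefix("www.")
    entry = registry.get(domain)
    if entry is None:
        return ("fetch_and_record", url)        # unknown domain: test once, then record
    if entry.access == "200":
        return ("fetch", url)                   # known to be reachable
    if entry.fallback:
        return ("use_fallback", entry.fallback)  # known block, known substitute
    return ("unreachable", url)                 # known block, no substitute yet


# Placeholder registry content, invented for the example.
registry = {
    "publisher.example": RegistryEntry("403 anti-bot",
                                       fallback="https://archive.example/paper/123"),
    "stats.example": RegistryEntry("200"),
}
action = resolve("https://www.publisher.example/paper/123", registry)
```

For the blocked publisher, the agent gets the validated substitute directly instead of rediscovering the block.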

On the decision to merge this resource:

"The work was being done twice by two different agents. Merging this resource did not undermine their respective efficiency and objectivity, but it contributed to building a shared base and thus saved tokens. It also allowed for lasting source traceability, not just temporary work that gets forgotten."

Three distinct benefits:

Token savings. Verifications are cached. A domain tested once is not re-tested on every article. Across 48 reports, the cumulative savings are substantial.

Lasting traceability. Verification reports stay in the repository. A figure verified today is not lost after the run. It is auditable and available for the next article on the same subject.

Independence preserved. Both agents share a source registry, not a judgment. The source-researcher and the fact-checker continue to work separately toward their respective objectives. This separation is precisely what creates the adversarial value. The fact-checker does not have access to the source-researcher's reasoning, and vice versa.

A shared registry does not dilute the control loop: it optimizes it.

Workflow: From Ticket to Verdict

[Figure: fact-checker workflow, from ticket to verdict]

Act 3: The Extension (June 2026)

The pattern scales beyond long-form articles. It applies to any outbound content, regardless of surface.

Kalceo: Regulatory B2B Content

Kalceo produces technical sheets for construction contractors: VAT on renovation work, apprenticeship grants, electronic invoicing, unpaid invoices. The risk is not editorial, it is legal and financial. A contractor who acts on incorrect information about apprenticeship grants does not miss a blog post, they miss a €2,500 grant.

The apprenticeship grant catch detailed in the opening is the textbook case. But another report, on unpaid invoices, illustrates a subtler error type.

The opening testimonial described "Stéphane, a painter from Nice, waiting on a client who owed him €4,000." The verified source, an online debt-recovery platform (GCollect), said "Stéphane, a craftsman from Nice." No profession. No amount cited. The passage was a reconstruction by the writer from insufficient context. Deleted. Six corrections total in that single report, including an invented profession, an invented amount, a wrong source attribution (FFB/Altares replaced by EY/Altares/Banque de France), and generic homepage links replaced with the actual article source URLs.

12 Kalceo reports over 3 weeks (April 16 to May 6, 2026). Topics: VAT on construction, quotation models, e-invoicing platform, unpaid invoices, grants and subsidies, electronic invoicing penalties. On the e-invoicing platform topic, the fact-checker flagged incorrect terminology around the Public Invoicing Portal (PPF) and its relationship to Chorus Pro, an outdated count of 112 accredited dematerialization platforms (PDP), and an unverifiable CAPEB statistic.

Ekioo: Self-Reporting

Ekioo applies the same fact-checking to its own project pages and social media drafts.

On the project pages side: the VizMail project page claimed 43 features. After direct API verification (GET /api/skill), the actual count is 38. Five announced features did not exist. Self-reporting: the article was describing a product in the same ecosystem, and the number was wrong.
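
The check itself is a comparison between a number in the copy and a count taken from the system of record. A minimal sketch under stated assumptions: the shape of the `GET /api/skill` response is not described here, so the function takes a plain list of item names as input, and the feature names below are placeholders. `check_count_claim` is a hypothetical helper.

```python
import re


def check_count_claim(copy: str, noun: str, actual_items: list[str]) -> dict:
    """Compare an 'N <noun>' claim in marketing copy with the real item count."""
    match = re.search(rf"(\d+)\s+{re.escape(noun)}", copy)
    if match is None:
        return {"verdict": "SKIP", "claimed": None, "actual": len(set(actual_items))}
    claimed = int(match.group(1))
    actual = len(set(actual_items))  # de-duplicate before counting
    return {
        "verdict": "PASS" if claimed == actual else "BLOCK",
        "claimed": claimed,
        "actual": actual,
    }


# Placeholder names: only the counts (43 claimed, 38 real) come from the report.
features = [f"feature-{i}" for i in range(38)]
result = check_count_claim("VizMail ships 43 features.", "features", features)
```

The design choice that matters is where `actual_items` comes from: the product itself, not the text that describes it.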

On the social drafts side: LinkedIn and X drafts generated for the KittyClaw KPI dashboard contained two errors. "Every morning" in a tweet: the source article specifies that the review happens every hour, not every morning. "11 templates": the article says 11 tiles and 7 distinct templates. Six lines corrected.

The fact-checker is the last filter before publication. Not just a numbers verifier: it can validate the hook of a YouTube Short, check the objectivity and tone of a thread, detect bias in a chosen angle. The AccountBuildUp project takes this pattern further: an agent checks the thematic relevance of each generated post against a reference corpus of texts. image-factory and video-factory (the image and video production pipelines set up in KittyClaw for the various projects) also include systematic verifications before delivery: visual consistency, brand identity compliance, adherence to editorial criteria. Validation is not a separate step: it is a layer integrated into every production pipeline.

The Generalizable Pattern

Eight weeks, 61 verification reports across three projects with radically different profiles: a media outlet covering social alternatives (ideologically sensitive topics), a B2B regulatory SaaS (legally sensitive topics), and a technical blog about building this very system.

The pattern that emerges is not specific to editorial content.

A claim-checker on code comments: a docstring asserting that a function returns X when it actually returns Y is documentary hallucination. Code changes, documentation stays. An adversarial agent reads both and flags divergences.
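
A deliberately narrow version of that check can be written without any agent at all. The sketch below is a heuristic, not a full claim-checker: it only compares the type named after "Returns" in a docstring with the function's return annotation, and `check_docstring_return` and `feature_count` are hypothetical names. An adversarial agent would cover what this cannot, such as described behavior rather than declared types.

```python
import inspect
import re
from typing import Optional


def check_docstring_return(func) -> Optional[str]:
    """Return a divergence message, or None if nothing can be flagged."""
    doc = inspect.getdoc(func) or ""
    # Matches "Returns int", "Returns: int", "Returns a list", "Return an int".
    match = re.search(r"Returns?:?\s+(?:an?\s+)?(\w+)", doc)
    annotation = inspect.signature(func).return_annotation
    if match is None or annotation is inspect.Signature.empty:
        return None  # nothing to compare
    documented = match.group(1).lower()
    actual = getattr(annotation, "__name__", str(annotation)).lower()
    if documented != actual:
        return (f"{func.__name__}: docstring says '{documented}', "
                f"signature says '{actual}'")
    return None


def feature_count(features: list) -> int:
    """Returns a list of feature names."""  # stale: the code now returns a count
    return len(features)


finding = check_docstring_return(feature_count)
```

On this example the check flags that the docstring promises a list while the signature declares an int.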

A landing page fact-checker: commercial claims are a classic hallucination surface. VizMail 43→38 is already that pattern. The agent that writes the sales page and the agent that counts real features should not be the same.

A strategic premise audit: before committing to a quantified decision, verify that the figures underlying it are accurate. Same logic, applied upstream.

In each of these cases, the principle is identical: one agent produces, another verifies separately, with its own context and its own objective. Not redundancy. A control loop.

AI Scaling Is Only Viable If Verification Scales With It

Scaling content creation with AI agents solves one problem while creating another. If production multiplies tenfold but validation stays manual, the bottleneck shifts from production to human review.

"If AI is heavily used to scale content creation, humans themselves become a bottleneck if everything has to be verified each time. This is why we need production pipelines we can trust. The role of fact-checkers, gatekeepers, judges, validators, etc. is precisely to create that trust. And the feedback they produce allows content-producing agents to improve and converge toward higher-quality outputs with fewer iterations."

The fact-checker is not just a filter: it is also a learning mechanism. Each documented report (claim identified, correction applied, primary source) becomes a signal for producing agents. Memory adjusts, the skill evolves, the same errors recur less and less. Not a recurring cost: an investment that decreases as the pipeline matures.
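
One way to turn reports into a signal is to aggregate corrections by error type and surface the ones that recur. The sketch below assumes each correction is recorded with a type label, which is a hypothetical taxonomy, as are the `Correction` class and the `recurring_errors` function. The sample corrections are the ones described earlier in this article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Correction:
    error_type: str  # hypothetical label, e.g. "invented_detail"
    claim: str       # what the draft said
    fix: str         # what the source supports


def recurring_errors(corrections: list[Correction],
                     min_count: int = 2) -> list[tuple[str, int]]:
    """Return error types seen at least min_count times, most frequent first."""
    counts = Counter(c.error_type for c in corrections)
    return [(kind, n) for kind, n in counts.most_common() if n >= min_count]


corrections = [
    Correction("invented_detail", "a painter from Nice", "a craftsman from Nice"),
    Correction("invented_detail", "owed him €4,000", "no amount in source"),
    Correction("wrong_attribution", "FFB/Altares", "EY/Altares/Banque de France"),
    Correction("wrong_frequency", "every morning", "every hour"),
]
signal = recurring_errors(corrections)
```

With these four corrections, only "invented_detail" crosses the threshold. That is the kind of pattern worth writing into a producing agent's memory or skill.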

Without automated validation, AI does not truly scale. It simply shifts the bottleneck.

What You Would Expect From a Human

A professional editorial team would not send an article without proofreading, source verification, and style review. A lawyer would not deliver a document without checking legislative references. An art director would not sign off on a visual without brand compliance verification.

This quality control is not an exceptional measure. It is standard practice in any serious production process.

The fact-checker is the automated equivalent of that role. It is not there because AI is particularly unreliable: it is there because no production system, human or artificial, should publish without validation. The difference: it runs on every ticket, without exception, without fatigue, with a documented report.

Credibility as Infrastructure

One might object that systematic sourcing against primary sources produces tepid content, drained of any analytical voice. The response is direct: "You need to cite the right source, avoid flat formulations." Sourced does not mean flat. The verification constraint does not rule out assertive framing, tension between figures, or editorial choices.

What it prevents is publishing "€5,000" when the legal rate is "€4,500." What it prevents is attributing a profession to Stéphane that the source does not mention and an amount that is not in the source. What it prevents is letting "every morning" slip through when the article says "every hour."

A serious outlet is not the one that publishes the most, or the fastest. It is the one that can defend every figure, every date, every claim in every article, at any time. Institutionalizing fact-checking means refusing to become one of those approximate relay outlets whose credibility erodes article by article.

Credibility is not a style. It is infrastructure.