KittyClaw D+7: What We Learned Letting Agents Run a Real Kanban


Monday morning. I open KittyClaw and the board is already moving: three agents running, one ticket that just hit Done, another that just got blocked by the content-writer. I triggered nothing. That's what changed.

Here's one week in production. Not a demo. Not an architecture article — the previous ones cover that. What follows is what actually happened, numbers included.

This piece documents KittyClaw, the kanban orchestrator at the center of the Ekioo agent-fleet R&D. Alongside Bloomii (constructive-journalism media) and Kalceo (regulatory B2B SaaS for construction contractors), KittyClaw runs the AI agents that drive these projects in production.

The Numbers

Since KittyClaw started running the ekioo.com project, the board holds 20 tickets:

17 Done (85%)
2 Todo (waiting)
1 InProgress (this one)

Breakdown by assignee:

Agent	Tickets	Share
content-writer	9	45%
lain (direct)	6	30%
programmer	4	20%
owner	1	5%

65% of tickets (content-writer plus programmer) were handled by agents without direct intervention. I create the ticket and assign it; KittyClaw dispatches the agent within 30 seconds and takes it from there.
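As a quick check, the shares in the table and the 65% figure can be recomputed from the raw counts. The sketch below assumes that content-writer and programmer are the two assignees counted as agent-handled; that grouping is inferred from the percentages, not read from the board itself.

```python
# Ticket counts per assignee, copied from the table above.
tickets_by_assignee = {
    "content-writer": 9,
    "lain (direct)": 6,
    "programmer": 4,
    "owner": 1,
}

# Assumption: these two assignees make up the "handled by agents" figure.
AGENT_HANDLED = {"content-writer", "programmer"}

total = sum(tickets_by_assignee.values())

# Share of the board per assignee, as a whole percentage.
shares = {name: round(100 * n / total) for name, n in tickets_by_assignee.items()}

# Share of tickets handled without direct intervention.
agent_share = 100 * sum(tickets_by_assignee[a] for a in AGENT_HANDLED) / total

print(total, shares, agent_share)
```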

But that number only tells part of the story. What the table doesn't show is that each ticket generates up to 6 cascading agent runs. A typical blog article triggers, in sequence, the content-writer, qa-tester, fact-checker, committer, and evaluator, and sometimes a groomer upstream. Twenty tickets add up to well over 60 agent runs.

What Worked Well

The `assignee-dispatch` automation

This is the board's simplest automation, and by far its most valuable: a ticket moves to Todo with an assignee → 30 seconds later, the agent is running. No clicking, no terminal to open, no command to type. The friction between "I have an idea" and "an agent is working on it" dropped to zero.
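The rule itself is small. Here is a minimal sketch of the condition, not KittyClaw's actual implementation: the `Ticket` class, the `should_dispatch` function, and the `DISPATCH_DELAY_SECONDS` constant are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Delay between the ticket landing in Todo and the agent starting,
# as described above. The constant name is hypothetical.
DISPATCH_DELAY_SECONDS = 30


@dataclass
class Ticket:
    """Hypothetical, simplified ticket model."""
    id: int
    column: str
    assignee: Optional[str] = None


def should_dispatch(ticket: Ticket) -> bool:
    """Return True when the ticket sits in Todo and has an assignee."""
    return ticket.column == "Todo" and bool(ticket.assignee)
```

Under these assumptions, a ticket in Todo with no assignee, or an assigned ticket still in Backlog, is left alone.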

Over the course of a week, that's the clearest habit shift: I note a task, assign it, and forget it. By evening, it's often Done.

The `qa-on-review` pipeline

Every ticket that reaches Review automatically triggers the qa-tester. For content tickets, the fact-checker follows immediately. In practice, this caught several issues before they hit production: broken links, missing images, pages that didn't render correctly.
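The routing described here could be sketched as follows. This is an illustration under assumptions: the function name `agents_for_review` and the `"content"` ticket kind are hypothetical, and the real automation may decide differently.

```python
def agents_for_review(ticket_kind: str) -> list[str]:
    """Return the agents to run, in order, when a ticket reaches Review."""
    # Every ticket gets the qa-tester.
    agents = ["qa-tester"]
    # Content tickets are additionally fact-checked right after QA.
    if ticket_kind == "content":
        agents.append("fact-checker")
    return agents
```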

The automatic "someone always checks before merging" behavior is exactly what I was after. It costs a few minutes of compute, but it replaces a manual review habit I would often have skipped.

`owner-feedback`

When I comment on a blocked ticket, the agent resumes automatically. No manual restart needed. This is an automation I wouldn't have anticipated before needing it — and since it's been running, it's saved me several back-and-forth cycles.

What Surprised Us (Not Always in a Good Way)

The content-writer blocks too often

Block rate on content tickets: ~33%. One article in three generates a Q&A phase before writing begins. In some cases that's legitimate — the ticket was underspecified. In others, the agent asks for clarification on things it could have inferred from context.

This isn't a bug, it's a calibration issue. The content-writer SKILL may be too conservative on this point. Something to revisit.

The evaluator always plateaus at 0.5

The evaluator scores every completed ticket. Across the 13 evaluated tickets, deliveryQuality is 0.5 in 11 out of 13 cases. The two exceptions (0.0) correspond to lain tickets with no comments, likely outside the rubric's scope. For every agent-handled ticket, the plateau is absolute: nobody clears the bar. Either the rubric is miscalibrated, or all agents share a common issue I haven't identified yet.
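One way to make "no useful signal" concrete is to look at the spread of the scores. The sketch below uses only the counts reported above (11 tickets at 0.5, 2 lain tickets at 0.0); the variable names are illustrative, and the zero-spread test is my own choice of check, not something the evaluator does.

```python
from collections import Counter
from statistics import pstdev

# deliveryQuality values as reported: 11 agent-handled tickets at 0.5,
# plus 2 lain tickets at 0.0.
agent_scores = [0.5] * 11
lain_scores = [0.0] * 2

# How often each score appears across all 13 evaluated tickets.
distribution = Counter(agent_scores + lain_scores)

# Population standard deviation of the agent-handled scores.
# A spread of zero means the score cannot rank one ticket above another.
spread = pstdev(agent_scores)
has_signal = spread > 0

print(distribution)
print(spread, has_signal)
```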

The symptom is clear; the diagnosis is not. That's a next-week topic.

The groomer stopped after 2 tickets

The groomer is supposed to enrich tickets in Backlog before they get assigned. It ran on tickets #22 and #23, then nothing. Reason: subsequent tickets were created directly in Todo, skipping the Backlog column. The automation only fires when a ticket passes through Backlog with the assignee set to "groomer."

Result: 80% of tickets reach the dispatcher without enrichment. Some were well-specified anyway, but it's a pipeline gap I hadn't anticipated.

The noise from the lain sweep

The lain agent runs every hour to review the board (the ceo-board-review automation). Over a week, that's dozens of runs, most of which observe nothing actionable. The debug log is cluttered with lain sessions whose ticket reference is empty (`ticket #` with no number).

It's useful for catching stuck situations. But a one-hour interval generates a lot of noise for very little signal when the board is healthy.

What We'd Do Differently

Force the Backlog passage. Not to slow down the flow, but to give the groomer time to work. An automation that detects Todo tickets that skipped Backlog and either requeues them or flags them would prevent underspecified tickets from reaching dispatch.
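A detection pass for this could look like the sketch below. It assumes each ticket keeps a history of the columns it has visited; that field, the `Ticket` class, and the `skipped_backlog` function are hypothetical and not part of KittyClaw as described here.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical ticket model with a column history."""
    id: int
    column: str
    # Columns the ticket has visited, oldest first (assumed to be tracked).
    history: list[str] = field(default_factory=list)


def skipped_backlog(tickets: list[Ticket]) -> list[Ticket]:
    """Return Todo tickets that never passed through Backlog."""
    return [
        t for t in tickets
        if t.column == "Todo" and "Backlog" not in t.history
    ]
```

Whether the flagged tickets are requeued to Backlog or only marked for attention is a separate decision; the sketch covers detection only.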

Calibrate the evaluator. A flat 0.5 for every agent-handled ticket provides no useful signal. The rubric needs refinement, or the evaluator needs more context to discriminate between different levels of quality.

Increase the lain sweep interval. Moving to 4h or even 8h would cut the noise without losing the ability to detect stuck situations.
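The effect of the interval is simple arithmetic. The figures below are an upper bound that assumes the sweep fires around the clock for a full week; the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def sweeps_per_week(interval_hours: int) -> int:
    """Maximum number of sweeps in one week at the given interval."""
    return HOURS_PER_WEEK // interval_hours


for interval in (1, 4, 8):
    print(f"{interval}h interval: {sweeps_per_week(interval)} sweeps")
# 1h interval: 168 sweeps
# 4h interval: 42 sweeps
# 8h interval: 21 sweeps
```

Going from 1h to 4h removes three quarters of the runs, while a stuck ticket still gets noticed within the same working day.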

Update the content-writer SKILL to reduce the block rate on well-specified tickets. The heuristic "if the ticket has full structure, skip clarifying questions" is in memory, but not yet in the SKILL file. It should be.

One Week Is Short

What stands out most after seven days: the tool works, but the calibrations are far from stable. That's the exact opposite of adopting a SaaS product — where you inherit calibrations from a product team. Here, you discover them yourself, through real friction.

The advantage: every friction point is directly actionable. The groomer isn't running enough? Change a rule in automations.json. The content-writer blocks too often? Adjust the SKILL. No product forum tickets. No "under consideration." Just code you control.

That's why I built this tool instead of using an existing one. Not for the features — for the feedback loop.
