
Monday morning. I open KittyClaw and the board is already moving: three agents running, one ticket that just hit Done, another that just got blocked by the content-writer. I triggered nothing. That's what changed.
Here's one week in production. Not a demo. Not an architecture article — the previous ones cover that. What follows is what actually happened, numbers included.
The Numbers
Since KittyClaw started running the ekioo.com project, the board holds 20 tickets:
- 17 Done (85%)
- 2 Todo (waiting)
- 1 InProgress (this one)
Breakdown by assignee:
| Agent | Tickets | Share |
|---|---|---|
| content-writer | 9 | 45% |
| lain (direct) | 6 | 30% |
| programmer | 4 | 20% |
| owner | 1 | 5% |
65% of tickets (content-writer plus programmer) were handled by agents without direct intervention. I create the ticket and assign it, and within 30 seconds KittyClaw has an agent working on it.
But that number only tells part of the story. What the table doesn't show is that each ticket can generate up to 6 cascading agent runs. A typical blog article triggers content-writer, qa-tester, fact-checker, committer, and evaluator in sequence, sometimes with a groomer run upstream. Across 20 tickets, that adds up to well over 60 agent runs.
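To make the cascade concrete, here is a minimal sketch that models one blog-article ticket as an ordered chain of agent runs. The stage names and their order come from the paragraph above; the constant and function names are hypothetical, not KittyClaw's actual code.

```python
# Ordered agent runs for one blog-article ticket, as listed in the post.
# Constant and function names are hypothetical, not KittyClaw internals.
ARTICLE_CHAIN = ["content-writer", "qa-tester", "fact-checker", "committer", "evaluator"]


def article_runs(groomed: bool) -> list:
    """Return the agent runs one article ticket triggers, in order."""
    # When the groomer runs at all, it runs upstream of the rest of the chain.
    upstream = ["groomer"] if groomed else []
    return upstream + ARTICLE_CHAIN


print(article_runs(groomed=True))
print(len(article_runs(groomed=True)))   # 6: the "up to 6" ceiling
print(len(article_runs(groomed=False)))  # 5
```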
What Worked Well
The assignee-dispatch automation
This is the board's simplest automation, and by far its most valuable: a ticket moves to Todo with an assignee → 30 seconds later, the agent is running. No clicking, no terminal to open, no command to type. The friction between "I have an idea" and "an agent is working on it" dropped to zero.
Over the course of a week, that's the clearest habit shift: I note a task, assign it, and forget it. By evening, it's often Done.
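The dispatch rule described above can be sketched as a single polling pass. This assumes the 30-second delay comes from a polling interval (the post doesn't say how the delay arises), and the `Ticket` fields and `dispatch_ready` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

POLL_INTERVAL_S = 30  # assumption: the 30-second delay is a polling period


@dataclass
class Ticket:
    # Hypothetical ticket model, not KittyClaw's real schema.
    number: int
    status: str
    assignee: Optional[str] = None
    dispatched: bool = False


def dispatch_ready(tickets: List[Ticket]) -> List[Tuple[int, str]]:
    """One polling pass: start the assignee agent for each assigned Todo ticket."""
    started = []
    for t in tickets:
        if t.status == "Todo" and t.assignee and not t.dispatched:
            t.dispatched = True  # mark it so the next pass doesn't start a second run
            started.append((t.number, t.assignee))
    return started


board = [Ticket(1, "Todo", "content-writer"), Ticket(2, "Todo"), Ticket(3, "Backlog", "programmer")]
print(dispatch_ready(board))  # [(1, 'content-writer')]
print(dispatch_ready(board))  # []: ticket 1 is already dispatched
# A real loop would repeat the pass every POLL_INTERVAL_S seconds.
```

Both conditions must hold: an unassigned Todo ticket and an assigned ticket outside Todo are both left alone.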
The qa-on-review pipeline
Every ticket that reaches Review automatically triggers the qa-tester. For content tickets, the fact-checker follows immediately. In practice, this caught several issues before they hit production: broken links, missing images, pages that didn't render correctly.
The automatic "someone always checks before merging" behavior is exactly what I was after. It costs a few minutes of compute, but it replaces a manual review step I would often have skipped.
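A minimal sketch of the qa-on-review rule, assuming it reacts to status changes and returns the agents to queue in order. The function name and the `is_content` flag are hypothetical.

```python
from typing import List


def on_status_change(new_status: str, is_content: bool) -> List[str]:
    """Agents to queue, in order, when a ticket changes status (hypothetical rule)."""
    if new_status != "Review":
        return []
    agents = ["qa-tester"]  # every ticket reaching Review gets QA
    if is_content:
        agents.append("fact-checker")  # content tickets are fact-checked right after QA
    return agents


print(on_status_change("Review", is_content=True))   # ['qa-tester', 'fact-checker']
print(on_status_change("Review", is_content=False))  # ['qa-tester']
print(on_status_change("Done", is_content=True))     # []
```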
The owner-feedback automation
When I comment on a blocked ticket, the agent resumes automatically. No manual restart needed. This is an automation I wouldn't have anticipated before needing it — and since it's been running, it's saved me several back-and-forth cycles.
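The owner-feedback rule can be sketched as a comment handler. This assumes blocked tickets carry a "Blocked" status and that the human commenter is identified as "owner" (the table lists an owner assignee, but how KittyClaw identifies the commenter is an assumption here); the `Ticket` type and `on_comment` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Ticket:
    # Hypothetical ticket model; "Blocked" as a status value is an assumption.
    number: int
    status: str
    assignee: str
    comments: List[Tuple[str, str]] = field(default_factory=list)


def on_comment(ticket: Ticket, author: str, text: str) -> Optional[str]:
    """Record a comment; return the agent to resume if the owner answered a blocked ticket."""
    ticket.comments.append((author, text))
    if ticket.status == "Blocked" and author == "owner":
        ticket.status = "InProgress"  # unblock; the agent picks up the thread with the new comment
        return ticket.assignee
    return None


t = Ticket(7, "Blocked", "content-writer")
print(on_comment(t, "owner", "Aim it at developers."))  # content-writer
print(t.status)  # InProgress
```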
What Surprised Us (Not Always in a Good Way)
The content-writer blocks too often
Block rate on content tickets: ~33%. One article in three generates a Q&A phase before writing begins. In some cases that's legitimate — the ticket was underspecified. In others, the agent asks for clarification on things it could have inferred from context.
This isn't a bug; it's a calibration issue. The content-writer SKILL may be too conservative on this point. Something to revisit.
The evaluator always plateaus at 0.5
The evaluator scores completed tickets. Across the 13 tickets evaluated so far, deliveryQuality is 0.5 in 11 cases. The two exceptions (0.0) are lain tickets with no comments, which likely fall outside the rubric's scope. For every agent-handled ticket, the plateau is absolute: none scores above 0.5. Either the rubric is miscalibrated, or all agents share a common issue I haven't identified yet.
The symptom is clear; the diagnosis is not. That's a next-week topic.
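One cheap diagnostic is to check how concentrated the scores are before questioning any individual agent. The sketch below uses the 13 deliveryQuality values reported above; the function name and the 80% threshold are arbitrary choices for illustration, not part of KittyClaw.

```python
from collections import Counter
from typing import List, Optional, Tuple


def plateau(scores: List[float], threshold: float = 0.8) -> Optional[Tuple[float, float]]:
    """Return (modal score, its share) if one value covers >= threshold of scores, else None."""
    value, count = Counter(scores).most_common(1)[0]
    share = count / len(scores)
    return (value, share) if share >= threshold else None


scores = [0.5] * 11 + [0.0] * 2  # the 13 deliveryQuality values from this week
result = plateau(scores)
if result:
    value, share = result
    print(f"plateau at {value}: {share:.0%} of tickets")  # plateau at 0.5: 85% of tickets
```

A score that lands on the same value 85% of the time cannot rank agents or tickets, whichever of the two explanations turns out to be right.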
The groomer stopped after 2 tickets
The groomer is supposed to enrich tickets in Backlog before they get assigned. It ran on tickets #22 and #23, then nothing. Reason: subsequent tickets were created directly in Todo, skipping the Backlog column. The automation only fires when a ticket passes through Backlog with the assignee set to "groomer."
Result: 80% of tickets reach the dispatcher without enrichment. Some were well-specified anyway, but it's a pipeline gap I hadn't anticipated.
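The gap is easiest to see by writing the trigger down. Here is a hypothetical, simplified version of the groomer condition as stated above, evaluated against a ticket's status transitions; the transition format and function name are assumptions.

```python
from typing import List, Optional, Tuple

# (from_status, to_status); from_status is None for the creation event.
Transition = Tuple[Optional[str], str]


def groomer_triggered(transitions: List[Transition], assignee: Optional[str]) -> bool:
    """Simplified groomer rule: the ticket has entered Backlog and its assignee is 'groomer'."""
    return assignee == "groomer" and any(to == "Backlog" for _, to in transitions)


created_in_todo = [(None, "Todo")]
via_backlog = [(None, "Backlog"), ("Backlog", "Todo")]
print(groomer_triggered(created_in_todo, "groomer"))  # False: it never entered Backlog
print(groomer_triggered(via_backlog, "groomer"))      # True
```

A ticket created straight into Todo has exactly one transition, and it never touches Backlog, so no assignee value can make the rule fire.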
The noise from the lain sweep
The lain agent runs every hour to review the board; that's the ceo-board-review automation. Over a week, that's dozens of runs (up to 168 if it fires around the clock), most of which observe nothing actionable. The debug log is cluttered with lain sessions whose ticket reference is empty ("ticket #").
It's useful for catching stuck situations. But a one-hour interval generates a lot of noise for very little signal when the board is healthy.
What We'd Do Differently
Force every ticket through Backlog. Not to slow down the flow, but to give the groomer a chance to work. An automation that detects Todo tickets that skipped Backlog and either requeues or flags them would keep underspecified tickets from reaching dispatch.
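A sketch of that guard, assuming each ticket keeps a list of the statuses it has been in. The `Ticket` fields, the `enforce_backlog` function and the `skipped-backlog` flag are hypothetical. With the groomer rule as described, moving a ticket back to Backlog is not enough on its own: it also has to be assigned to the groomer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    # Hypothetical model: history lists every status the ticket has held, oldest first.
    number: int
    status: str
    history: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)


def enforce_backlog(tickets: List[Ticket], requeue: bool = True) -> List[int]:
    """Find Todo tickets that never passed through Backlog; requeue or flag them."""
    caught = []
    for t in tickets:
        if t.status == "Todo" and "Backlog" not in t.history:
            if requeue:
                # Send it back so grooming can happen first. With the groomer rule as
                # described, it would also need to be assigned to the groomer.
                t.status = "Backlog"
                t.history.append("Backlog")
            else:
                t.flags.append("skipped-backlog")
            caught.append(t.number)
    return caught


board = [
    Ticket(1, "Todo", ["Todo"]),             # created directly in Todo
    Ticket(2, "Todo", ["Backlog", "Todo"]),  # went through Backlog first
]
print(enforce_backlog(board))  # [1]
print(board[0].status)         # Backlog
```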
Calibrate the evaluator. A flat 0.5 for every agent provides no useful signal. The rubric needs refinement, or the evaluator needs more context to tell different levels of quality apart.
Increase the lain sweep interval. Moving to 4h or even 8h would cut the noise without losing the ability to detect stuck situations.
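The arithmetic behind that trade-off, as a quick sketch that assumes the sweep fires around the clock at a fixed interval (the function name is illustrative):

```python
HOURS_PER_WEEK = 24 * 7


def sweeps_per_week(interval_hours: int) -> int:
    """Board-review runs in one week, assuming the sweep fires around the clock."""
    return HOURS_PER_WEEK // interval_hours


for h in (1, 4, 8):
    print(f"every {h}h: {sweeps_per_week(h)} runs/week")
# every 1h: 168 runs/week
# every 4h: 42 runs/week
# every 8h: 21 runs/week
```

The cost is latency: a ticket that gets stuck right after a sweep can wait up to one full interval (4 or 8 hours) before the next sweep notices it.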
Update the content-writer SKILL to reduce the block rate on well-specified tickets. The heuristic "if the ticket has full structure, skip clarifying questions" is in memory, but not yet in the SKILL file. It should be.
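What "full structure" means isn't defined here, so the following is only one hypothetical way to encode the heuristic: treat a ticket as fully structured when its body contains every section from a fixed list, written as `Name:` at the start of a line. The section names and the function name are assumptions.

```python
import re

# Hypothetical definition of "full structure" for an article ticket.
REQUIRED_SECTIONS = ("Goal", "Audience", "Outline")


def should_skip_questions(description: str) -> bool:
    """True when every required section appears as 'Name:' at the start of a line."""
    present = set(re.findall(r"^(\w+):", description, flags=re.MULTILINE))
    return all(section in present for section in REQUIRED_SECTIONS)


full = "Goal: explain the dispatch rule\nAudience: developers\nOutline:\n- intro\n- rule\n- limits"
print(should_skip_questions(full))                          # True
print(should_skip_questions("Write a post about agents."))  # False
```

Tickets that fail the check still go through the Q&A phase, so the heuristic only removes questions on tickets that already answer them.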
One Week Is Short
What stands out most after seven days: the tool works, but the calibrations are far from stable. That's the exact opposite of adopting a SaaS product — where you inherit calibrations from a product team. Here, you discover them yourself, through real friction.
The advantage: every friction point is directly actionable. The groomer isn't running enough? Change a rule in automations.json. The content-writer blocks too often? Adjust the SKILL. No product forum tickets. No "under consideration." Just code you control.
That's why I built this tool instead of using an existing one. Not for the features — for the feedback loop.