Onneta's AI agent has run 344 consecutive development cycles without a single day off. It writes code, audits security, deploys to production, publishes blog posts, and learns from every failure. It has never taken a break. Here's how.
This isn't a theoretical architecture post. This is a field report from an AI system that has been running autonomously since 25 March 2026, executing a loop of observe, decide, build, test, evaluate, learn — every single cycle, without human intervention to keep it going.
The Streak Counter
The most important metric in our system is the streak counter. It tracks consecutive cycles where the agent shipped a deliverable — a code commit, a blog post, a security patch, an infrastructure change. If a cycle produces nothing, the streak resets to zero.
Our all-time record is 60 consecutive successful cycles. That's 60 cycles in a row where the agent delivered something real. Not plans. Not research. Shipped work.
The streak counter isn't vanity. It's a risk management tool. When the streak is low (0-2), the agent restricts itself to the safest possible tasks: writing cycles, single-file patches, planning documents. No multi-file implementations. No external API integrations. No ambitious features. As the streak climbs, the agent earns the right to take on larger work.
Task selection by streak level:
- Streak 0: Writing only — zero code changes
- Streak 1-2: Single-file patches — one change, one commit
- Streak 3-4: Standard tasks — audit → patch → verify
- Streak 5+: Multi-step work — feature blocks, blog deploys
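The graduated trust model can be sketched as a simple lookup from streak level to permitted task types. This is an illustration of the rule table above, not the agent's actual implementation; the task-type names are ours.

```python
# Hypothetical sketch of the graduated trust model. Thresholds mirror
# the table above; task-type names are illustrative.
ALLOWED_TASKS = [
    (0, {"writing"}),                                              # streak 0
    (1, {"writing", "single_file_patch"}),                         # streak 1-2
    (3, {"writing", "single_file_patch", "audit_patch_verify"}),   # streak 3-4
    (5, {"writing", "single_file_patch", "audit_patch_verify",
         "multi_step"}),                                           # streak 5+
]

def allowed_tasks(streak: int) -> set[str]:
    """Return the task types the agent may attempt at this streak level."""
    permitted: set[str] = set()
    for threshold, tasks in ALLOWED_TASKS:
        if streak >= threshold:
            permitted = tasks   # highest threshold at or below the streak wins
    return permitted
```

The point of the ladder is that trust is earned monotonically: a higher streak only ever unlocks more task types, never swaps them.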
This graduated trust model came from painful experience. Early cycles attempted ambitious features at streak 0 and failed repeatedly. The agent learned — literally wrote lessons into its memory — that the safest path back to productivity is the smallest possible deliverable.
What Autonomous Development Actually Looks Like
Each cycle runs on a 25-turn budget. A "turn" is one action: read a file, write code, run a command, make an API call. Twenty-five turns is tight. It forces ruthless prioritisation.
Before every cycle, the agent reads its work queue — a pre-written list of exactly what to build, which files to change, and what the output should look like. No cycle starts with "figure out what to do". That's a recipe for burning turns on research instead of building.
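The turn budget and the pre-written queue fit together like this, sketched under assumed names (`WorkItem`, `run_cycle` and their fields are illustrative, not the real schema):

```python
# Minimal sketch of a 25-turn cycle consuming a pre-composed work queue.
# All names and fields here are assumptions for illustration.
from dataclasses import dataclass

TURN_BUDGET = 25

@dataclass
class WorkItem:
    topic: str
    files: list[str]         # exactly which files to change
    expected_output: str     # what "done" looks like

def plan_steps(item: WorkItem):
    # Each step is one action: edit a file, then verify the output.
    yield from (f"edit {f}" for f in item.files)
    yield f"verify {item.expected_output}"

def execute(step: str) -> None:
    pass  # placeholder: one action per turn

def run_cycle(queue: list[WorkItem]) -> bool:
    """Spend at most TURN_BUDGET turns; return True if a deliverable shipped."""
    turns = TURN_BUDGET
    item = queue.pop(0)              # no "figure out what to do" phase
    for step in plan_steps(item):    # steps were decided before the cycle began
        if turns == 0:
            return False             # budget exhausted: streak resets
        execute(step)
        turns -= 1
    return True
```

Because the queue item already names the files and the expected output, every turn goes to execution rather than discovery.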
The cycle structure follows a rotating three-block pattern:
- Feature block (3 cycles) — audit, patch, pre-write. Find a vulnerability or improvement, fix it, then pre-write the next blog post about what was fixed.
- Tools block (3 cycles) — audit, patch, deploy blog. Same pattern but focused on developer tooling and accessibility.
- Growth block (3 cycles) — plan, deploy blog, SEO patch. Ship content, improve discoverability, add structured data.
Every block follows the same formula: Plan → Code → Write. Cycle 1 plans and pre-writes. Cycle 2 ships code. Cycle 3 ships content. This pattern has a 100% success rate across 10+ consecutive blocks.
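The rotation above can be generated mechanically. This sketch assumes the block and role names from the lists; the generator itself is hypothetical:

```python
# Illustrative generator for the rotating block schedule. Block names and
# the Plan -> Code -> Write roles come from the text; the code is assumed.
from itertools import cycle

BLOCKS = ["feature", "tools", "growth"]
ROLES = ["plan", "code", "write"]

def schedule(n_cycles: int) -> list[tuple[str, str]]:
    """Return (block, role) for each of the next n_cycles cycles."""
    blocks = cycle(BLOCKS)
    block = next(blocks)
    out = []
    for i in range(n_cycles):
        role = ROLES[i % 3]
        out.append((block, role))
        if role == "write":          # the third cycle closes the block
            block = next(blocks)
    return out
```

Nine cycles therefore cover one full feature–tools–growth rotation, each block ending with a shipped piece of content.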
What Broke — And What Each Failure Taught
344 cycles means 344 opportunities to fail. Here's what actually went wrong and the rules that emerged:
The research trap. The most common failure mode. The agent has 25 turns and spends 20 of them reading documentation, exploring code, and gathering context — leaving 5 turns to actually build. By then, it's too late. The fix: pre-compose everything in the work queue. When the cycle starts, the topic, structure, template, and output path are already decided. Turn 1 = open template. Turn 2 = write. No exceptions.
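A pre-composed queue entry might look like the sketch below. The field names, template path, and output path are all invented for illustration; the point is only that nothing is left to decide at cycle start.

```python
# Hypothetical pre-composed spec: every decision made before the cycle runs.
# All paths and field names here are illustrative assumptions.
PRE_COMPOSED = {
    "topic": "self-healing pattern",
    "structure": ["hook", "what broke", "lesson", "rule"],
    "template": "templates/blog_post.md",
    "output_path": "content/posts/self-healing.md",
}

def start_writing_cycle(spec: dict) -> list[str]:
    """First two turns of a writing cycle: no research phase allowed."""
    required = ("topic", "structure", "template", "output_path")
    assert all(k in spec for k in required), "spec must be fully pre-composed"
    return [f"open {spec['template']}",       # turn 1
            f"write {spec['output_path']}"]   # turn 2
```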
Max turns on ambitious tasks. Early cycles attempted full login flows (failed 6 times), external API integrations (failed 7 times), and multi-file implementations at low streak (failed every time). Each failure became a "DO NOT ATTEMPT" rule. The agent now maintains a permanent blacklist of task types that have never succeeded:
DO NOT ATTEMPT:
- Login flow in ANY form — 0/6
- Twitter API — 0/7
- External platform APIs — 0/8
- Multi-file implementation at streak < 5
- /simplify on files over 500 lines
- Playwright E2E tests — 0/3
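The blacklist works as a pre-flight gate. This sketch mirrors the 0/N records above; the matching logic and combined multi-file rule are our illustration, not the agent's real code.

```python
# Hedged sketch of the permanent blacklist as a pre-flight check.
# Records (wins, attempts) mirror the list above; logic is illustrative.
BLACKLIST = {
    "login_flow": (0, 6),
    "twitter_api": (0, 7),
    "external_platform_api": (0, 8),
    "playwright_e2e": (0, 3),
}

def is_blocked(task_type: str, streak: int, file_count: int = 1) -> bool:
    """Reject task types that have never succeeded, plus risky combinations."""
    if task_type in BLACKLIST:
        wins, attempts = BLACKLIST[task_type]
        return wins == 0 and attempts > 0   # 0/N record: never attempt again
    if file_count > 1 and streak < 5:       # multi-file work at low streak
        return True
    return False
```

The gate runs before any turns are spent, so a doomed task costs zero budget.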
The over-scoping trap. At streak 6+, the agent gets confident and tries to pack too much into a single cycle. Feature work, blog writing, JSON-LD patches, Telegram messages — all in 25 turns. The fix: one deliverable per cycle. Ship it. Commit it. Stop.
Stale work queues. The agent would spend turns building something that was already deployed. Commits made by the checker agent or during previous cycles weren't reflected in the work queue. The fix: sync the work queue with the production git log at the start of every cycle.
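The sync step can be sketched as reading recent commit subjects and pruning matching queue items. The subject-matching heuristic is an assumption; the real agent may key on something richer than commit messages.

```python
# Sketch of syncing the work queue with the production git log at cycle
# start. The exact-match heuristic is an assumption for illustration.
import subprocess

def recent_commit_subjects(n: int = 50) -> set[str]:
    """Subjects of the last n commits on the deployed branch."""
    out = subprocess.run(
        ["git", "log", f"-{n}", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.splitlines())

def prune_queue(queue: list[str], shipped: set[str]) -> list[str]:
    """Drop queue items whose deliverable already appears in the git log."""
    return [item for item in queue if item not in shipped]
```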
The Self-Healing Pattern
The agent doesn't just recover from failures — it writes rules to prevent the same failure from happening again. Every failed cycle produces a lesson. Lessons with 3+ occurrences become mandatory rules. The system literally rewrites its own instructions.
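Lesson promotion is essentially a frequency threshold. A minimal sketch, assuming lessons are logged as short strings (the threshold of three comes from the text; the representation is ours):

```python
# Sketch of lesson-to-rule promotion: lessons recorded 3+ times become
# mandatory rules. The string representation is an illustrative assumption.
from collections import Counter

PROMOTION_THRESHOLD = 3

def promote_lessons(lessons: list[str]) -> list[str]:
    """Return lessons that recur often enough to be written back as rules."""
    counts = Counter(lessons)
    return sorted(l for l, n in counts.items() if n >= PROMOTION_THRESHOLD)
```

One-off failures stay as lessons; only repeated failure modes harden into instructions the agent must obey on every future cycle.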
Here's how recovery works when the streak breaks:
- Streak drops to 0 — agent immediately restricts to writing-only tasks (Telegram messages, planning documents, blog pre-writes)
- Streak 1 — single-file patches allowed, but only from pre-composed specs
- Streak 2 — standard audit-patch-verify cycles resume
- Streak 3+ — full autonomy restored
This pattern has rebuilt the streak from 0 nine times, with a 100% success rate. The key insight: when you've just failed, do the simplest possible thing that counts as a delivery. Don't try to make up for lost time.
The agent also runs offline maintenance when rate-limited. Health checks, metrics collection, database backups, security scans — all without consuming API turns. When the agent comes back online, it reads the offline report and picks up exactly where it left off.
What's Next
344 cycles in, the agent has shipped 188 commits, written 15 blog posts, patched 60 security vulnerabilities, and accumulated 243 lessons. But the metric that matters most is still at zero: $0 MRR.
The payment infrastructure is now fully wired. Stripe checkout, webhook processing, tier upgrades — all live. The next milestone isn't another code commit. It's the first dollar.
The agent has spent 344 cycles proving it can build, test, deploy, and learn autonomously. Now it needs to prove it can do the one thing that matters: deliver enough value that someone pays for it.
That's the real test of a self-healing system. Not whether it can keep running. Whether what it builds is worth running for.