HomeInsightsIt Will Blow Up. How to Keep AI Delivery From Exploding.

What happens when it will blow up meets AI?

Why AI delivery "blows up" - and it's not about AI The stereotype goes: "code written with AI is garbage." I get where it comes from. I've seen projects where AI generated code…

It Will Blow Up. How to Keep AI Delivery From Exploding.

Before you start coding - the list I use

Sounds like bureaucracy. Takes an hour. Saves weeks.

Step 1: Process model for MVP (15-20 min)

Before writing a single prompt - describe what this system is supposed to do. Not at the code level. At the business process level. What happens step by step? Who are the actors? What are the edge cases?

That description becomes the context for every subsequent AI task. The model knows what world it's operating in.

Step 2: Architecture and data models (15 min)

What components? What technologies? How do they communicate? Where are the module boundaries?

You don't need UML diagrams. A simple list of decisions works: "we use X, not Y, because Z." That document is a map for the model - and for you, when you come back to the project in three weeks.

Step 3: Quality strategy (10 min)

Security - what's sensitive? Tests - what coverage level do you expect? Documentation - what must be described? Technical debt - where do you accept trade-offs, and where not? Migrations - how will you update dependencies?

These aren't questions for the production stage. These are questions for "before we write the first line of code."

Step 4: Way of Working (5 min)

How does this project operate? What does a daily look like? How do we define sprint closure? How do we do code review? Yes - even if the only developers are you and Anthropic Claude.

Definition of Ready for an AI task

Before handing a task to the model, check whether you have answers to three questions:

"Why?" - what's the business purpose of this task? Not "implement an endpoint" but "this endpoint will let user X do Y, which solves problem Z."

"What result?" - what does a good outcome look like? What has to be true when it's done? What does the happy path look like?

"How do we test it?" - what will we check to know it works? Unit tests? Integration tests? Manual verification?

These questions matter more with AI than with a human developer. Why? Because a human developer asks about ambiguities. AI - if you don't ask these questions - fills the gaps with its own assumptions. And it does so with complete confidence.

Definition of Done for an AI task

What does it mean for a task to be finished?

Here's what it looks like for me:

  • Tests written and passing
  • Test coverage within the agreed scope
  • Security scan - no known vulnerabilities
  • Code review... and here's my favorite element.

Code review between models.

Sounds absurd. And yet it works. When Claude writes code, Claude can also review that code - but as a separate session, without the context of the previous one. In other words: I give the code to a second model and ask it to evaluate the security, readability, and alignment with the architecture.

What does it look like in practice? New session, zero history from the previous conversation. The prompt looks roughly like this: "Review the code below for: 1) security - OWASP Top 10, 2) readability for a human, 3) alignment with this architecture: [description]. Identify issues and suggest fixes." This is an experimental practice - there's no published research confirming its effectiveness in this specific use case.

Will the models disagree? Sometimes. But they quickly reach consensus. And they catch things I wouldn't have seen on my own.

Documentation - what needs to be described so that in three weeks someone (me) understands what was done and why.

Work rhythm - why it's critical in AI delivery

With a human team you have a retrospective once per sprint. With AI delivery the rhythm needs to be denser.

Daily refactor (30-60 min): at the start or end of each working day - you review code generated that day. You don't rewrite it. You ask: is this understandable? Is debt accumulating? Does anything need explaining?

Weekly refactor (2-4h): once a week - a deeper review. Analysis of dependencies, technical debt, architectural coherence with the growing codebase.

Without that rhythm the project looks like... a mess after a month. Every task was local. Nobody was looking at the whole.

One more challenge with longer projects: managing model context. The context window has its limits - the model "forgets" project history between sessions. That's exactly why the process model and architecture from Steps 1 and 2 matter so much. They are your project memory, which you can hand to the model as context at the start of each session.

Honest caveats

This model worked for me. Four times. That doesn't mean "it always works."

What did those projects have in common? Small to medium scale - one developer or a small group. Duration from a few weeks to a few months. Web and scripting technologies, no regulatory security requirements. That context matters before you decide whether this model fits your situation.

What limits it? Larger projects with multi-person teams - there you need more than an hour of preparation and daily refactoring. Legacy systems with years of technical debt - there AI will generate code that collides with things you haven't described in the process model, because you don't fully know them yourself. Critical systems with regulatory requirements - there the quality strategy needs to go much deeper.

"It worked for me four times" is not "the one right path." It's a starting point for your own reflection.

The price of success - a paradox I didn't see coming

I have to say something personal here.

I'm not a developer. Every now and then I'd sit down to build "something" - Arduino for home automation, a data export script, a tool that does exactly one small thing exactly the way I want it. Always rough, raw, no architecture. With comments like "this works, don't know why, don't touch." With functions named "helper2_final_definitive_v3."

Since I've had Codex and Claude Code - that changed. I describe what I want, the model writes. The project comes out good. Really good. Better than anything I wrote myself.

And that's exactly the problem.

I stopped writing. I stopped learning. I lost my hobby.

Writing code was mental exercise for me. Problem - hypothesis - error - solution. A loop that taught. AI shortened that loop to zero. I don't have to understand the problem. I just have to describe it.

This is a question that goes beyond AI delivery. When a tool is so good that it automates not just the task but also the learning process that went with doing that task badly - what do you lose?

I don't have a definitive answer. I'm a coach, so naturally I'll end with a question.

Do you have something AI can't take from you - because the value is precisely in doing it yourself, even messily? And have you checked lately whether that thing is still yours?

And if you're starting an AI project tomorrow - where are you starting? Architecture, DoD... or do you just write a prompt and see what comes out?