I read Martin Fowler and his team’s article on Structured Prompt-Driven Development as an architecture signal, not as a prompt-writing guide. What caught my attention is not the technique itself, but the underlying shift in responsibility: moving from individual prompt skill to system-level design.

My opinion is straightforward. If we keep treating prompting as a personal craft, we will keep getting local wins and organizational fragility. If we treat it as architecture, we can aim for repeatability, accountability, and controlled evolution. In my experience, teams rarely fail because a model is incapable; they fail because there is no shared architecture around how that model is used. This is exactly why Fowler and his co-authors’ framing matters: they are not proposing prompt cleverness, they are proposing delivery discipline.

[Figure: Structured Prompt-Driven Development architecture flow]

From prompt skill to delivery architecture

I have seen a recurring pattern in AI-assisted development initiatives. A small group of engineers gets very good results with ad-hoc prompting, the rest of the organization tries to imitate those results, and leadership assumes it has found a scalable operating model. It usually has not. One contribution I appreciate in Fowler’s article is that his team makes this distinction explicit: individual effectiveness is not the same as organizational reliability.

At that point, familiar architectural problems emerge. Different people use different implicit interfaces for the same task. Valuable knowledge ends up in private chats rather than reusable assets. Reviews focus on generated code while ignoring the generation conditions. Sensitive context is copied around with no clear policy boundary. To me, this looks very similar to a distributed monolith in integration architecture: productive in short bursts, expensive and risky over time.

This is why I prefer to think in terms of prompt assets rather than prompts. A reusable prompt module should be narrow in intent, explicit in assumptions, constrained by policy, and stable in output shape. I see no meaningful difference between this and how we design internal APIs or shared platform libraries. It is the same engineering discipline applied to a new interface. In that sense, I read Fowler and team’s structure as a software architecture concern disguised as AI guidance.
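To make the analogy concrete, a prompt asset can be treated exactly like an internal API: a typed contract with a narrow intent, explicit assumptions, a policy boundary on inputs, and a stable output shape. This is a minimal sketch of that idea; the `PromptAsset` class and its fields are my own illustrative names, not anything Fowler's team prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptAsset:
    """A governed, reusable prompt module: narrow intent, explicit
    assumptions, a policy boundary on context, and a stable output shape."""
    name: str
    intent: str                       # one narrow task, not a grab-bag
    assumptions: tuple[str, ...]      # made explicit, not left implicit
    allowed_context: frozenset[str]   # policy boundary on inputs
    output_fields: tuple[str, ...]    # stable, reviewable output shape
    template: str                     # the prompt body itself

    def render(self, **context: str) -> str:
        # Reject context outside the policy boundary, the same way an
        # internal API rejects an invalid request.
        illegal = set(context) - self.allowed_context
        if illegal:
            raise ValueError(f"context outside policy boundary: {illegal}")
        return self.template.format(**context)
```

The point of the sketch is not the class itself but the failure mode it removes: with an explicit `allowed_context`, "someone pasted something they shouldn't have" becomes a caught error rather than an incident report.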

When I imagine a concrete case, such as ADR generation, I do not want each engineer inventing structure from scratch. I want a governed module that always returns context, options, trade-offs, recommendation, and risks. That consistency does not reduce creativity; it reduces ambiguity and makes architectural reasoning comparable across teams.
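A fixed output shape is also trivially checkable. Assuming the five-section structure above, a deterministic validator for generated ADRs might look like this; the section names and function are illustrative, not a standard:

```python
# The governed ADR shape: every generated ADR must carry exactly
# these sections so architectural reasoning is comparable across teams.
REQUIRED_SECTIONS = ("context", "options", "trade_offs", "recommendation", "risks")


def validate_adr(adr: dict[str, str]) -> list[str]:
    """Return a list of conformance problems; an empty list means the
    ADR matches the governed shape."""
    problems = [
        f"missing section: {s}"
        for s in REQUIRED_SECTIONS
        if not adr.get(s, "").strip()
    ]
    problems += [
        f"unexpected section: {s}" for s in adr if s not in REQUIRED_SECTIONS
    ]
    return problems
```

Because the check is deterministic, it can sit in CI next to linters: a generated ADR that fails it never reaches a reviewer's attention in the first place.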

The boundary problem most teams underestimate

The part I consider most underestimated is context. Context injection is not a convenience feature; it is a data architecture concern with governance implications. Fowler and his collaborators repeatedly emphasize structure around inputs and expected outputs, and I think that is where many teams still underinvest.

I find it useful to frame context as an envelope with clear boundaries: what is required to do the work, what is conditionally useful, and what is restricted unless transformed or explicitly approved. Without that framing, teams usually end up with both compliance exposure and reproducibility problems. The output quality also becomes unstable because each run is shaped by arbitrary context choices.

A backlog decomposition assistant is a good example. I would allow domain glossary, service boundaries, and non-functional constraints, because those improve relevance. I would block production secrets, raw customer transcripts, and unapproved roadmap material, because those create risk with no proportional value. Architecture, in this scenario, is the mechanism that maximizes useful flow while controlling coupling and exposure.
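The envelope framing can be encoded directly as a policy table, so context selection stops being an arbitrary per-run choice. This is a sketch under my own assumptions; the policy keys mirror the backlog-assistant example above, and the safe default is that unknown context is restricted:

```python
from enum import Enum


class ContextClass(Enum):
    REQUIRED = "required"        # needed to do the work at all
    CONDITIONAL = "conditional"  # useful for some tasks, allowed
    RESTRICTED = "restricted"    # blocked unless transformed or approved


# Hypothetical policy for a backlog decomposition assistant.
POLICY = {
    "domain_glossary": ContextClass.REQUIRED,
    "service_boundaries": ContextClass.REQUIRED,
    "nfr_constraints": ContextClass.CONDITIONAL,
    "production_secrets": ContextClass.RESTRICTED,
    "customer_transcripts": ContextClass.RESTRICTED,
    "unapproved_roadmap": ContextClass.RESTRICTED,
}


def build_envelope(available: dict[str, str]) -> dict[str, str]:
    """Admit only governed context; unknown keys are restricted by default,
    which keeps the failure mode on the safe side."""
    envelope = {}
    for key, value in available.items():
        cls = POLICY.get(key, ContextClass.RESTRICTED)
        if cls is ContextClass.RESTRICTED:
            continue  # dropped here; in practice, routed to approval or redaction
        envelope[key] = value
    return envelope
```

The design choice worth noting is the default: an unlisted context source is treated as restricted, so the policy table has to be extended deliberately rather than bypassed accidentally.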

Reliability requires deterministic controls

The other point I feel strongly about is evaluation. I do not trust generated output without gates, not because models are inherently poor, but because architecture should always assume variance.

For me, structured prompt-driven workflows become credible only when deterministic controls are built into the pipeline: format checks, policy checks, architecture conformance checks, and mandatory human review for high-impact changes. Without these controls, teams can ship plausible regressions faster than before, which is the worst possible outcome of acceleration. This is another point where I align with Fowler’s broader engineering philosophy: fast feedback without quality control is not acceleration, it is deferred failure.

In legacy migration work, for instance, my minimum expectation is simple: the build must pass, the existing test suite must pass without reducing scope, forbidden dependencies must stay out, and architecture rules must still hold. If one of these conditions fails, the workflow should go back to refinement instead of moving toward merge. I consider this non-negotiable.
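Those four conditions translate naturally into a gate pipeline with a single routing decision at the end. This is a minimal sketch; the gate names follow the migration example, and the lambda checks stand in for real calls to the build, the test runner, and an architecture-rule tool:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    passed: bool


def run_gates(change: dict, gates: list[tuple[str, Callable]]) -> list[GateResult]:
    """Run every deterministic gate against a proposed change."""
    return [GateResult(name, bool(check(change))) for name, check in gates]


# Hypothetical gates for a legacy migration workflow; in practice each
# check shells out to real tooling instead of reading a dict.
GATES = [
    ("build_passes", lambda c: c["build_ok"]),
    ("tests_pass_full_scope", lambda c: c["tests_ok"] and not c["scope_reduced"]),
    ("no_forbidden_dependencies", lambda c: not c["forbidden_deps"]),
    ("architecture_rules_hold", lambda c: c["arch_ok"]),
]


def next_step(results: list[GateResult]) -> str:
    # One failed gate sends the change back to refinement, never to merge.
    return "merge_review" if all(r.passed for r in results) else "refine"
```

The routing function is deliberately binary: there is no "mostly passed" state, which is exactly what makes the gate non-negotiable.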

This is also why I prefer orchestration to one-shot prompting. Real delivery work has stages, dependencies, and feedback loops. A staged flow that frames constraints, proposes a plan, generates changes, validates them, and documents rationale is more diagnosable and easier to improve than a single opaque interaction. When quality drops, I can see where it dropped and fix that stage.
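The diagnosability argument is easiest to see in code: a staged flow leaves a trace, a one-shot prompt does not. This sketch uses stub stages of my own invention; each one reads and extends a shared state, so a quality drop can be attributed to a specific stage:

```python
# Hypothetical staged flow: each stage reads and extends a shared state
# dict, and the runner records which stage ran, giving an audit trail.
def frame_constraints(state):
    state["constraints"] = ["no new dependencies"]
    return state

def propose_plan(state):
    state["plan"] = f"plan under {state['constraints']}"
    return state

def generate_changes(state):
    state["diff"] = f"diff implementing {state['plan']}"
    return state

def validate(state):
    state["valid"] = "diff" in state  # stand-in for the real gate pipeline
    return state

def document_rationale(state):
    state["rationale"] = "why: " + state["plan"]
    return state


STAGES = [frame_constraints, propose_plan, generate_changes, validate, document_rationale]


def run(state=None):
    state = state or {}
    trace = []
    for stage in STAGES:
        state = stage(state)
        trace.append(stage.__name__)  # where exactly did quality drop?
    return state, trace
```

With this shape, improving the workflow means improving one named stage, not rewriting one opaque mega-prompt.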

From a governance perspective, I think a federated model works best: platform teams own shared modules and gates, product teams own domain context, and architecture or security functions own guardrails and exceptions. That balance avoids both central bottlenecks and unmanaged fragmentation.

Yes, this approach adds overhead. It requires asset lifecycle management, context governance, gate automation, and new skills in failure analysis. I accept that trade-off for the same reason I accept the cost of CI/CD and API governance: reliability is never free, and pretending otherwise is usually how technical debt is born.

I do not see Structured Prompt-Driven Development as another methodology to memorize. I see it as a correction toward architectural maturity. The meaningful shift, in my mind, is moving from “I can get good results with prompts” to “we can deliver reliable outcomes with a governed system.” Reading Fowler and his team’s article through that lens helped me clarify my own position: the long-term value is not in better prompts, but in better architecture around prompt-driven work. Once that shift happens, the priorities become obvious: design boundaries, define contracts, instrument workflows, and enforce quality. That is architecture work, and I believe it deserves the same seriousness as any other production-critical concern.