Every month, sometimes twice a month, a CTO or VP of engineering calls me and the conversation goes the same way. The product is working. The customers are paying. The code is a mess. The team is unhappy. The leadership has concluded, reluctantly but firmly, that a rewrite is the only path forward. They want me to validate the plan, help them scope it, and maybe run the first phase.
By the end of the call I have almost always talked them out of it. Not because rewrites are always wrong (they sometimes succeed spectacularly), but because the problems this particular team wants to fix are almost never the problems a rewrite would actually solve. The rewrite is a symptom, not a solution. This is old advice, not a new opinion: back in 2000, Joel Spolsky called a from-scratch rewrite the single worst strategic mistake a software company can make, and the arithmetic has not changed much. What teams actually need is something cheaper, faster, and less risky, which they have mostly not considered because it is less exciting.
This essay is the set of four questions I ask on those calls. Not a treatise against rewrites. A diagnostic to figure out whether the rewrite is the actual answer or a prestigious distraction from the answer.
Why this conversation keeps happening
Before the questions, the pattern. Rewrites are proposed on calls that sound like this:
“The codebase is ten years old. The original team is gone. Every new feature takes three times longer than it should. Our senior engineers refuse to work on the legacy module. We tried to refactor last year and it did not take. We think we need to bite the bullet and rebuild on [new framework / new language / new architecture].”
Every sentence in that paragraph is true, in the sense that each one describes a real problem the CTO is seeing. The paragraph as a whole is also a trap, because it compresses a half-dozen different problems into a single conclusion (“rewrite”) without examining whether any of the individual problems require a rewrite to fix.
The CTO usually does not want a rewrite. They want the symptoms to go away. The rewrite is the narrative the team has converged on, because it is concrete and actionable and emotionally satisfying in a way that “continue the painful work of incremental improvement” is not. When I ask what specifically is broken, the symptoms are usually things that a strangler-fig migration, a targeted refactor, a hiring change, or a process change would fix at a tenth of the cost.
So I ask the four questions. And I almost always find that the symptoms are not, in fact, pointing at a rewrite as the right intervention.
Question 1: What specifically is broken, in terms the board would understand?
I am not looking for engineering vocabulary. I am looking for the sentence the CTO would say to the CEO or the board if asked: “what exactly is the problem, and what will fixing it enable us to do that we cannot do now?”
The sentences that come back fall into rough categories. Each category has a different right answer, and the right answer is almost never a full rewrite.
“Feature velocity is too slow.” This is the most common answer. What it usually means: the parts of the codebase the team is currently working in are tangled, but the parts they are not touching are fine. The fix is to identify the specific modules blocking velocity and extract those. The rest of the system keeps doing its job. This is what the Strangler Fig pattern is for. A rewrite of the whole system spends 80% of its budget on code that was not the problem.
“We cannot hire engineers who want to work on this stack.” This is a hiring problem wearing an architecture costume. Sometimes the stack genuinely is an obstacle (PHP 5 in 2026; a bespoke in-house framework). Sometimes the stack is fine and the real issue is the onboarding docs, the tooling, the deployment story, or the codebase’s lack of tests. If your Symfony 5 codebase sits at PHPStan level 0 and has no tests, the hiring problem is not the framework version. Upgrading (see Symfony Major Version Upgrades) and investing in the quality bar fixes more of this than a rewrite will.
“The architecture cannot scale.” Often said; rarely true. “Cannot scale” is almost always “the current specific bottleneck is hard to address.” The specific bottleneck might be a single hot table, a synchronous service call that should be async, a missing cache layer, an N+1 query hidden inside a listener. All of these are fixable without a rewrite. Rewriting because of a single bottleneck is like demolishing a house because one window leaks.
“We are on the wrong database.” Sometimes true, usually exaggerated. If you are on MongoDB and your access patterns are relational, a migration to PostgreSQL is expensive but finite; the rest of the application is unaffected. Rewriting the application to move the database is a category error.
“The original team is gone.” This is a real concern, but rewriting does not fix it. The new team will also eventually leave, and they will leave behind code that is newer but equally opaque to the team that inherits it. What fixes this is documentation, tests, and reducing the implicit knowledge any one person holds. A rewrite led by people who will leave in 18 months produces the same problem in 18 months plus one rewrite.
If the CTO cannot produce a single clear, business-readable sentence, the rewrite has not been scoped; it has been vibed. The rewrite proposal is not yet something that could be funded or evaluated, and no amount of engineering work will rescue it until someone writes the sentence.
Question 2: What have you actually tried, and what did you learn?
A team proposing a rewrite has usually tried two or three interventions that failed. The details of those failures are enormously informative.
“We tried to refactor module X and it did not take.” The interesting question is why. Was the scope too large? Did the refactor get abandoned mid-stream because a customer incident pulled the team away? Did the refactor ship but get silently reverted by later changes because nobody was enforcing the new patterns? Each of these has a different remedy, and none of them is a rewrite. A rewrite will encounter the same forces that killed the refactor, at ten times the scale.
“We tried to add tests and it was too painful.” Usually means the code is tightly coupled and untestable in its current shape. The fix is to introduce seams, then tests. A rewrite from scratch is not magically testable; a rewrite done by the same team that wrote the untestable code typically produces another untestable codebase. I have watched this happen.
“We tried to hire senior engineers and nobody stayed.” Usually means the onboarding experience is bad, or the team culture has problems, or the stack truly is an outlier. A rewrite does not improve any of these unless you also fix them explicitly, in which case you could have just fixed them explicitly.
“We have not tried anything serious, but we are sure a rewrite is the answer.” This is the most dangerous version. If nobody has attempted a serious, scoped, funded refactor in the current codebase, you cannot know yet whether refactoring is insufficient. The first thing I do in these cases is recommend a ninety-day targeted refactor of the single most painful module. If it goes well, you have evidence that refactoring works and a pattern to scale. If it goes poorly, you have evidence for the rewrite case, at a fraction of the cost of learning through the rewrite itself.
This question is the most important of the four. Rewrites that succeed almost always come after multiple honest attempts at smaller interventions. Rewrites that fail almost always come from teams that have not seriously tried anything else.
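The "introduce seams, then tests" move mentioned above can be shown in a few lines. This is a minimal sketch, not taken from any real codebase: the `late_fee` function, its one-percent-per-day rule, and the dates are all hypothetical. The point is only that a seam is a place where behavior can be swapped without editing the code around it.

```python
from datetime import date
from typing import Callable


# Before: the date lookup is hard-wired, so the result depends on the
# day the test happens to run. (Hypothetical rule: 1% of the amount per day late.)
def late_fee_untestable(due: date, amount_cents: int) -> int:
    days_late = (date.today() - due).days
    return amount_cents // 100 * max(days_late, 0)


# After: `today` is the seam. Production callers change nothing, while a
# test can substitute a fixed date without touching the function body.
def late_fee(
    due: date,
    amount_cents: int,
    today: Callable[[], date] = date.today,
) -> int:
    days_late = (today() - due).days
    return amount_cents // 100 * max(days_late, 0)


def test_late_fee() -> None:
    fixed_today = lambda: date(2026, 1, 11)
    # Ten days late on 20,000 cents: 200 cents per day.
    assert late_fee(date(2026, 1, 1), 20_000, today=fixed_today) == 2_000
    # Not yet due: no fee.
    assert late_fee(date(2026, 1, 20), 20_000, today=fixed_today) == 0


test_late_fee()
```

Nothing here required a new stack. The same team, in the same codebase, made one function testable by adding a single parameter with a default.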
Question 3: What stays the same, and what changes?
If the rewrite happens, which parts of the business are actually different at the end? This question is usually uncomfortable, because the answer is frequently “not much.”
The scenarios I see:
The product is the same, the architecture is new. The rewrite reimplements the existing feature set on a new stack. End state: same product, same customers, same revenue, new code. The business did not change. Eighteen months of engineering time bought you cleaner internals. Sometimes that is worth it, usually because of a specific path the old code blocked. Often it is not. Ask: what can we do with the new system that we could not do with the old one, in concrete customer-facing terms? If the answer is fuzzy, the economics are probably wrong.
The product is the same, the database model is new. A rewrite motivated by data model cleanup. End state: the same features work, but the underlying tables are sensible now. Valuable internally; invisible externally. The question is whether the data model cleanup justifies a full rewrite when it could have been done as a standalone migration (see Zero-Downtime Doctrine Migrations). Usually no.
The product changes. The team is also rethinking what the product does, not just how it is built. This is the case where rewrite is most defensible, because the existing code was shaped by assumptions that no longer apply. Even here, the better path is often to freeze the existing product (maintenance only, minimal investment) and build the new product as a genuinely new system, not a rewrite of the old one. Rewriting a thing you are also going to change substantially is two risks in one project.
The team changes. Sometimes the “rewrite” is half about the code and half about clearing out implicit hierarchies. The engineer who wrote the original system and has veto power over it leaves during the rewrite planning. The rewrite is, partly, a way to reset team dynamics. I have some sympathy for this; rewrites can function as organizational resets. But they are extremely expensive organizational resets, and there are cheaper ones (new tech lead, team restructuring, explicit modernization mandate).
The healthiest rewrite conversations answer “the product changes in specific ways that the current system cannot accommodate” with a list of those specific ways. Everything else is rewrite as aesthetic preference, and aesthetic preference rarely justifies the cost.
Question 4: What is the actual cost, honestly priced?
This is where the math gets uncomfortable. I ask the team to price three things:
Elapsed time to feature parity. Not “first deploy” or “MVP of the new system.” Actual feature parity with the current production system. Most teams underestimate this by 2-3x, because the current system contains years of undocumented edge cases, hotfixes, and behaviors-that-are-actually-features-customers-rely-on. The elapsed time to match all of those is longer than the original build, not shorter.
Opportunity cost. Whatever your team is doing during the rewrite, they are not doing something else. If the team would otherwise ship six new features a year, the rewrite costs six features for each year it runs. At eighteen months, that is nine features you did not ship. What does your competitive landscape look like after nine skipped features?
Risk cost. Rewrites have a nontrivial probability of never finishing. Industry data on large software projects is unflattering: the long-running Standish CHAOS research consistently finds that a significant minority of projects are cancelled outright, and a majority ship late, over budget, or with reduced scope. A full rewrite is a big project by definition, and it inherits those odds. Even successful rewrites often ship with reduced functionality relative to the original, with “we’ll add X back later” features that never actually get added. Price this as: “what is the expected value of the rewrite, weighted by probability of success?” The number is usually smaller than leadership is imagining.
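The three prices above can be put into one small calculation. The sketch below is illustrative only: the class name `RewriteEstimate`, its fields, and the value units are assumptions made for this example. The sample inputs (a six-month estimate, a 3x slip, six features a year, a 30% chance of failure) are borrowed from figures used elsewhere in this essay, not from measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteEstimate:
    estimated_months: float      # the team's own estimate to feature parity
    underestimate_factor: float  # assumed slip; the essay suggests 2-3x
    features_per_year: float     # what the team would otherwise ship
    success_probability: float   # an assumption, not a measurement
    value_if_success: float      # in whatever unit leadership uses
    value_if_failure: float      # often zero or negative

    def months_to_parity(self) -> float:
        # Elapsed time: the stated estimate scaled by the assumed slip.
        return self.estimated_months * self.underestimate_factor

    def features_forgone(self) -> float:
        # Opportunity cost: features not shipped while the rewrite runs.
        return self.features_per_year * self.months_to_parity() / 12

    def expected_value(self) -> float:
        # Risk cost: outcome values weighted by their probabilities.
        p = self.success_probability
        return p * self.value_if_success + (1 - p) * self.value_if_failure


estimate = RewriteEstimate(
    estimated_months=6,
    underestimate_factor=3,
    features_per_year=6,
    success_probability=0.7,
    value_if_success=100,
    value_if_failure=0,
)
print(estimate.months_to_parity())   # 18 months, not 6
print(estimate.features_forgone())   # 9 features not shipped
print(estimate.expected_value())     # about 70 of the hoped-for 100
```

The exact inputs matter less than the habit of writing them down. Once the slip factor and the success probability are explicit, leadership can argue about the assumptions instead of the conclusion.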
When I push on these costs, the conversation changes. A CTO who was thinking of the rewrite as “six months, some grumbling, cleaner code” usually leaves the call thinking of it as “eighteen months, delayed roadmap, 30% chance of failure, and we still have to migrate data and customers off the old system either way.” Same project, different framing. The framing is what enables a clear decision.
When the answer genuinely is rewrite
I said I almost always talk them out of it. Sometimes I do not, and it is worth naming those cases.
The language is dead and unhireable. If you are on PHP 5.4 in 2026, not because of neglect but because the codebase uses libraries that were never ported forward, and the community around the language has moved on, you are in a corner that refactoring cannot reach. Migrating to a currently-supported language stack is a rewrite by any other name. Do it, but phase it (strangler fig across versions, not big bang).
The data model is fundamentally wrong for the product the business has become. If the business was built around “users” and is now built around “accounts that contain teams that contain users,” and the existing tables do not support that at all, sometimes the unwinding is so expansive that a parallel build is cheaper than an in-place remodel. Rare, but real.
A critical regulatory requirement cannot be met by the existing architecture. If you are taking on healthcare or financial regulation and the current system mixes PII with operational data in ways that cannot be cleanly separated, you might need a rewrite with the separation baked in. This is rare enough that I would want to see a written regulatory opinion before accepting it as the reason.
The original stack is a known dead-end bet. Teams that built on frameworks that subsequently lost their community (or on proprietary systems whose vendor went bankrupt) are in a different situation than teams maintaining a working Symfony or Rails codebase. Walking away from a dead ecosystem is sometimes the right call.
Notice: none of these reasons is “the code is ugly” or “feature velocity is slow.” Those problems have cheaper fixes. The reasons above describe categorical mismatches between the existing system and the business’s future, not local problems of code quality.
What I recommend instead, almost every time
When the four questions push the conversation away from rewrite, the recommendation I land on is usually some version of:
- Stabilize. Invest one to two months in quality hygiene. PHPStan level up, deprecation cleanup, test coverage on the hot paths, a deployment pipeline that actually ships safely. The team’s pain drops measurably from this alone.
- Map. Produce a technical debt map (see Technical Debt Is a Map, Not a Backlog). Identify which regions of the system are the ones blocking velocity, which can be left alone, and which are actually fine once you look at them honestly.
- Extract one thing. Pick the single module causing the most pain. Extract it using the Strangler Fig pattern, along the lines described in Patterns of Legacy Displacement by Cartwright, Horn, and Lewis. Ship it. Delete the old code. Measure the impact.
- Reevaluate. Once the extraction has shipped, the team has evidence. Either the pain is measurably reduced and the program continues, or it is not and the rewrite case is stronger than it was.
This plan takes three to six months, costs a fraction of a rewrite, and is reversible at every step. If it works, you have bought years of productive evolution out of the existing system. If it does not work, you have learned exactly why, and the rewrite case is now built on evidence rather than intuition.
Most teams that do this plan never need the rewrite. A few still do, and they start it with a clearer understanding of what they are actually fixing and why.
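For readers who have not seen the "extract one thing" step in code, here is a minimal sketch of the routing idea behind a strangler-fig extraction. Everything in it is hypothetical: `StranglerFacade`, the handler functions, and the `/invoices` prefix are invented for illustration, and a real system would usually do this at a reverse proxy or router rather than in application code.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def legacy_monolith(request: dict) -> dict:
    # Stand-in for the existing system, which keeps serving everything
    # that has not been extracted yet.
    return {"served_by": "legacy", "path": request["path"]}


def new_invoicing_module(request: dict) -> dict:
    # Stand-in for the single extracted module.
    return {"served_by": "invoicing-v2", "path": request["path"]}


class StranglerFacade:
    """Sends a request to an extracted module if one claims its path, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._extracted: Dict[str, Handler] = {}

    def extract(self, path_prefix: str, handler: Handler) -> None:
        self._extracted[path_prefix] = handler

    def rollback(self, path_prefix: str) -> None:
        # Removing the route returns that traffic to the legacy system.
        self._extracted.pop(path_prefix, None)

    def handle(self, request: dict) -> dict:
        # Longest prefix wins, so nested extractions behave predictably.
        for prefix in sorted(self._extracted, key=len, reverse=True):
            if request["path"].startswith(prefix):
                return self._extracted[prefix](request)
        return self._legacy(request)


facade = StranglerFacade(legacy_monolith)
facade.extract("/invoices", new_invoicing_module)
print(facade.handle({"path": "/invoices/42"})["served_by"])  # invoicing-v2
print(facade.handle({"path": "/orders/7"})["served_by"])     # legacy

facade.rollback("/invoices")
print(facade.handle({"path": "/invoices/42"})["served_by"])  # legacy
```

The `rollback` method is the part worth noticing. It is what makes each step reversible: if the extracted module misbehaves, traffic goes back to the old code while the team works out why.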
The conversation I want to be having
The CTOs I talk to are, almost without exception, thoughtful people trying to do right by their teams and their businesses. They are not proposing rewrites because they are naive. They are proposing rewrites because the rewrite is the most visible option available in the conversation they are having with their leadership.
My job on those calls is to widen the conversation. Make the smaller, cheaper, less heroic options visible. Show the evidence that they work. Give the CTO a different sentence to bring back to the board: “instead of an eighteen-month rewrite, we’re going to spend three months modernizing, six months extracting the two modules that are actually slow, and reassess in nine.” That sentence is easier to fund, easier to defend, and has a dramatically higher probability of working than the rewrite would have.
The goal is not to prevent all rewrites. It is to prevent the ones that were symptoms rather than cures. The rewrite that survives all four questions with honest answers probably is the right call. It is just a much smaller set than the set of rewrites that get proposed.
If you are weighing a rewrite and not sure whether the case holds up under pressure, my monolith modernisation engagement starts with exactly this conversation. Four questions, honest answers, a written decision either way, and (if the answer is not the rewrite) a concrete plan for the cheaper alternative that actually addresses the symptoms.
References
- Things You Should Never Do, Part I by Joel Spolsky: the canonical argument against from-scratch rewrites, using Netscape 6 as the cautionary tale.
- StranglerFigApplication by Martin Fowler: the original write-up of the pattern for gradually replacing legacy systems without a big-bang rewrite.
- Patterns of Legacy Displacement by Ian Cartwright, Rob Horn, and James Lewis, published on Martin Fowler's site: a longer treatment of how to actually displace a legacy system in increments, including event interception and legacy mimic.
- LegacySeam by Martin Fowler: short essay on seams, building on Michael Feathers’ Working Effectively with Legacy Code, which is the standard reference for getting untested code under test.
- Standish CHAOS 2015 Q&A on InfoQ: summary of the long-running Standish research on software project outcomes, including the effect of project size on success rates.