The most common testing question I hear from new clients is some variant of: “We just took over this Symfony application. There are no tests. Where do we start?”
The answer that gets repeated in conference talks (write a test for every bug, refactor for testability, aim for 80% coverage) is the right answer for a small, well-loved codebase with a sympathetic team. It is the wrong answer for the codebase most teams actually inherit: 200,000 lines of Symfony 4 mixed with bespoke “framework” code from a 2017 freelancer, no test suite, no documentation, and a backlog of features the business expects to ship next quarter.
For that codebase, the playbook is different. You are not adding tests to a tested codebase. You are adding tests to a codebase that has been running on production traffic alone, and you have to do it without slowing the team to a halt. This essay is the playbook I use, in the order I use it.
## Phase 0: do not write tests yet
The first instinct of an engineer dropped into an untested codebase is to write tests. Suppress it for a week.
The reason: tests written before you understand the system get aimed at the wrong things. You will end up testing the parts of the code you can read most easily, and those are rarely the parts most likely to break. Tests on a `UserFormatter` utility do not protect you from the regression in the `OrderCancelledHandler` that was written by someone who has since left the company.
What to do in that first week instead:
- Read the entrypoints. Every controller, every console command, every message handler. Skim. Note what they do, not how. The goal is a mental map of “the application has 47 endpoints, broken roughly into these eight clusters.”
- Find the production logs. What endpoints get the most traffic? Which ones throw the most errors? What console commands run on cron, and how often? You are looking for the heat map: the parts of the code that handle the most volume or fail the most.
- Find the customer-impacting failures. Talk to support. The set of “things that broke recently and customers complained about” is the most accurate prioritisation signal you will get for free.
By the end of that week you should be able to write three lists on a whiteboard: the ten endpoints with the most traffic, the ten that fail most often, and the five console commands that move the most money. The intersection of those three lists is where the test budget goes first.
## Phase 1: characterisation tests, not unit tests
The first tests on a legacy codebase are not unit tests. They are characterisation tests: tests that pin down what the system currently does, whether it is correct or not.
The shape:
```php
<?php

declare(strict_types=1);

namespace App\Tests\Functional\Controller;

use App\Tests\Functional\FunctionalTestCase;
use PHPUnit\Framework\Attributes\Test;

final class OrderCheckoutControllerTest extends FunctionalTestCase
{
    #[Test]
    public function checkoutWithKnownInputProducesKnownResponse(): void
    {
        $this->browser()
            ->post('/checkout', [
                'json' => [
                    'cartId' => '01J3Z9X0ABCDE',
                    'paymentMethod' => 'card',
                    'amount' => 4999,
                ],
            ])
            ->assertStatus(200)
            ->assertJsonMatches('orderId', '01J3Z9X0ABCDF')
            ->assertJsonMatches('status', 'pending');
    }
}
```
This test does not assert that the output is correct. It asserts that the output is what it is right now. If the system has a quiet bug where amounts are stored in cents but reported in euros, the characterisation test pins the bug in place. That is the point. You can fix the bug later. First you need a tripwire that tells you when you accidentally change behaviour, and characterisation tests are the cheapest tripwire that exists.
The rules:
- Cover from the outside in. Use Zenstruck Browser or `WebTestCase` to drive the application via HTTP, exactly the way customers use it. This is the level at which the behaviour you care about is observable. Do not start with unit tests of internal services; you do not yet know which behaviours of those services are load-bearing.
- Use real dependencies, not mocks. Mocks freeze your assumptions about what an internal collaborator returns. In a legacy codebase you do not yet have correct assumptions. Hit the real database, the real templating engine, the real cache, with `dama/doctrine-test-bundle` providing transaction isolation. Slower, but it tests the thing.
- Capture the full response shape. Status code, headers, JSON shape, redirect location. The bug that someone introduces six months from now might change a header you did not bother asserting on, and the test should catch it.
The test suite that comes out of this phase is unattractive. It is slow, it duplicates database fixtures, the assertions are precise to the point of being brittle. That is appropriate. The suite is not optimised for elegance, it is optimised for catching regressions in code nobody on the current team wrote.
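For reference, the `FunctionalTestCase` base class assumed by the example above can be as thin as this sketch (the trait comes from Zenstruck Browser; the DAMA extension registration lives in `phpunit.xml`, not in the class):

```php
<?php

declare(strict_types=1);

namespace App\Tests\Functional;

use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;
use Zenstruck\Browser\Test\HasBrowser;

/**
 * Minimal base class for the characterisation suite. HasBrowser supplies
 * the fluent ->browser() client used in the tests; dama/doctrine-test-bundle,
 * registered as a PHPUnit extension in phpunit.xml, wraps each test in a
 * database transaction and rolls it back afterwards, so tests run against
 * the real schema without polluting it.
 */
abstract class FunctionalTestCase extends WebTestCase
{
    use HasBrowser;
}
```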
## Phase 2: a seam map, not a refactor
Once the top-traffic endpoints have characterisation tests, the next instinct is to refactor. Suppress that one for another month.
Instead, draw a seam map. A “seam” in Working Effectively With Legacy Code terms is a place in the code where you can change behaviour without editing the source: dependency injection points, event subscribers, factory functions. The legacy codebase is full of seams. You just have to find them.
The map I produce, on a whiteboard or in a markdown file:
| Module | Inputs (HTTP, CLI, message) | External dependencies | Test seams available |
|---|---|---|---|
| Checkout | POST /checkout | Stripe, internal pricing service | Stripe via injected client interface, pricing service via its repository |
| Subscription billing | cron app:billing:run | Stripe, Mailer | Stripe injected, Mailer transports overridable in test env |
| Order export | message ExportOrders | S3, Sendgrid | S3 client injectable, Sendgrid via Symfony Mailer |
The map answers a question that is otherwise difficult to answer for legacy code: “where can I add tests without rewriting the surrounding code?”
Modules with good seams (Stripe behind an interface, mailer abstracted, S3 injected as a service) get unit and integration tests. Modules without seams (a 600-line controller that calls Stripe directly via a static method, parses XML inline, and writes to the filesystem in three different places) stay on characterisation tests until somebody has time to introduce seams.
The mistake I see is teams trying to introduce seams everywhere at once: the “let’s just refactor it to be testable” project that takes six months and ships nothing. The version that works: introduce seams as a side effect of feature work. Every feature ticket on a hard-to-test module has a small bonus task: introduce one seam. After six months you have ten seams without ever running a refactor project.
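One such seam can be small. A sketch, with hypothetical names (`PaymentGateway`, `StripeClient`), of pulling a static Stripe call behind an injectable interface:

```php
<?php

declare(strict_types=1);

namespace App\Payment;

// Before the seam, the controller called a static client directly,
// which left no way to substitute it in tests:
//
//     $chargeId = StripeClient::charge($order->total(), $order->cardToken());

// The seam: one interface describing the single call the module makes.
interface PaymentGateway
{
    /** Returns the provider's charge id. */
    public function charge(int $amountInCents, string $cardToken): string;
}

// Production adapter: wraps the old static call, behaviour unchanged.
final class StripePaymentGateway implements PaymentGateway
{
    public function charge(int $amountInCents, string $cardToken): string
    {
        return StripeClient::charge($amountInCents, $cardToken);
    }
}

// Test double: registered in the test container in place of the adapter.
// No network calls, and the calling code keeps the exact same shape.
final class FakePaymentGateway implements PaymentGateway
{
    /** @var list<array{amount: int, token: string}> */
    public array $charges = [];

    public function charge(int $amountInCents, string $cardToken): string
    {
        $this->charges[] = ['amount' => $amountInCents, 'token' => $cardToken];

        return 'ch_fake_'.\count($this->charges);
    }
}
```

The controller's constructor gains one `PaymentGateway` argument, Symfony's autowiring does the rest, and the module moves from "characterisation tests only" to unit-testable in one small PR.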
## Phase 3: write the unit test budget down
By now you have characterisation tests on the high-traffic endpoints and a seam map for the modules. The team is asking the right next question: “What do we unit test?”
The honest answer is: not everything. Unit tests have an opportunity cost. Time you spend unit-testing a feature is time you do not spend testing another feature, refactoring, or shipping.
A budget that works:
- Domain logic, value objects, calculations. Always unit test. These are the easiest to test, the most stable, and the highest density of correctness-critical code per line. A `Money` class, a `TaxCalculator`, a `ShortIdGenerator` should have unit tests. A test for `Money::add()` will outlive three rewrites of the surrounding application.
- State machines, validators, parsers. Unit test. Anything where the input space is large and the rules are tricky deserves the precise feedback that unit tests give.
- Controllers, message handlers, console commands. Functional tests, not unit tests. The interesting behaviour of a controller is its interaction with the framework: routing, security, validation, the response shape. Unit tests on a controller method assert almost nothing useful.
- Repositories. Integration tests with a real database. Unit-testing a repository against a Doctrine mock proves nothing.
- Glue code. Often does not need any tests at all. A class that wires three services together, with no logic of its own, is covered by the functional tests on the things it wires.
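For the repository row of that budget, the integration test is a sketch like this (repository, method, and factory names are hypothetical):

```php
<?php

declare(strict_types=1);

namespace App\Tests\Integration\Repository;

use App\Factory\OrderFactory;
use App\Repository\OrderRepository;
use PHPUnit\Framework\Attributes\Test;
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;

/**
 * Real kernel, real Doctrine connection; dama/doctrine-test-bundle rolls
 * the transaction back after each test. No Doctrine mocks anywhere.
 */
final class OrderRepositoryTest extends KernelTestCase
{
    #[Test]
    public function findPendingReturnsOnlyPendingOrders(): void
    {
        self::bootKernel();

        // Hypothetical Foundry factories seeding the real test database.
        OrderFactory::createOne(['status' => 'pending']);
        OrderFactory::createOne(['status' => 'shipped']);

        $repository = self::getContainer()->get(OrderRepository::class);

        $pending = $repository->findPending();

        self::assertCount(1, $pending);
        self::assertSame('pending', $pending[0]->getStatus());
    }
}
```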
The shape of a healthy unit test:
```php
<?php

declare(strict_types=1);

namespace App\Tests\Unit\Domain\Values;

use App\Domain\Values\Money;
use App\Tests\Unit\UnitTestCase;
use PHPUnit\Framework\Attributes\Test;

final class MoneyTest extends UnitTestCase
{
    #[Test]
    public function addingTwoEuroAmountsProducesTheirSum(): void
    {
        $a = Money::eur(1500);
        $b = Money::eur(750);

        self::assertSame(2250, $a->add($b)->cents());
    }

    #[Test]
    public function addingMismatchedCurrenciesThrows(): void
    {
        $eur = Money::eur(100);
        $usd = Money::create(100, 'USD');

        $this->expectException(\InvalidArgumentException::class);

        $eur->add($usd);
    }
}
```
Two tests, one happy path, one failure case, both fast, both immune to changes in the rest of the codebase. That is the unit test budget being well spent.
## Phase 4: the failure-driven test pattern
A pattern that pays off forever: every production incident gets one regression test before the fix.
The flow:
- A production incident is reported.
- Engineer reproduces the bug locally.
- Engineer writes a test that reproduces the bug. The test fails.
- Engineer fixes the bug. The test passes.
- The fix and the test ship together.
This is not a new idea. It is the standard test-driven debugging loop. What is new in a legacy codebase is the discipline of doing it consistently. The temptation when a P1 is open is to fix the bug and ship; the test can come later. The test never comes later. The bug recurs in three months.
The version that works in practice: the team’s pull request template has a checkbox “this PR includes a regression test, or explains why it cannot.” The PR cannot merge without the box being ticked or explained. After six months, the legacy codebase has 80 to 120 tests that each correspond to a real bug that has happened. That is the highest signal-to-noise test suite you will ever have.
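What one of those regression tests looks like, for a made-up incident (class names, route, and factory are illustrative):

```php
<?php

declare(strict_types=1);

namespace App\Tests\Functional\Regression;

use App\Factory\OrderFactory;
use App\Tests\Functional\FunctionalTestCase;
use PHPUnit\Framework\Attributes\Test;

/**
 * Regression test for a (fictional) incident: cancelling an order that had
 * already been refunded issued a second refund. Written to fail before the
 * fix, shipped in the same PR as the fix.
 */
final class CancelRefundedOrderRegressionTest extends FunctionalTestCase
{
    #[Test]
    public function cancellingAnAlreadyRefundedOrderIsRejected(): void
    {
        // Hypothetical factory seeding an order in the 'refunded' state.
        $order = OrderFactory::createOne(['status' => 'refunded']);

        $this->browser()
            ->post('/orders/'.$order->getId().'/cancel')
            // Before the fix this returned 200 and issued a second refund.
            ->assertStatus(409);
    }
}
```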
## What to leave alone
Some parts of the codebase are not worth investing in tests for. Be honest about which:
- Code that is on the deletion roadmap. If a module is being replaced by a strangler fig in three months, do not test it. A characterisation test is enough to keep it stable until the replacement lands.
- Auto-generated code. Doctrine entities with no business logic, controllers that are pure routing-plus-template, code that the framework generates for you. The framework's tests cover this; yours would be redundant.
- Code that is unreachable. Every legacy codebase has dead code. Find it (with static analysis, for example PHPStan dead-code diagnostics, or just by reading the routing config) and delete it. Tests on dead code are pure overhead.
- Code that needs a bug fix in the next sprint. If a module is going to change soon anyway, you do not need to characterise its current behaviour, you need to define its target behaviour. Skip the characterisation step and write the future-state test.
The instinct to test everything is what produces the test suites that nobody runs because they take 40 minutes and are flaky. Be selective.
## A 90-day plan
If you have just inherited a Symfony codebase with no tests, here is what the first 90 days look like in practice:
- Week 1. Do not write tests. Read the code. Find the heat map. Identify the top 10 endpoints by traffic, the top 10 by failure rate, and the 5 cron commands that move the most money.
- Weeks 2 through 4. Characterisation tests on the top 10 endpoints, using `dama/doctrine-test-bundle` for isolation and the real dependencies for everything else. Do not fix bugs you find; just pin them.
- Weeks 5 through 8. Seam map. For each module on the heat map, list the test seams and the modules without seams. Refactor only as a side effect of feature work, never as a project.
- Weeks 9 through 12. Unit tests on the domain layer (value objects, calculations, state machines). Adopt the failure-driven test pattern: every production bug gets a regression test.
By day 90, the team has a test suite that protects the production-critical paths, a map of where it is safe to add deeper tests, and a habit of writing tests in response to real bugs rather than theoretical ones. The codebase still has untested corners. That is fine. The point was never 100% coverage. The point was a test suite that catches the regressions that matter, fast enough that the team is willing to run it.
A test suite is not a finish line. It is a tool, and a particularly expensive one. Spend the budget on the parts of the codebase where the cost of being wrong is highest, and accept that the rest of the codebase will be tested by the same thing that has been testing it for the last three years: production traffic.
If you are inheriting a Symfony application and want help building a test strategy that does not stall the rest of the roadmap, our technical debt engagement includes a one-week test-strategy sprint that produces a heat map, a seam map, and a costed 90-day plan tailored to your codebase.
## References
- Working Effectively With Legacy Code by Michael Feathers: the original definition of seams and the canonical playbook for testing untested code.
- Symfony testing documentation: the framework's reference for `WebTestCase`, the test client, and integration test patterns.
- Zenstruck Browser: a fluent functional-test client that wraps Symfony's BrowserKit with assertions worth using.
- `dama/doctrine-test-bundle`: transaction-based test isolation for Doctrine, essential for fast functional tests against a real database.
- Zenstruck Foundry: the fixture library used in the test base classes referenced throughout this essay.
- PHPStan dead-code diagnostics: one static-analysis signal for unreachable code when cleaning legacy modules.