Symfony Messenger in Production: What Actually Breaks at Scale

What breaks when Symfony Messenger meets production traffic, and the patterns that keep retries, dead letters, and worker memory under control.


Symfony Messenger is one of those components that demos beautifully and then quietly accumulates production scars over the next eighteen months. The first message gets dispatched in a tutorial, a worker is started with messenger:consume, and everyone goes home happy. Two years later the same team has 14 transports, three retry strategies, a failed_high queue nobody opens, and a Slack channel where someone asks every Tuesday why the worker is OOM again.

This essay is the punch list of everything I tell teams to fix before Messenger becomes their production incident generator. It assumes you already have it running. It is about the second mile, not the first.

1. Choose your transport on durability, not familiarity

The default Doctrine transport is fine for development and the smallest production workloads. It stops being fine the moment you have more than one worker, more than a handful of messages per second, or a database that already has work to do.

The honest comparison:

  • Doctrine transport. Pros: zero new infrastructure. Cons: consume cycles use database locking (trying FOR UPDATE SKIP LOCKED where supported, with fallbacks otherwise), which can scale surprisingly well but adds load to the same database that serves your application. Acceptable up to a few hundred messages per second on a healthy Postgres. Painful past that.
  • AMQP (RabbitMQ). Pros: purpose-built, fan-out exchanges, mature operations tooling. Cons: you now own a broker. Backpressure is not free. Dead-letter exchanges are powerful but easy to misconfigure.
  • Redis Streams. Pros: low latency, plays well with an existing Redis cache. Cons: persistence semantics depend on your Redis configuration, and “I lost a message” is a debugging nightmare when nobody is sure whether appendfsync was set.
  • Amazon SQS / Google Pub/Sub. Pros: the broker is somebody else’s problem. Cons: at-least-once delivery is a hard constraint, visibility timeouts are a footgun, and the Symfony adapters lag behind the features the platform exposes.

The pattern I see most often is a team picking the transport they already have running for a different reason. That is fine for the first 90% of cases. The other 10% is where you needed durability guarantees the chosen transport does not give you, and discovered it during an outage.

Decide on durability first. If a lost message is acceptable, anything works. If a lost message is a customer-facing problem, write down what your transport guarantees, and verify it survives the failure modes you actually care about.

2. Mark messages with marker interfaces, not transport names

A pattern that pays off every time:

PHP
namespace App\Messenger;

interface SyncMessage
{
}

interface AsyncMessage
{
}
YAML
framework:
    messenger:
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
            sync: 'sync://'
        routing:
            'App\Messenger\AsyncMessage': async
            'App\Messenger\SyncMessage': sync

Now every message is one interface implementation away from being routed correctly, and the routing config never grows past those two lines. The alternative is the routing block growing to a wall of message class names, and a slow drift where some messages are misrouted because someone forgot to update the config when they added a new class.

If you need three transports (high priority, normal, batch), add three interfaces. The point is that the routing decision is a property of the message, declared next to it, not a config file someone forgets to update.
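
As a usage sketch (the message class and its field are hypothetical), routing a new message asynchronously is nothing more than implementing the interface:

PHP
namespace App\Messenger\Notification;

use App\Messenger\AsyncMessage;

// Implementing AsyncMessage is the routing decision: the YAML above sends
// every implementation of the interface to the async transport.
final readonly class SendWelcomeEmail implements AsyncMessage
{
    public function __construct(
        public string $userId,
    ) {
    }
}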

3. Retries are a strategy, not a default

The default retry strategy in Symfony is three attempts with exponential backoff. It is a sane starting point for exactly one shape of failure: a transient downstream that recovers in seconds. Most production failures are not that shape.

The shapes I see:

  • Transient (network blips, dependency restarts). Retry helps. Three attempts at 1s, 5s, 30s usually catches it.
  • Persistent (downstream is broken, the message will never succeed). Retry hurts. You burn worker capacity on a message that is never going to land.
  • Poison (the message itself is malformed, the handler will throw forever). Retry actively makes the system worse: you reproduce the same bug four times instead of once (the original attempt plus three retries), and you delay every other message in the queue by the retry budget.

The fix is per-message-type retry configuration:

YAML
framework:
    messenger:
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                retry_strategy:
                    max_retries: 3
                    delay: 1000
                    multiplier: 5
                    max_delay: 60000

            async_no_retry:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                retry_strategy:
                    max_retries: 0

Then route messages where retry never helps (for example, “send this exact webhook now or never”) to async_no_retry. The default is no longer the only option.
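
This combines naturally with the marker interfaces from section 2. A sketch, with a hypothetical NoRetryMessage marker:

PHP
namespace App\Messenger;

// Messages where retry never helps ("now or never" work) implement this
// and get routed to the transport with max_retries: 0.
interface NoRetryMessage
{
}
YAML
framework:
    messenger:
        routing:
            'App\Messenger\NoRetryMessage': async_no_retry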

For really nuanced cases, implement RetryStrategyInterface and decide retry per exception type:

PHP
namespace App\Messenger;

use App\Repository\Exception\NotFoundException;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Retry\MultiplierRetryStrategy;
use Symfony\Component\Messenger\Retry\RetryStrategyInterface;

final readonly class SmartRetryStrategy implements RetryStrategyInterface
{
    public function __construct(
        private MultiplierRetryStrategy $fallback,
    ) {
    }

    public function isRetryable(Envelope $message, ?\Throwable $throwable = null): bool
    {
        if ($throwable instanceof NotFoundException) {
            return false;
        }

        return $this->fallback->isRetryable($message, $throwable);
    }

    public function getWaitingTime(Envelope $message, ?\Throwable $throwable = null): int
    {
        return $this->fallback->getWaitingTime($message, $throwable);
    }
}

A NotFoundException from a repository is almost always a poison message: the entity does not exist, retrying will not change that. Bail out, log, move on.
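
Wiring the strategy in is one transport option. The retry_strategy service key is part of the framework configuration; this sketch assumes SmartRetryStrategy and its MultiplierRetryStrategy fallback are registered as services:

YAML
framework:
    messenger:
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                retry_strategy:
                    service: 'App\Messenger\SmartRetryStrategy'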

4. The dead letter queue is not optional

Every async transport needs a failure transport configured, and somebody needs to look at it.

YAML
framework:
    messenger:
        failure_transport: failed

        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'

            failed:
                dsn: 'doctrine://default?queue_name=failed'

That much is in the docs. Here is what is not in the docs:

  • Set up an alert on the failure queue depth. Anything above a threshold (10? 50? depends on volume) pages someone. A failure queue with 4,000 messages in it is not a queue, it is a tomb.
  • Automate the failed message review. A weekly cron that runs messenger:failed:show and posts the count plus the top exception types to a chat channel keeps the queue from being a place where messages go to be forgotten; a sketch follows this list.
  • Be deliberate about retry from the dead letter queue. Running messenger:failed:retry with no filter is a recipe for retrying an old poison message that will fail in exactly the same way and re-land in the dead letter queue. Filter by exception type or message class.
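
A minimal sketch of that weekly review, assuming a chat webhook in CHAT_WEBHOOK_URL and jq on the box (messenger:stats reports an approximate count per transport):

Bash
#!/usr/bin/env bash
# Weekly failed-queue report (sketch): post the approximate count and the
# most recent failures to a chat channel.
set -euo pipefail

REPORT="$(php bin/console messenger:stats failed)
$(php bin/console messenger:failed:show --max=10)"

curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(jq -nc --arg text "$REPORT" '{text: $text}')" \
    "$CHAT_WEBHOOK_URL"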

A good test of an organisation’s Messenger maturity: ask how many messages are in the failure transport right now, and how many have been there for more than a week. If nobody knows, you have a problem you are not aware of.

5. Workers are processes with budgets

A long-running PHP worker is doing something the language was not really designed to do. PHP’s request lifecycle assumes a fresh process per request. Doctrine’s identity map, hydrators, and event managers all expect a short-lived process. Long-lived workers accumulate memory, stale entity references, and cached metadata in ways that bite at hour 6 of a 24-hour run.

The defence is hard limits, set on every worker:

Bash
php bin/console messenger:consume async \
    --memory-limit=128M \
    --time-limit=3600 \
    --limit=1000

What this does: stops the worker once PHP’s reported memory usage passes 128 MB, after 1 hour of wall time, or after 1,000 messages, whichever comes first. The stop is graceful: the worker finishes the in-flight message and exits. The supervisor (systemd, Supervisor, Kubernetes) restarts it. The fresh process has a fresh Doctrine identity map and a clean memory profile.

Why all three: memory limits catch leaks, time limits catch slow leaks that do not trip the memory ceiling, message count limits catch problems that scale with messages rather than time.

Then in the handler itself, two patterns matter:

PHP
namespace App\Messenger;

use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final readonly class ProcessReportHandler
{
    public function __construct(
        private EntityManagerInterface $entityManager,
        private ReportRepositoryInterface $reports,
    ) {
    }

    public function __invoke(ProcessReport $message): void
    {
        $report = $this->reports->get($message->reportId);

        // ... processing ...

        $this->entityManager->flush();
        $this->entityManager->clear();
    }
}

Two things are happening. The message carries a report ID rather than a Report entity: entities do not serialize safely across a queue boundary, and re-fetching by ID means the handler works on fresh state. And clear() after flush() releases the entity references the identity map is holding. Without it, every message processed grows the identity map until the worker dies of memory exhaustion. With it, the worker can run for thousands of messages.

6. Idempotency is your problem, not Messenger’s

Almost every transport gives you at-least-once delivery. That means the same message can be handled twice. If your handler is not idempotent, eventually the duplicate will hurt.

The patterns that work:

Idempotency keys. Embed a unique key in the message and check it on handle:

PHP
namespace App\Messenger\Notification;

use App\Messenger\AsyncMessage;
use Symfony\Component\Uid\Ulid;

final readonly class SendInvoiceEmail implements AsyncMessage
{
    public function __construct(
        public string $invoiceId,
        public Ulid $idempotencyKey,
    ) {
    }
}
PHP
namespace App\Messenger\Notification;

use App\Messenger\Notification\Repository\ProcessedMessageRepositoryInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final readonly class SendInvoiceEmailHandler
{
    public function __construct(
        private ProcessedMessageRepositoryInterface $processed,
        private InvoiceMailer $mailer,
    ) {
    }

    public function __invoke(SendInvoiceEmail $message): void
    {
        if ($this->processed->exists($message->idempotencyKey)) {
            return;
        }

        $this->mailer->send($message->invoiceId);
        $this->processed->record($message->idempotencyKey);
    }
}

A unique constraint on idempotency_key in the database does the same job with one fewer round trip. The point is the same: state, not hope.
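
A sketch of that variant, replacing the exists()/record() pair (the processed_message table and column are assumptions). Note the trade-off: recording the key before sending means a crash between the two drops the email instead of duplicating it.

PHP
namespace App\Messenger\Notification;

use Doctrine\DBAL\Connection;
use Doctrine\DBAL\Exception\UniqueConstraintViolationException;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final readonly class SendInvoiceEmailHandler
{
    public function __construct(
        private Connection $connection,
        private InvoiceMailer $mailer,
    ) {
    }

    public function __invoke(SendInvoiceEmail $message): void
    {
        try {
            // The unique index on idempotency_key is the existence check:
            // a duplicate delivery fails this insert instead of costing a
            // separate lookup.
            $this->connection->insert('processed_message', [
                'idempotency_key' => (string) $message->idempotencyKey,
            ]);
        } catch (UniqueConstraintViolationException) {
            return; // Already handled once; at-least-once delivery at work.
        }

        $this->mailer->send($message->invoiceId);
    }
}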

Idempotent operations. Better than tracking keys is making the operation itself idempotent. UPDATE invoice SET sent_at = NOW() WHERE id = ? AND sent_at IS NULL does not care if it is called twice. The second call updates zero rows.

The pattern I push hardest: idempotency at the boundary. The handler is allowed to be called twice. Whatever the handler does to the outside world (send an email, charge a card, write to a third-party API) needs a deterministic input or a check that prevents duplicates.

7. Observability beats hope

The fastest way to learn that your Messenger setup is broken is to find out from a customer. The second-fastest is to find out from a deploy that suddenly fails. The way you actually want to find out is from a dashboard that has been screaming for two hours.

Minimum viable observability:

  • Queue depth per transport. A dashboard graph. Alerts when it stays above a threshold for longer than expected processing time. RabbitMQ exposes this natively, the Doctrine transport needs a SELECT count(*) cron (see the sketch after this list), and SQS exposes ApproximateNumberOfMessages.
  • Handler duration distribution. A histogram per message class. A handler that took 200ms a week ago and takes 4s today is news, even if it is still succeeding.
  • Failure rate per message class. Total failures over total handles, per type. A 0.1% failure rate is normal, 5% is an incident.
  • Dead letter queue size. Already covered above. It bears repeating because people set up the queue, never look, and get burned.
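
For the Doctrine transport, the count cron can be a single query; messenger_messages is the transport’s default table name, and delivered_at is NULL while a message is still waiting:

Bash
# Queue depth per queue for the Doctrine transport (sketch, Postgres).
psql "$DATABASE_URL" -c \
    "SELECT queue_name, COUNT(*) AS depth
       FROM messenger_messages
      WHERE delivered_at IS NULL
      GROUP BY queue_name;"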

You do not need a full APM stack to do this. A few Symfony Messenger middleware implementations and a Prometheus exporter cover most of the value:

PHP
namespace App\Messenger\Middleware;

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

final readonly class TimingMiddleware implements MiddlewareInterface
{
    public function __construct(
        private MetricsCollectorInterface $metrics,
    ) {
    }

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        $start = \microtime(true);
        $messageClass = $envelope->getMessage()::class;

        try {
            $envelope = $stack->next()->handle($envelope, $stack);

            $this->metrics->recordSuccess($messageClass, \microtime(true) - $start);

            return $envelope;
        } catch (\Throwable $exception) {
            $this->metrics->recordFailure($messageClass, $exception::class);

            throw $exception;
        }
    }
}
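
Registering the middleware is one config entry, assuming the default bus name and an autowired class:

YAML
framework:
    messenger:
        buses:
            messenger.bus.default:
                middleware:
                    - 'App\Messenger\Middleware\TimingMiddleware'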

8. Schema changes break replays

Once you have a failure queue, you have messages from a week ago sitting in the database, encoded in whatever Symfony serializer was active when they were dispatched. When you change a message class (rename a field, change a type, add a non-nullable constructor argument), retrying those old messages either explodes loudly or, worse, succeeds with subtly wrong data.

The defences:

  • Treat message classes as wire formats. Once a message has shipped to production, treat it like an external API. Add fields, do not remove them. Make new fields nullable, as sketched after this list. Do not rename. Do not retype.
  • Version messages explicitly when you must change shape. SendInvoiceEmail becomes SendInvoiceEmailV2. Both handlers exist for a release window. Old messages drain via the V1 handler. Once the queue is empty, V1 is removed.
  • Drain before deploying breaking changes. Stop dispatching, let workers consume, deploy. This is operational work, not a code pattern, but it is the simplest answer when the message volume is low enough.
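
What “add fields, make them nullable” looks like in practice, extending the SendInvoiceEmail example (the locale field is hypothetical):

PHP
namespace App\Messenger\Notification;

use App\Messenger\AsyncMessage;
use Symfony\Component\Uid\Ulid;

final readonly class SendInvoiceEmail implements AsyncMessage
{
    public function __construct(
        public string $invoiceId,
        public Ulid $idempotencyKey,
        // Added after the first deploy: nullable with a default, so envelopes
        // serialized before the change still deserialize cleanly.
        public ?string $locale = null,
    ) {
    }
}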

Versioning is what teams skip first and regret first. Three stuck messages after a rename and an hour-long debug session later, the team adds versioning to the next ten message classes.

9. Do not make the bus a god object

The pattern I push back on hardest: “every operation goes through the bus, including the ones that are not async.”

The reasoning sounds appealing. One pattern. Easy to read. Future-proof.

In practice it produces:

  • Synchronous controllers that go through five layers of envelope wrapping for a 2ms operation.
  • Stack traces that start in a controller, go through the messenger middleware, end in a handler, and have to be read backwards.
  • A bus that is doing the job of three different things (request handling, command handling, event publishing) and confuses everyone about which is which.

The version that works: async work goes through the bus, sync work stays as services. A controller calls a service. A service does the synchronous work. If part of that work needs to be async, the service dispatches a message. The bus is for crossing the synchronous/asynchronous boundary, not for everything.
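
A sketch of that boundary (the service, repository, and message names are hypothetical):

PHP
namespace App\Service;

use App\Messenger\Notification\SendInvoiceEmail;
use Symfony\Component\Messenger\MessageBusInterface;
use Symfony\Component\Uid\Ulid;

final readonly class InvoiceService
{
    public function __construct(
        private InvoiceRepositoryInterface $invoices,
        private MessageBusInterface $bus,
    ) {
    }

    public function finalise(string $invoiceId): void
    {
        // Synchronous work is a plain method call: no envelopes, no
        // middleware, a stack trace that reads top to bottom.
        $invoice = $this->invoices->get($invoiceId);
        $invoice->finalise();

        // Only the genuinely async part crosses the bus boundary.
        $this->bus->dispatch(new SendInvoiceEmail($invoice->id, new Ulid()));
    }
}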

10. The first 90 days checklist

If you are six months into Messenger and starting to feel the weight, this is the punch list I run with teams:

  • Every transport has a failure_transport configured.
  • Failure queue depth is graphed and alerted.
  • Workers run with --memory-limit, --time-limit, and --limit.
  • Handlers call EntityManagerInterface::clear() after flush().
  • Per-message retry strategies exist for the message types where the default is wrong.
  • Marker interfaces (SyncMessage, AsyncMessage) drive routing, not a wall of class names in YAML.
  • Idempotency is documented per message class, with a written answer to “what if this runs twice?”.
  • At least one middleware records handler duration and failure rate per message class.
  • Message classes have a documented versioning policy.
  • The team agrees that the bus is for crossing async, not for every controller.

Most of those are an afternoon of work each. All of them together change the failure mode from “find out from a customer” to “find out from a graph”, which is the only progression that actually matters.


If your Messenger setup is starting to feel like a place where messages disappear, our scaling engagement includes a Messenger production review that goes through this checklist against your transports, alerts, and handler code, and produces a prioritised list of fixes you can ship in two weeks.
