Dead Letter Queue Pattern with Broadway and RabbitMQ
Building resilient message processing systems requires handling failures gracefully. When Broadway processes messages through its pipelines, transient failures are inevitable: network issues, temporary service outages, and data inconsistencies all happen in distributed systems.
The Dead Letter Queue (DLQ) pattern provides a robust solution for handling failed messages at the infrastructure level. By leveraging RabbitMQ's native capabilities, you can implement automatic retries, exponential backoff, and failure isolation without cluttering your application code.
Understanding the Pattern
Broadway excels at processing messages through concurrent pipelines, but it doesn't provide built-in retry scheduling with delays or backoff. The DLQ pattern fills this gap by implementing retry logic at the RabbitMQ infrastructure level using exchange and queue configurations.
The architecture separates concerns cleanly:
- Broadway handles message acknowledgment and processing
- RabbitMQ manages retry timing and routing
- Your Pipeline implements business logic and retry decisions
Queue Architecture
The system creates three specialized queues per processing step:
1. Worker Queue
The primary queue where Broadway consumers receive messages for processing. Normal message flow starts here. Successful processing completes the cycle, while failures trigger the retry mechanism.
2. Retry Queue
Failed messages land here with a TTL (Time To Live) configuration. RabbitMQ automatically moves expired messages back to the worker queue using Dead Letter Exchange (DLX) routing. This creates time-delayed retries without custom timers or scheduled jobs.
3. Dead Queue
Permanently failed messages accumulate here after exhausting retry attempts. This queue enables manual inspection, debugging, and alternative handling strategies without blocking the main processing pipeline.
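For example, a step named "my_step" would own three queues following this convention (the names are illustrative and match the implementation guide below):

# Illustrative queue names for a processing step called "my_step"
worker_queue = "my_step.worker"  # primary queue Broadway consumes from
retry_queue  = "my_step.retry"   # holds failures until their TTL expires
dead_queue   = "my_step.dead"    # terminal parking for exhausted retries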
How It Works
Queue Setup
When a producer starts, it declares the complete queue topology:
# Declare retry queue with dead-letter exchange configuration
AMQP.Queue.declare(channel, "my_queue.retry",
  durable: true,
  arguments: [
    {"x-dead-letter-exchange", :longstr, "my_exchange.worker"},
    {"x-dead-letter-routing-key", :longstr, "my_queue.worker"},
    {"x-queue-version", :long, 2}
  ]
)
The retry queue's x-dead-letter-exchange argument tells RabbitMQ where to route expired messages. When a message's TTL expires, RabbitMQ automatically publishes it back to the worker queue.
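Publishing into the retry queue with a per-message expiration then produces the delay. A minimal sketch using the amqp library, publishing through the default exchange so no extra binding is required (the payload is a placeholder, and the expiration must be given as a string of milliseconds):

# Illustrative: after 30 seconds RabbitMQ dead-letters this message back to
# the worker queue via the DLX configured on "my_queue.retry" above.
payload = Jason.encode!(%{"id" => 123, "retry_count" => 1})

AMQP.Basic.publish(channel, "", "my_queue.retry", payload,
  persistent: true,
  expiration: "30000"
)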
Normal Processing Flow
Messages flow through the happy path:
Message → Worker Queue → Broadway Pipeline → Success → ACK
Broadway acknowledges successful messages, and they're removed from the queue permanently.
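In Broadway terms, the happy path is a handle_message/3 callback that returns the message unmarked; Broadway then acknowledges it. A minimal sketch, assuming JSON payloads and a placeholder process/1 function (decoding the data up front is what lets handle_failed/2 in Step 2 read the retry counter later):

# Successful processing: returning the message lets Broadway ACK it.
# Marking it failed routes it to handle_failed/2 instead.
def handle_message(_processor, message, _context) do
  decoded = Jason.decode!(message.data)
  message = Broadway.Message.put_data(message, decoded)

  case process(decoded) do
    :ok -> message
    {:error, reason} -> Broadway.Message.failed(message, reason)
  end
end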
Failure and Retry Flow
When processing fails, Broadway's handle_failed/2 callback routes messages to the retry queue:
Message → Worker Queue → Broadway Pipeline → Failure
↓
handle_failed/2 callback publishes to Retry Queue (with expiration)
↓
After TTL expires → RabbitMQ DLX routes back to Worker Queue
↓
Retry processing attempt
The retry queue acts as a holding area. RabbitMQ's per-message TTL provides the time delay automatically, and increasing the TTL on each attempt yields exponential backoff without complex scheduling logic.
Permanent Failure
After exhausting retries, messages move to the dead queue. This prevents infinite retry loops and provides visibility into systemic issues requiring manual intervention.
Implementation Guide
Step 1: Configure Producer with Queue Declaration
The producer declares all three queues with proper configuration:
defmodule AnswerProcessing.MyStep.Producer do
  use Broadway.Producer

  def start_link(opts) do
    connection = Keyword.fetch!(opts, :connection)
    queue_name = "my_step"

    # Open a channel on the shared connection for topology setup
    {:ok, channel} = AMQP.Channel.open(connection)

    # Declare exchanges
    AMQP.Exchange.declare(channel, "#{queue_name}.worker", :direct, durable: true)
    AMQP.Exchange.declare(channel, "#{queue_name}.failed", :direct, durable: true)

    # Declare and bind worker queue
    AMQP.Queue.declare(channel, "#{queue_name}.worker", durable: true)
    AMQP.Queue.bind(channel, "#{queue_name}.worker", "#{queue_name}.worker", routing_key: "#{queue_name}.worker")

    # Declare retry queue with DLX pointing back at the worker exchange
    AMQP.Queue.declare(channel, "#{queue_name}.retry",
      durable: true,
      arguments: [
        {"x-dead-letter-exchange", :longstr, "#{queue_name}.worker"},
        {"x-dead-letter-routing-key", :longstr, "#{queue_name}.worker"}
      ]
    )

    # Declare dead queue
    AMQP.Queue.declare(channel, "#{queue_name}.dead", durable: true)

    # Bind retry and dead queues to the failed exchange
    # (assumes failure publishes are routed through it)
    AMQP.Queue.bind(channel, "#{queue_name}.retry", "#{queue_name}.failed", routing_key: "#{queue_name}.retry")
    AMQP.Queue.bind(channel, "#{queue_name}.dead", "#{queue_name}.failed", routing_key: "#{queue_name}.dead")

    # ... producer implementation
  end

  # The message comes first so callers can pipe into publish/3
  def publish(message, queue_type, opts \\ []) do
    case queue_type do
      :worker -> publish_to_worker(message, opts)
      :retry -> publish_to_retry(message, opts)
      :dead -> publish_to_dead(message, opts)
    end
  end
end
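The publish_to_worker/2, publish_to_retry/2, and publish_to_dead/2 helpers are elided above. One possible shape of publish_to_retry/2, assuming the producer keeps its channel somewhere retrievable (the fetch_channel/0 helper below is hypothetical):

# Hypothetical: publish through the failed exchange with a per-message TTL so
# RabbitMQ dead-letters the message back to the worker queue when it expires.
defp publish_to_retry(payload, opts) do
  channel = fetch_channel()
  expiration = Keyword.fetch!(opts, :expiration)

  AMQP.Basic.publish(channel, "my_step.failed", "my_step.retry", payload,
    persistent: true,
    priority: Keyword.get(opts, :priority) || 0,
    expiration: Integer.to_string(expiration)
  )
end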
Step 2: Implement Retry Logic with Exponential Backoff
The pipeline's handle_failed/2 callback implements the retry strategy:
defmodule AnswerProcessing.MyStep.Pipeline do
  use Broadway

  alias AnswerProcessing.MyStep

  @max_retries 3

  def handle_failed(messages, _context) do
    Enum.map(messages, fn message ->
      retry_count = get_retry_count(message)

      if retry_count < @max_retries do
        # Calculate exponential backoff: 30s, 60s, 120s (in milliseconds)
        expiration = :timer.seconds(30 * Integer.pow(2, retry_count))

        # Publish a copy to the retry queue
        message.data
        |> increment_retry_count()
        |> Jason.encode!()
        |> MyStep.Producer.publish(:retry,
          expiration: expiration,
          priority: message.metadata.priority
        )
      else
        # Max retries exhausted, send to dead queue
        message.data
        |> Jason.encode!()
        |> MyStep.Producer.publish(:dead, priority: 0)
      end

      # Return the message so Broadway acknowledges the original delivery
      message
    end)
  end

  defp get_retry_count(%{data: %{"retry_count" => count}}), do: count
  defp get_retry_count(_message), do: 0

  defp increment_retry_count(%{"retry_count" => count} = data),
    do: %{data | "retry_count" => count + 1}

  defp increment_retry_count(data),
    do: Map.put(data, "retry_count", 1)
end
This implementation provides:
- Exponential backoff - Each retry waits progressively longer (30s, 60s, 120s)
- Retry limits - Prevents infinite loops with max retry count
- Metadata tracking - Embeds retry count in message payload
- Priority preservation - Maintains message priority through retries
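One more detail matters for correctness: the original delivery must be acknowledged even when processing fails, because handle_failed/2 has already republished a copy to the retry or dead queue. With BroadwayRabbitMQ this is a producer option. A sketch of how the pipeline's start_link/1 might be configured (connection handling and concurrency values are illustrative):

def start_link(opts) do
  Broadway.start_link(__MODULE__,
    name: __MODULE__,
    producer: [
      module: {BroadwayRabbitMQ.Producer,
        queue: "my_step.worker",
        connection: Keyword.fetch!(opts, :connection),
        metadata: [:priority],  # exposes priority to handle_failed/2
        on_failure: :ack        # ACK failures; retries flow through the retry queue
      },
      concurrency: 2
    ],
    processors: [default: [concurrency: 10]]
  )
end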
Step 3: Configure Complete Step
Wire everything together in your supervision tree:
defmodule AnswerProcessing.MyStep do
  use AnswerProcessing.Step,
    implementation: AnswerProcessing.MyStepImpl,
    producer: [queue_name: "my_step"],
    pipeline: [queue: "my_step.worker"]
end

# In supervisor
children = [
  {AnswerProcessing.MyStep.Producer, connection: rabbitmq_connection},
  {AnswerProcessing.MyStep.Pipeline,
   connection: rabbitmq_connection,
   next_step: AnswerProcessing.NextStep}
]

Supervisor.start_link(children, strategy: :one_for_one)
Key Benefits
Automatic Retry Without Custom Timers
RabbitMQ's TTL + DLX combination provides time-delayed reprocessing automatically. No GenServer timers, no external schedulers, no manual tracking. The message broker handles everything.
Failure Isolation
Dead queue separates permanent failures from transient ones. Your monitoring system can alert on dead queue growth, while retry queue fluctuations are normal and expected.
No Message Loss
Every message exists in exactly one queue at all times. Worker, retry, or dead. Nothing falls through the cracks. Durable queues survive broker restarts.
Independent Queue Monitoring
Each queue provides distinct metrics. Worker queue depth indicates processing capacity. Retry queue shows transient failure rate. Dead queue reveals systemic problems requiring attention.
Stateless Pipeline Logic
Broadway callbacks remain stateless. No retry state tracking, no timer management, no complex error handling logic. The infrastructure handles retry mechanics.
Broadway's Role
Broadway focuses on what it does best:
- Message acknowledgment - Acknowledges both successful and failed messages
- Stateless processing - Runs messages through stateless handle_message/3 and handle_failed/2 callbacks
- Immediate acknowledgment of failures - Prevents RabbitMQ redelivery by acknowledging failed messages once they've been routed to the retry or dead queue
Broadway does NOT handle:
- Retry logic - Delegated to producer/pipeline routing decisions
- TTL management - Handled by RabbitMQ queue configuration
- DLQ routing - Implemented at infrastructure level
This separation of concerns keeps the architecture clean and maintainable.
Production Considerations
Monitoring and Alerting
Monitor queue depths independently:
- Worker queue - Alert on sustained high depth (processing bottleneck)
- Retry queue - Track for transient failure patterns
- Dead queue - Alert immediately on growth (systemic issues)
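A quick way to sample these depths is a passive queue declaration, which reports the current message count. A minimal sketch using the amqp library (queue names follow the convention above; shipping the numbers to your metrics system is left out):

# Poll the depth of each queue for a step. AMQP.Queue.status/2 performs a
# passive declare and returns message and consumer counts.
def queue_depths(channel, step \\ "my_step") do
  for suffix <- ~w(worker retry dead), into: %{} do
    queue = "#{step}.#{suffix}"
    {:ok, %{message_count: depth}} = AMQP.Queue.status(channel, queue)
    {queue, depth}
  end
end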
Message Expiration Tuning
Adjust exponential backoff parameters based on your failure characteristics. Network issues might need shorter delays, while rate-limited APIs benefit from longer waits.
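One way to keep these parameters tunable is to read the base delay and cap from application config instead of hard-coding them (the config keys here are illustrative):

# Hypothetical configurable backoff: the base delay doubles per attempt, capped.
# config :my_app, retry_base_ms: 30_000, retry_max_ms: 600_000
def backoff_ms(retry_count) do
  base = Application.get_env(:my_app, :retry_base_ms, 30_000)
  cap = Application.get_env(:my_app, :retry_max_ms, 600_000)

  min(base * Integer.pow(2, retry_count), cap)
end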
Dead Queue Processing
Implement tooling to inspect and reprocess dead queue messages. Consider administrative endpoints for manual retry or batch reprocessing after fixing underlying issues.
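Such tooling can be as simple as draining the dead queue with basic.get and republishing each payload to the worker queue. A hedged sketch (error handling and batching omitted; publish/3 is the producer helper from Step 1):

# Drain the dead queue and push every message back through the worker queue.
def reprocess_dead_queue(channel, queue \\ "my_step.dead") do
  case AMQP.Basic.get(channel, queue) do
    {:ok, payload, meta} ->
      # Reset the retry counter so the message gets a fresh set of attempts
      payload
      |> Jason.decode!()
      |> Map.put("retry_count", 0)
      |> Jason.encode!()
      |> AnswerProcessing.MyStep.Producer.publish(:worker)

      AMQP.Basic.ack(channel, meta.delivery_tag)
      reprocess_dead_queue(channel, queue)

    {:empty, _meta} ->
      :ok
  end
end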
Retry Count Persistence
Store retry count in message payload, not headers. This ensures retry tracking survives message transformations and provides debugging visibility.
Key Takeaways
- Infrastructure-level retries using RabbitMQ TTL and Dead Letter Exchange
- Three-queue architecture separating worker, retry, and dead messages
- Exponential backoff implemented through increasing TTL values
- Stateless Broadway pipelines with routing decisions in handle_failed/2
- No message loss with durable queues and explicit acknowledgment
- Independent monitoring for worker, retry, and dead queue health
- Clean separation between Broadway processing and retry infrastructure
The Dead Letter Queue pattern with Broadway and RabbitMQ provides production-grade message processing resilience. By leveraging infrastructure capabilities, you build robust systems without complex application logic.