Postgres Advisory Locks: What Guarantee Are You Actually Buying?

We needed exactly one active processor per shard in a streaming pipeline, with automatic takeover during deploys and crashes.

We did not reach for etcd, ZooKeeper, or a queue coordinator.

We reached for pg_try_advisory_lock.

That worked well for our deployment model. It also would have failed in subtle ways if that model looked even slightly different: a transaction-mode pooler in the connection path, a multi-region primary failover, or a shared database helper abstraction would each have broken it in a different way.

The Setup

Each worker opened a dedicated Postgres connection at startup and raced to acquire a session-scoped advisory lock for one of N shard slots.

The lock was attached to the connection, not the transaction.

If the worker died:

TCP died
Postgres noticed and released the lock
another worker acquired the slot on its next poll

No lease timer to tune. No external coordinator. No additional infrastructure to operate.

That is the appeal: a coordination primitive you already have, with recovery semantics that align surprisingly well with many operational workloads.

The Guarantee, In One Sentence

You are buying mutual exclusion under normal operation, automatic recovery under process failure, and no guarantee at all during database failover or behind a transaction-mode pooler.

Everything below is the unpacking of that sentence.

What You Are Buying

mutual exclusion against normal concurrency: deploys, restarts, races, accidental double-rollouts
automatic release on process death, mediated by TCP teardown and Postgres session cleanup
no additional operational dependency
the ability to mutate "who owns the work" and "what the work is" in a single transaction, because coordination and application state live in the same store

Where Advisory Locks Fit

Advisory locks fit when:

the database is already your operational dependency boundary, so you are not adding a new failure surface
short windows of zero ownership during reconnect are tolerable
writes are idempotent or epoch-fenced, so brief windows of double ownership do not corrupt state

Idempotence is not a fit criterion. It is the load-bearing assumption that lets the rest of the model survive its bad days. If you cannot make the work idempotent or fence it with an epoch, the failure modes below stop being tolerable and start being incidents.

Things get dangerous when teams unconsciously promote advisory locks from coordination primitive to distributed consensus mechanism. Advisory locks do not know what consensus is.

Failure Modes

Database Failover

Durable ownership across failover.

When the primary flips, every advisory lock in the cluster disappears. The promoted replica does not inherit session state.

For a brief window:

nobody owns anything
every worker reconnects simultaneously
a thundering herd attempts reacquisition

If your side effects are not idempotent, that is where integrity problems start.

Transaction-Mode Poolers

There is another trap that is worse because it fails silently: transaction-mode poolers.

Session advisory locks live exactly as long as the backend connection that acquired them.

PgBouncer in transaction mode, and several managed proxy defaults including RDS Proxy, recycle the backend after each statement.

So this can happen:

pg_try_advisory_lock returns true
the backend is returned to the pool
the lock is now attached to a backend the application no longer logically holds
another worker successfully acquires the same lock from a different backend

No error. No warning. Just double ownership.

This is the easiest way for a future infrastructure change to quietly invalidate your coordination guarantees. Someone three months from now adding a pooler to reduce database load can break this with no test failures, no log line, and no obvious symptom until two workers begin writing the same data.

Defend against it in code, not in convention. Use a connection type that cannot be backed by a pool: a raw single connection, not a pooled handle. Assert at startup that you actually hold the lock against your own backend PID. The next person to refactor your database helpers will not read your comments. They will read your types.

Missing Fencing

Advisory locks do not provide fencing.

The lock says you have the right to act. It does not prove the previous holder has stopped acting.

A wedged-but-alive process can still hold the lock. A long GC pause or partial network failure can leave multiple workers believing they are authoritative.

Fencing is what makes that safe. The recipe below - epoch row, stamped writes, row-level rejection of stale epochs - is fencing. The advisory lock decides who gets to start. The fencing token decides whose writes get to land.

The Thesis

Advisory locks are excellent for preventing accidental concurrency.

They are not a substitute for:

fencing tokens
idempotent writes
epoch-based ownership
explicit recovery semantics

Treat database failover as a leadership reset, not a transparent pause.

The Recipe

use pg_try_advisory_lock, never blocking pg_advisory_lock
hold the lock on a dedicated connection that bypasses your pooler, and enforce this with the type system instead of a code comment
on acquisition, write a durable epoch row
stamp side effects with the epoch
reject stale writes at the row level
on any reconnect or error, assume leadership is lost
reacquire, bump the epoch, then resume processing
verify ownership explicitly at startup
- assert the lock exists in pg_locks
- assert it belongs to your own pg_backend_pid()

The first point matters more than it looks. Blocked workers can appear healthy while making no progress. Contention becomes a silent outage instead of a visible failure.

The dedicated-connection rule matters because conventions decay. The next person to refactor your database access will read the types, not the comments.

The startup checks matter because if a pooler silently moved you to another backend, you want to crash, not continue under invalid assumptions.

The Spectrum

Coordination systems are a spectrum.

Advisory locks sit on the cheap end:

low operational overhead
transactional integration with the data they coordinate
automatic cleanup on disconnect

The next rung up is epoch-plus-fencing leadership: a small table, some discipline, no new infrastructure. You buy correctness across failover at the price of slightly more code.

Above that is a real consensus system: Raft, Paxos, etcd, ZooKeeper. You operate a separate cluster, tune it for partition tolerance, and monitor it on its own terms. You buy durable leadership across primary failure at the price of a second on-call surface.

Each rung costs more operationally. Each rung buys stronger guarantees.

Advisory locks are often the right answer.

But only if you understand exactly what guarantee you are actually buying.