PostgreSQL MV Best Practices
Materialized views (MVs) are fantastic for speeding up heavy joins/aggregations, but they add operational responsibilities. This guide distills practical, production-oriented advice tuned to how this project manages MVs (definitions → services → jobs → run tracking).
1) Model the definition well
- Simple identifier names. Keep
MatViews::MatViewDefinition.name
free of dots — e.g.,mv_user_activity
. Schema is resolved fromsearch_path
and safely quoted by services. - Explicit columns. Avoid
SELECT *
. List the output columns; it stabilizes consumers and indexes. - Deterministic SQL. Prefer immutable/stable functions. Avoid volatile ones (
random()
,clock_timestamp()
), which break refresh determinism and diffability. - Shape for uniqueness. If you want concurrent refresh, design a natural unique key in the result (e.g.,
user_id
or a composite). You’ll create a unique index on that key.
2) Pick the right refresh strategy
Strategy | When to use | Pros | Cons | Requirements |
---|---|---|---|---|
Regular | Low traffic; brief read lock OK | Simple | Locks reads during refresh | none |
Concurrent | Reads must stay available | No read lock | Slower; cannot run inside a transaction block | Unique index covering all rows |
Swap | Low downtime without unique key; complex rebuild choreography | Atomic content swap; flexible | Recreate indexes/privileges; more moving parts | Transactional swap & careful scripting |
Guidelines
- If you can add a unique key, concurrent is often ideal.
- If you can’t, consider swap for near-zero read downtime.
- Use regular for simplicity when brief read locks are acceptable (internal/admin UIs, nightly batches).
3) Indexing the MV
-
Unique index (for concurrent refresh):
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS idx_mv_user_activity_user_id ON "public"."mv_user_activity"(user_id);
- Query-pattern indexes. Add additional (non-unique) indexes to match your read queries: projections, predicates, join keys.
-
Create before first concurrent load. Pattern:
CREATE MATERIALIZED VIEW "public"."mv_user_activity" WITH NO DATA AS ... ; CREATE UNIQUE INDEX CONCURRENTLY ... ; REFRESH MATERIALIZED VIEW CONCURRENTLY "public"."mv_user_activity";
This avoids one blocking load and ensures the first population keeps reads open.
With swap, remember the new temp MV needs the same indexes before you swap names.
4) Locking & transaction gotchas
REFRESH MATERIALIZED VIEW CONCURRENTLY
cannot run inside a transaction block (same rule asCREATE INDEX CONCURRENTLY
). Ensure your orchestration runs it outside a wrappingBEGIN
.- Regular refresh holds locks that block readers; schedule during off-peak or prefer concurrent/swap.
- Beware of long transactions or open cursors on the MV — they can stall refresh or index builds. Monitor
pg_stat_activity
.
5) Statistics, vacuum, and planner health
- MVs are physical relations. Keep planner stats fresh:
- Large changes?
ANALYZE "schema"."mv_name"
after refresh (or rely on autovacuum). - Track bloat/visibility maps like any big table; indexes still need love over time.
- Large changes?
- Estimated row counts (
reltuples
) power fast metrics. For verification, use exact counts selectively (COUNT(*)
).
6) Scheduling & staleness windows
- Define a freshness SLO (e.g., “≤ 15 min stale”). Align your refresh cadence accordingly.
- Watch duration trends in run tables; scale hardware or strategy if refresh exceeds the SLO.
- Stagger heavy MVs to avoid load spikes; don’t refresh all at once if not necessary.
7) Security & access
-
Grant least privilege:
GRANT SELECT ON "public"."mv_user_activity" TO app_user;
-
Treat MVs as derived data that may contain sensitive aggregates. Apply the same review you’d apply to tables exposed to applications.
8) Dependency management
- Capture upstreams in definition
dependencies
for visibility (“tables/views/MVs I read from”). It helps with ordering and documentation. - If MVs depend on other MVs, define a safe refresh order or let independent jobs run with retries. Swap strategy can minimize cascading lock risks.
9) When not to use an MV
- Single-row lookups or trivial filters that indexes can satisfy.
- Highly volatile data with zero tolerance for staleness (consider caching layers or streaming materialization instead).
- Write paths where you need transactional visibility of derived results (MVs update only on refresh).
10) Operational playbooks
Creation (idempotent with NO DATA)
CREATE MATERIALIZED VIEW "public"."mv_x" WITH NO DATA AS
SELECT ... ;
-- Build required indexes (CONCURRENTLY where possible)
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS idx_mv_x_key ON "public"."mv_x"(key);
-- First population (keeps reads available)
REFRESH MATERIALIZED VIEW CONCURRENTLY "public"."mv_x";
Regular vs concurrent refresh
-- Regular (locks readers)
REFRESH MATERIALIZED VIEW "public"."mv_x";
-- Concurrent (requires unique index; not inside BEGIN)
REFRESH MATERIALIZED VIEW CONCURRENTLY "public"."mv_x";
Swap (pattern)
-- 1) Build temp
CREATE MATERIALIZED VIEW "public"."mv_x__tmp_20250817" AS
SELECT ... ;
-- 2) Recreate indexes/privileges on temp
CREATE UNIQUE INDEX CONCURRENTLY idx_mv_x_tmp_key ON "public"."mv_x__tmp_20250817"(key);
GRANT SELECT ON "public"."mv_x__tmp_20250817" TO app_user;
-- 3) Transactional swap
BEGIN;
ALTER MATERIALIZED VIEW "public"."mv_x" RENAME TO "mv_x__old_20250817";
ALTER MATERIALIZED VIEW "public"."mv_x__tmp_20250817" RENAME TO "mv_x";
COMMIT;
-- 4) Cleanup old later (optional, outside critical path)
DROP MATERIALIZED VIEW IF EXISTS "public"."mv_x__old_20250817";
11) Monitoring & alerting
- Use the run tables to drive dashboards:
duration_ms
over time; percentiles- last
status
per MV rows_count
(if tracked) to catch anomalies
- Alert on:
- consecutive failures
- refresh time > SLO
- no successful refresh within the freshness window
12) Partitioning & very large datasets
- MVs themselves aren’t partitioned. If source tables are partitioned, your MV still materializes the full result. For extreme sizes:
- Consider sharding the logic across multiple MVs and
UNION ALL
a view on top. - Or move to a swap pattern that incrementally builds per-slice content and then unifies.
- Consider sharding the logic across multiple MVs and
13) CI/CD integration
- Treat MVs as generated artifacts:
- Keep definitions in code.
- Use Create/Refresh/Delete services and jobs to manage lifecycle.
- In releases, prefer swap or concurrent to reduce blast radius.
- Include validation in pre-prod: run the demo validator or your own on representative data.
14) Common pitfalls
- Concurrent refresh without unique index You’ll get: “cannot refresh materialized view concurrently without a unique index”.
- Wrapping concurrent refresh in a transaction Like
CREATE INDEX CONCURRENTLY
, it must be outside a transaction block. SELECT *
drift Downstream code breaks when new columns appear. Be explicit.- Volatile SQL Results differ across runs; tests and diffs become unreliable.
15) How this project helps
- Definitions: Single source of truth (name, SQL, strategy, metadata).
- Services: Safe quoting, validation, and strategy-specific execution.
- Jobs & Adapter: Uniform enqueue → run tables with timings/status/meta.
- Rake tasks: Admin-friendly CLI with confirmations and logging.
- Demo & Validator: Seed, build, refresh, and benchmark with reproducible scripts.