Definitions (schema & SQL)
Materialized views are defined and managed through the MatViews::MatViewDefinition
model. This page explains the definition schema, how we resolve schemas & quoting safely, and how to write production-ready SQL for views.
What a definition is
A definition is a persistent record that tells the engine:
- what to name the materialized view,
- the SQL that produces its rows,
- how it should be refreshed (strategy and options),
- what indexes (especially unique) it expects,
- optional dependencies metadata for your own tooling.
Once a definition exists, you can create, refresh (regular/concurrent/swap), and delete the materialized view using services, jobs, or rake tasks.
Model schema
MatViews::MatViewDefinition
has the following key attributes (columns):
Column | Type | Required | Description |
---|---|---|---|
name | string | ✅ | Unqualified relation name (e.g., mv_user_activity ). Must be a simple SQL identifier (no dot). |
sql | text | ✅ | The SELECT statement used to build the view. |
refresh_strategy | enum | ✅ | One of regular , concurrent , swap . Dictates the service used by RefreshViewJob . |
schedule_cron | string | Optional: for your scheduler to read (engine does not schedule itself). | |
unique_index_columns | jsonb[] | Columns expected to be uniquely indexed on the MV (e.g., ["user_id"] ). Required for concurrent refresh. | |
dependencies | jsonb[] | Free-form list (tables, views, other MVs) you consider inputs; purely informational. | |
last_refreshed_at | datetime | Optional timestamp you can maintain in your app. |
Naming rule:
name
must be a simple identifier (/^[A-Za-z_][A-Za-z0-9_]*$/
). Do not put a schema in the name (no dots). The engine resolves the schema separately (see below).
Schema resolution & quoting
We always build a qualified relation like:
"schema"."relname"
Key points:
- We determine the schema from the database session’s
search_path
.- Empty or invalid
search_path
→ we fall back topublic
. $user
tokens are supported; if the user schema doesn’t exist we fall back topublic
.- Quoted tokens in
search_path
are handled.
- Empty or invalid
-
The definition’s
name
must not include a dot; it’s treated as a bare identifier and validated (public.mv
would be rejected). - Quoting:
- Schema and relation parts are each identifier-quoted to avoid injection and case/keyword issues.
- Internally we centralize this in helpers that produce a
qualified_rel
string suitable for SQL likeCREATE MATERIALIZED VIEW
,REFRESH MATERIALIZED VIEW
, andDROP MATERIALIZED VIEW
.
If you ever hand-roll SQL, never interpolate raw strings; always use the engine’s helpers (e.g., the service’s
qualified_rel
) orActiveRecord
quoting methods to build"schema"."rel"
safely.
Refresh strategies (recap)
- regular —
REFRESH MATERIALIZED VIEW
(locks reads during refresh). - concurrent —
REFRESH MATERIALIZED VIEW CONCURRENTLY
(keeps reads available) requires a unique index covering all rows. - swap — create a temp MV from the definition SQL, atomically swap names, recreate indexes/permissions; ideal for low downtime when concurrent is not feasible.
Choose in the definition:
refresh_strategy: :regular # or :concurrent, :swap
Writing production-ready SQL
General guidelines
- SELECT only: Your
sql
should be a single SELECT (CTEs are fine). - Deterministic output: Prefer immutable/strict functions; avoid volatile functions (
random()
,clock_timestamp()
) unless you truly need them. - No
SELECT *
: List columns explicitly. It stabilizes indexes, consumers, and future changes. - Group/aggregate carefully: Every non-aggregated column must be in
GROUP BY
. Keep it minimal to avoid accidental row duplication. - Filter early: Push filters into the base CTE/joins rather than on top of an already massive result.
- Use CTEs for clarity: They help readability and make swap/concurrent troubleshooting easier.
-
Indexes: If you plan to refresh concurrently, design a unique key in the result set and index it after creation:
CREATE UNIQUE INDEX CONCURRENTLY idx_mv_user_activity_user_id ON "public"."mv_user_activity"(user_id);
- Permissions: Grant read privileges to app roles as needed after creation.
Example: simple one-table view
MatViews::MatViewDefinition.create!(
name: 'mv_users',
sql: <<~SQL,
SELECT u.id, u.email, u.created_at
FROM users u
SQL
refresh_strategy: :regular
)
Example: two-table aggregate
MatViews::MatViewDefinition.create!(
name: 'mv_user_accounts',
sql: <<~SQL,
WITH accounts_by_user AS (
SELECT a.user_id, COUNT(*) AS accounts_count
FROM accounts a
GROUP BY a.user_id
)
SELECT u.id AS user_id,
COALESCE(ab.accounts_count, 0) AS accounts_count
FROM users u
LEFT JOIN accounts_by_user ab ON ab.user_id = u.id
SQL
refresh_strategy: :concurrent,
unique_index_columns: ['user_id']
)
Example: three- and four-table aggregates (pattern)
MatViews::MatViewDefinition.create!(
name: 'mv_user_activity',
sql: <<~SQL,
WITH a AS (
SELECT user_id, COUNT(*) AS accounts_count
FROM accounts GROUP BY user_id
),
e AS (
SELECT user_id, COUNT(*) AS events_count
FROM events GROUP BY user_id
),
s AS (
SELECT user_id, COUNT(*) AS sessions_count
FROM sessions GROUP BY user_id
)
SELECT u.id AS user_id,
COALESCE(a.accounts_count, 0) AS accounts_count,
COALESCE(e.events_count, 0) AS events_count,
COALESCE(s.sessions_count, 0) AS sessions_count
FROM users u
LEFT JOIN a ON a.user_id = u.id
LEFT JOIN e ON e.user_id = u.id
LEFT JOIN s ON s.user_id = u.id
SQL
refresh_strategy: :concurrent,
unique_index_columns: ['user_id']
)
Unique indexes & concurrent refresh
For concurrent refresh to succeed, Postgres requires a unique index that uniquely identifies every row in the MV.
- Pick a stable, deterministic key from your result set (e.g.,
user_id
, or a composite). -
Create the index after the MV is created and before you attempt a concurrent refresh:
CREATE UNIQUE INDEX CONCURRENTLY idx_mv_user_activity_user_id ON "public"."mv_user_activity"(user_id);
- If you use the swap strategy, ensure the swap step recreates indexes on the new view before renaming or immediately after, per your service’s implementation.
Dependencies (optional metadata)
dependencies
is a JSON array you can use to document upstream tables/views/MVs:
dependencies: [
{ "type": "table", "name": "users" },
{ "type": "table", "name": "accounts" },
{ "type": "mv", "name": "mv_users" }
]
This has no functional effect in the engine today; it’s useful for your dashboards, CI checks, or doc generation.
Validations & safety checks
- Name validation:
name
must be a simple identifier (no schema). If you need a different schema, change the DBsearch_path
(the engine resolves and validates it). - Existence checks: Services verify whether the MV exists when appropriate (e.g., creation vs refresh vs delete with
IF EXISTS
). - Quoting: All SQL issued by services uses safe identifier quoting for schema and relation separately.
- Row-count strategy (refresh):
:estimated
(fast, viareltuples
) or:exact
(viaCOUNT(*)
).
Common pitfalls & how to avoid them
-
Using a dotted name in
definition.name
Don’t. The engine will rejectpublic.mv_name
. Usemv_name
; schema is resolved fromsearch_path
. -
Concurrent refresh without unique index You’ll get
ERROR: cannot refresh materialized view concurrently without a unique index
. Design your result set to include a unique key, then index it. -
Volatile functions in SQL Avoid
random()
,clock_timestamp()
, etc. They can cause non-repeatable results and surprise diffs. -
Relying on
ORDER BY
withoutLIMIT
ORDER BY
in a view is usually wasted work unless it’s part of a limited/top-N pattern. Sort at read time if needed. -
Forgetting permissions Grant
SELECT
to your app role after creation or swap:GRANT SELECT ON "public"."mv_user_activity" TO app_user;
Creating, refreshing, deleting
Use the services programmatically, or the rake tasks:
# Create
MatViews::Services::CreateView.new(defn, force: true).run
# Refresh (strategy chosen by definition.refresh_strategy)
MatViews::RefreshViewJob.perform_later(defn.id, row_count_strategy: :estimated)
# Delete
MatViews::DeleteViewJob.perform_later(defn.id, cascade: false)
All enqueue paths go through:
MatViews::Jobs::Adapter.enqueue(job_class, queue: MatViews.configuration.job_queue, args: [...])
The adapter uses your configured backend (ActiveJob / Sidekiq / Resque). It doesn’t guess.