Validation & Benchmarking
This guide shows how we verify correctness (materialized view rows match the baseline SQL) and measure performance (baseline vs. MV reads) using the companion demo app mat_views_demo
.
It uses two sets of tasks:
- Demo data & views (seed, define, create/index, refresh)
- Validation (timing and reporting)
Everything logs via Rails.logger
and produces a CSV report suitable for docs and dashboards.
Prerequisites
-
Project layout (adjacent repos)
rails_materialized_views/ mat_views/ # the engine mat_views_demo/ # the demo app (docs/examples live here)
-
Demo app configured with a working Postgres connection.
-
Queue backend (optional for this guide) if you plan to run create/refresh via jobs. The demo bootstrap uses the engine’s rake tasks which enqueue via:
MatViews::Jobs::Adapter.enqueue(job_class, queue: MatViews.configuration.job_queue, args: [...])
The adapter assumes your backend is configured (ActiveJob, Sidekiq, Resque).
1) Demo data & views (one-shot bootstrap)
From the mat_views_demo/
directory:
# Seed ~500 users (× scale), define 4 MV definitions,
# create them, add unique indexes, and do an initial refresh.
bin/rake 'mat_views:bootstrap_demo[1,--yes]'
What it does:
- Seed users, accounts, events, sessions (batched
insert_all
)- Resets PK sequences
- Adds helpful base indexes (
events.occurred_at
,sessions.started_at
,accounts.plan
)
- Define four MV definitions (idempotent):
mv_users
(1 table)mv_user_accounts
(2 tables)mv_user_accounts_events
(3 tables)mv_user_activity
(4 tables)
- Create all MVs via engine tasks (skip confirm)
- Ensure unique indexes on each MV (required for concurrent refresh)
- Refresh all MVs (default
:estimated
row count)
Prefer the bootstrap for a clean slate. If you want the steps separately:
bin/rake 'mat_views:seed_demo[1,--yes]'
bin/rake mat_views:define_demo_views
bin/rake 'mat_views:create_all[,--yes]'
bin/rake 'mat_views:refresh_all[,--yes]'
Notes
mat_views:seed_demo[scale,--yes]
acceptsscale
(integer) and a confirmation skip flag (--yes
orYES=1
).- All actions write progress via
Rails.logger
(notputs
). Make sure your dev logger outputs to STDOUT.
2) Validate & benchmark
From mat_views_demo/
:
# Run the validator for N iterations per view (default: 5)
bin/rake 'mat_views:validate_demo[100]'
# or with an env var:
ITER=100 bin/rake mat_views:validate_demo
What the validator does
- Discovers existing materialized views from
pg_matviews
(excluding system schemas). - Matches them to
MatViews::MatViewDefinition
records (to fetch the baseline SQL). - For each definition present as a physical MV:
- Runs the baseline SQL (the definition’s
SELECT
)iterations
times and records per-iteration duration. - Runs the MV read (a
SELECT
from the MV)iterations
times and records per-iteration duration. - Captures row counts from both paths (simple correctness check).
- Runs the baseline SQL (the definition’s
-
Writes a CSV to:
tmp/mv_validate/<UTC timestamp>/report.csv
-
Logs a per-view summary, e.g.:
[validate_demo] mv_user_activity: baseline_avg=161ms, mv_avg=1ms, speedup≈161.0x
CSV Schema
Header:
view,iterations,
baseline_avg_ms,baseline_min_ms,baseline_max_ms,
mv_avg_ms,mv_min_ms,mv_max_ms,
speedup_avg,
rows_baseline,rows_mv
- speedup_avg =
baseline_avg_ms / mv_avg_ms
(higher is better) - Rows are computed by the validator (counts from each path).
Where the numbers come from
- Timing uses a monotonic clock in
MatViewsDemo::Validator
. - No “warmup” by default (we measure cold/hot mixed reality). If you prefer warm cache, run the task twice and use the second run, or extend the validator with a warmup loop.
Example results (for docs)
Quick smoke (5 iterations)
view | iterations | baseline_avg_ms | baseline_min_ms | baseline_max_ms | mv_avg_ms | mv_min_ms | mv_max_ms | speedup_avg | rows_baseline | rows_mv |
---|---|---|---|---|---|---|---|---|---|---|
mv_user_accounts | 5 | 31 | 16 | 74 | 2 | 1 | 5 | 15.5 | 50000 | 50000 |
mv_user_accounts_events | 5 | 78 | 70 | 108 | 1 | 1 | 2 | 78.0 | 50000 | 50000 |
mv_user_activity | 5 | 161 | 159 | 165 | 1 | 1 | 2 | 161.0 | 50000 | 50000 |
mv_users | 5 | 1 | 1 | 2 | 2 | 1 | 7 | 0.5 | 50000 | 50000 |
Stable averages (100 iterations)
view | iterations | baseline_avg_ms | baseline_min_ms | baseline_max_ms | mv_avg_ms | mv_min_ms | mv_max_ms | speedup_avg | rows_baseline | rows_mv |
---|---|---|---|---|---|---|---|---|---|---|
mv_user_accounts | 100 | 17 | 15 | 69 | 1 | 1 | 20 | 17.0 | 50000 | 50000 |
mv_user_accounts_events | 100 | 70 | 70 | 73 | 1 | 1 | 3 | 70.0 | 50000 | 50000 |
mv_user_activity | 100 | 161 | 158 | 242 | 1 | 1 | 2 | 161.0 | 50000 | 50000 |
mv_users | 100 | 1 | 1 | 1 | 1 | 1 | 2 | 1.0 | 50000 | 50000 |
These numbers are illustrative from the demo dataset. Expect different absolute values in your environment, but the relative gains on multi-join aggregates tend to be dramatic (10×–160×+).
Correctness checks
- The validator compares row counts between the baseline and MV read.
- For tighter guarantees, you can extend
MatViewsDemo::Validator
to:- Compare aggregates (e.g.,
SUM
/COUNT DISTINCT
) - Sample keys and compare row contents
- Do a full set diff in staging (expensive!)
- Compare aggregates (e.g.,
Tips for reliable benchmarks
- Use a quiet database (limit other load).
- Keep autovacuum/ANALYZE on (statistics matter).
- Run more iterations (e.g., 100 or 300) for tighter min/max bands.
- Consider one warmup pass if you want to measure hot-cache behavior.
- Document environment: Postgres version, hardware, shared buffers, etc.
Troubleshooting
-
“No materialized views found” Run the bootstrap or create views. The validator discovers views from
pg_matviews
. -
“Found MVs in DB, but no matching MatViewDefinition records” The validator needs the definition to fetch the baseline SQL. Ensure your
MatViewDefinition
records match the MV names. -
Row counts don’t match Your MV SQL may have drifted from the baseline SQL. Recreate/swap the MV with the current definition, or debug differences (filters/joins/grouping).
-
Variance is high Increase iterations; ensure low background load; check for big autovacuum runs or long transactions.
Artifacts & docs integration
- CSV lives under
tmp/mv_validate/<timestamp>/report.csv
(demo app). - Paste rendered Markdown tables into:
- Root
README.md
→ Why materialized views? Real numbers docs/validation/
→ per-run archives, if you keep historical results
- Root
You can also drop the CSV into /docs/validation/
and link it from the docs site.
What’s included in the demo views
We ship 4 definitions to represent increasing complexity:
mv_users
— single table projectionmv_user_accounts
— two-table aggregate joinmv_user_accounts_events
— three-table aggregate joinmv_user_activity
— four-table aggregate join
All are configured for concurrent refresh with appropriate unique indexes on the result key (id
or user_id
).