# ZTD Benchmarking
This benchmark compares traditional migration-style repository tests with ZTD fixture-backed repository tests by executing the same repository implementation in both workflows. The goal is to surface how much of the cost comes from migration/seed/cleanup overhead versus how much comes from query time and the ZTD rewrite pipeline.
## Comparison rules
- Both workflows exercise the same repository class and query definitions so differences stem only from the surrounding test infrastructure.
- Traditional runs create a schema per repetition, apply the `benchmarks/ztd-bench-vs-raw/ddl/ecommerce.sql` migration, seed the required tables, call the repository method, then drop the schema.
- ZTD runs skip migration/seed/cleanup, hook into the repository query to capture the emitted SQL, feed that SQL into pg-testkit for rewrite/fixture generation, and execute the rewritten statements (both flows are sketched after this list).
- Measurements cover variable suite sizes and steady-state loops so the report can show the impact of runner startup, warm runs, and incremental iteration cost.
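A minimal sketch of the two flows, assuming a plain `pg` client; `seedTables`, `captureSql`, and `rewriteWithFixtures` are hypothetical stand-ins for the benchmark's real helpers:

```typescript
import { readFileSync } from "node:fs";
import { Client } from "pg";

// Hypothetical stand-ins for the benchmark's real helpers (illustrative only).
declare const seedTables: (client: Client) => Promise<void>;
declare const captureSql: (run: () => Promise<unknown>) => Promise<string[]>; // hooks the repository query path
declare const rewriteWithFixtures: (sql: string[]) => Promise<string[]>; // pg-testkit rewrite + fixtures

// Traditional: fresh schema + migration + seed per repetition, then cleanup.
async function traditionalRun(client: Client, repoQuery: () => Promise<unknown>) {
  await client.query("CREATE SCHEMA bench_rep");
  await client.query(readFileSync("benchmarks/ztd-bench-vs-raw/ddl/ecommerce.sql", "utf8"));
  await seedTables(client);
  await repoQuery();
  await client.query("DROP SCHEMA bench_rep CASCADE");
}

// ZTD: no migration/seed/cleanup; capture the emitted SQL, rewrite, execute.
async function ztdRun(client: Client, repoQuery: () => Promise<unknown>) {
  for (const stmt of await rewriteWithFixtures(await captureSql(repoQuery))) {
    await client.query(stmt);
  }
}
```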
## What It Measures
- End-to-end wall-clock time per scenario, including runner startup when applicable.
- Total DB execution time and aggregate SQL statement count so it is clear why Traditional issues more SQL work.
- ZTD-specific breakdowns for rewrite processing, fixture materialization, SQL generation, and any other overhead the fixture pipeline introduces.
- Steady-state iteration metrics that show average incremental time, SQL count, and DB execution after the runner is warm (an illustrative record shape follows this list).
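As a rough mental model, a per-scenario record could carry fields like the following; the names are invented for illustration and are not the report's actual schema:

```typescript
// Illustrative shape only — the benchmark's real report fields may differ.
interface ScenarioMetrics {
  wallClockMs: number;        // end-to-end time, including runner startup when applicable
  dbExecutionMs: number;      // total DB execution time
  sqlStatementCount: number;  // aggregate SQL statements issued
  ztd?: {                     // ZTD-only pipeline breakdown
    rewriteMs: number;
    fixtureMaterializationMs: number;
    sqlGenerationMs: number;
    otherOverheadMs: number;
  };
  steadyState?: {             // per-iteration averages after the runner is warm
    incrementalMs: number;
    sqlCount: number;
    dbExecutionMs: number;
  };
}
```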
## What It Excludes
- Postgres container startup time (the benchmark reuses a single container across all runs).
- External network and application-layer time outside the test process.
- Any caching beyond a single benchmark run (each run rebuilds schema/data for traditional tests).
- Long-lived test runner behavior (watch mode).
## How To Run
```bash
pnpm ztd:bench
```

The command runs the Vitest suites defined under `benchmarks/ztd-bench-vs-raw`, ensuring the benchmark logic and fixtures remain inside this directory rather than depending on the playground workspace.
The report is written to `tmp/bench/report.md`. Detailed JSON diagnostics stream to `tmp/bench/log.jsonl`; keep the default quiet level for minimal output, or raise `BENCH_LOG_LEVEL` (or pass `--verbose`/`--debug`) when you want to follow the JSON lines on the console.
Running the benchmark also captures one representative Traditional SQL sequence per case and saves it under `tmp/bench/traditional-sql/` (e.g., `customer-summary.sql`, `product-ranking.sql`, `sales-summary.sql`) so you can inspect the migration, seeding, query, and cleanup statements without collecting multiple repetitions.
## Requirements
- Docker (used by `@testcontainers/postgresql`).
- Node.js 20+.
## Configuration
You can adjust the benchmark without editing code using the following environment variables (an example invocation follows the list):
- `ZTD_BENCH_RUNS` (default: `10`) - measured iterations per scenario.
- `ZTD_BENCH_WARMUP` (default: `2`) - warmup iterations per scenario.
- `ZTD_BENCH_WORKERS` (default: `4`) - parallel worker count for the parallel runs.
- `ZTD_BENCH_REPORT_PATH` - override the report output path.
- `BENCH_CONNECTION_MODELS` (default: `perWorker,caseLocal`) - comma-separated list of connection models to exercise sequentially; each model is applied to both Traditional and ZTD workflows before moving to the next, so the report can compare PID/session/worker activity metrics across `perWorker` and `caseLocal`.
- `BENCH_CONNECTION_MODEL` (default: `perWorker`) - the single connection model that both Traditional and ZTD suites use. When `BENCH_CONNECTION_MODELS` is present, this value must match the first entry. The benchmark enforces that this variable agrees with any legacy `ZTD_BENCH_CONNECTION_MODEL`/`TRADITIONAL_BENCH_CONNECTION_MODEL` settings, so both workflows always share the same connectivity.
- `ZTD_BENCH_CONNECTION_MODEL` and `TRADITIONAL_BENCH_CONNECTION_MODEL` (legacy) - these older variables now act as aliases for `BENCH_CONNECTION_MODEL`/`BENCH_CONNECTION_MODELS`. They must be either unset or set to the same normalized model (`perWorker`, `caseLocal`, or the `shared`/`case-local` aliases) and must appear inside `BENCH_CONNECTION_MODELS` if that list is used.
- `BENCH_PARALLEL_WORKER_COUNTS` (default: `4,8`) - comma-separated worker counts for the parallel suites; ensures the report covers both 4-worker and 8-worker configurations. Setting `ZTD_BENCH_WORKERS` overrides this list with a single worker count for legacy scripts.
- `BENCH_LOG_LEVEL` (`quiet`|`info`|`debug`) - defaults to `quiet`, so only the start/end summaries hit the console while detailed JSON diagnostics stream to `tmp/bench/log.jsonl`. Set it to `info` or `debug`, or pass `--verbose`/`--debug` when invoking `ts-node benchmarks/ztd-test-benchmark.ts`, to mirror those events on the console for troubleshooting.
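For example, a reduced configuration for a quick local smoke run might look like this (the values are illustrative, not recommended defaults):

```bash
ZTD_BENCH_RUNS=3 ZTD_BENCH_WARMUP=1 \
BENCH_CONNECTION_MODELS=perWorker,caseLocal \
BENCH_PARALLEL_WORKER_COUNTS=4 \
BENCH_LOG_LEVEL=info \
pnpm ztd:bench
```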
## Directory Layout
Everything the benchmark needs lives under `benchmarks/ztd-bench-vs-raw`:

- `sql/` contains the canonical query strings executed by both traditional and ZTD tests.
- `ddl/` holds the schema file that pg-testkit uses to validate and plan each rewrite.
- `tests/` hosts the Vitest suites, fixtures, and helpers that drive the runner, steady-state, and global setup flows.
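The same layout in tree form:

```
benchmarks/ztd-bench-vs-raw/
├── sql/    # canonical query strings
├── ddl/    # schema used for rewrite validation and planning
└── tests/  # Vitest suites, fixtures, and helpers
```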
Keeping the benchmark code self-contained makes it clear that this directory is the authoritative measurement surface; it does not depend on playground demos and is safe to run from the repository root.
## Reproducing Results
The benchmark uses SQL from `benchmarks/ztd-bench-vs-raw/sql`, fixtures defined under `benchmarks/ztd-bench-vs-raw/tests/support`, and schema metadata in `benchmarks/ztd-bench-vs-raw/ddl`. Run it from the repository root to ensure the benchmark runner and package dependencies resolve correctly.
## pg-testkit mode comparison
Use `pnpm ztd:bench:pg-testkit-mode` when you want to look at pg-testkit's two migration modes in isolation. The script runs every case under both the fixture-driven ZTD path and the Traditional DDL/seeding path inside pg-testkit, then writes `tmp/pg-testkit-mode-report.md` with per-case averages for duration, SQL count, DB time, migration time, and cleanup time.
The report is a lightweight complement to the full `pnpm ztd:bench` dataset and is a good place to start when you only need the driver-level comparison without the runner/parallelism instrumentation.
## Assumptions
- Traditional unit tests treat SQL generation cost as zero because SQL is provided as raw strings.
- ZTD unit tests include repository-level SQL generation plus SQL-to-ZTD rewrite costs.
- Runner-based scenarios include Vitest startup time; the in-process lower bound does not.
## Concurrency diagnostics
- The traditional parallel summary in `tmp/bench/report.md` now reports the 95th percentile of connection waiting, migration, and cleanup durations, so you can immediately see which of those steps is limiting throughput when parallel workers are added.
- The new ZTD Concurrency Diagnostics section highlights the measured parallel run with the largest suite (typically 120 tests) and the highest worker count; it surfaces the waiting p95 plus the peak `pg_stat_activity` active sessions so you can explain why adding more workers stops improving the runtime.
- A lightweight Vitest smoke test lives at `benchmarks/ztd-bench-vs-raw/tests/diagnostics/traditional-parallelism.test.ts`. It runs a barriered `pg_sleep` workload through `runTraditionalParallelismValidation` and fails if the requested number of PostgreSQL sessions never go active simultaneously (the idea is sketched after this list). You can rerun it directly with:

  ```bash
  pnpm vitest --config benchmarks/ztd-bench-vs-raw/vitest.config.ts run benchmarks/ztd-bench-vs-raw/tests/diagnostics/traditional-parallelism.test.ts
  ```

  The test runs before the full benchmark and gives CI/local runs a quick fail-fast surface if PostgreSQL cannot open the expected number of concurrent sessions.
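A minimal sketch of the barriered-`pg_sleep` idea; this is not the actual `runTraditionalParallelismValidation` implementation (the real test synchronizes sessions with a barrier, while this sketch just waits briefly), and `validateParallelSessions` is a hypothetical name:

```typescript
import { Client } from "pg";

// Sketch only: open N sessions, run pg_sleep on each concurrently, and check
// pg_stat_activity for N simultaneously active sessions.
async function validateParallelSessions(connectionString: string, workers: number) {
  const clients = await Promise.all(
    Array.from({ length: workers }, async () => {
      const c = new Client({ connectionString });
      await c.connect();
      return c;
    }),
  );
  const sleeps = clients.map((c) => c.query("SELECT pg_sleep(2)"));

  // Probe from a separate session while the sleeps are in flight.
  const probe = new Client({ connectionString });
  await probe.connect();
  await new Promise((resolve) => setTimeout(resolve, 500)); // let the sleeps start
  const { rows } = await probe.query(
    `SELECT count(*)::int AS active
       FROM pg_stat_activity
      WHERE state = 'active'
        AND query LIKE '%pg_sleep%'
        AND pid <> pg_backend_pid()`,
  );

  await Promise.all(sleeps);
  await Promise.all([...clients, probe].map((c) => c.end()));

  if (rows[0].active < workers) {
    throw new Error(`expected ${workers} active sessions, saw ${rows[0].active}`);
  }
}
```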
## AST stringify microbenchmark
- Purpose: get trustworthy μs/ns-scale measurements for the AST→SQL stringify step so the team can decide whether further optimization is needed.
- Run the dedicated script (it parses the real repository SQL, warms up the formatter, and loops `SqlFormatter.format()` repeatedly):

  ```bash
  pnpm ts-node benchmarks/ztd-bench-vs-raw/stringify-only-benchmark.ts
  ```

- Environment knobs:
  - `STRINGIFY_ITERATIONS` controls how many measured iterations run (default `100000`).
  - `STRINGIFY_WARMUP` controls the warmup iterations before timing (default `10000`).
- Output: for each repository SQL the script prints the total iteration count, total elapsed time (μs/ns), and average per-stringify time. Use these numbers to decide whether μs-level stability is adequate or more optimization is warranted (a sketch of the timing loop follows).
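The measurement loop reduces to something like this sketch; `format` stands in for the script's real `SqlFormatter.format()` call on a pre-parsed AST, and `timeStringify` is a hypothetical name:

```typescript
// Sketch of a warmup-then-measure loop using Node's monotonic nanosecond clock.
// `format` is a stand-in for calling SqlFormatter.format() on a parsed AST.
function timeStringify(format: () => string, warmup = 10_000, iterations = 100_000) {
  for (let i = 0; i < warmup; i++) format(); // warm JIT/caches before timing
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) format();
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return {
    iterations,
    totalMicroseconds: elapsedNs / 1_000,
    avgNanosecondsPerCall: elapsedNs / iterations,
  };
}
```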