# ZTD Benchmarking
This benchmark compares traditional migration-style repository tests with ZTD fixture-backed repository tests by executing the same repository implementation in both workflows. The goal is to surface how much of the cost comes from migration/seed/cleanup overhead versus how much comes from query time and the ZTD rewrite pipeline.
## Comparison rules
- Both workflows exercise the same repository class and query definitions so differences stem only from the surrounding test infrastructure.
- Traditional runs create a schema per repetition, apply the `benchmarks/ztd-bench-vs-raw/ddl/ecommerce.sql` migration, seed the required tables, call the repository method, then drop the schema.
- ZTD runs skip migration/seed/cleanup, hook into the repository query to capture the emitted SQL, feed that SQL into pg-testkit for rewrite/fixture generation, and execute the rewritten statements (both flows are sketched after this list).
- Measurements cover variable suite sizes and steady-state loops so the report can show the impact of runner startup, warm runs, and incremental iteration cost.
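A minimal sketch of the two flows, assuming a plain `pg` client; `seedTables`, `captureSql`, and `rewriteWithFixtures` are hypothetical stand-ins for the benchmark's real helpers:

```typescript
import { readFileSync } from "node:fs";
import { Client } from "pg";

// Hypothetical stand-ins for the benchmark's real helpers (illustrative only).
declare const seedTables: (client: Client) => Promise<void>;
declare const captureSql: (run: () => Promise<unknown>) => Promise<string[]>; // hooks the repository query path
declare const rewriteWithFixtures: (sql: string[]) => Promise<string[]>; // pg-testkit rewrite + fixtures

// Traditional: fresh schema + migration + seed per repetition, then cleanup.
async function traditionalRun(client: Client, repoQuery: () => Promise<unknown>) {
  await client.query("CREATE SCHEMA bench_rep");
  await client.query(readFileSync("benchmarks/ztd-bench-vs-raw/ddl/ecommerce.sql", "utf8"));
  await seedTables(client);
  await repoQuery();
  await client.query("DROP SCHEMA bench_rep CASCADE");
}

// ZTD: no migration/seed/cleanup; capture the emitted SQL, rewrite, execute.
async function ztdRun(client: Client, repoQuery: () => Promise<unknown>) {
  for (const stmt of await rewriteWithFixtures(await captureSql(repoQuery))) {
    await client.query(stmt);
  }
}
```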
## What It Measures
- End-to-end wall-clock time per scenario, including runner startup when applicable.
- Total DB execution time and aggregate SQL statement count so it is clear why Traditional issues more SQL work.
- ZTD-specific breakdowns for rewrite processing, fixture materialization, SQL generation, and any other overhead the fixture pipeline introduces.
- Steady-state iteration metrics that show average incremental time, SQL count, and DB execution after the runner is warm (an illustrative record shape follows this list).
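As a rough mental model, a per-scenario record could carry fields like the following; the names are invented for illustration and are not the report's actual schema:

```typescript
// Illustrative shape only — the benchmark's real report fields may differ.
interface ScenarioMetrics {
  wallClockMs: number;        // end-to-end time, including runner startup when applicable
  dbExecutionMs: number;      // total DB execution time
  sqlStatementCount: number;  // aggregate SQL statements issued
  ztd?: {                     // ZTD-only pipeline breakdown
    rewriteMs: number;
    fixtureMaterializationMs: number;
    sqlGenerationMs: number;
    otherOverheadMs: number;
  };
  steadyState?: {             // per-iteration averages after the runner is warm
    incrementalMs: number;
    sqlCount: number;
    dbExecutionMs: number;
  };
}
```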
## What It Excludes
- Postgres container startup time (the benchmark reuses a single container across all runs).
- External network and application-layer time outside the test process.
- Any caching beyond a single benchmark run (each run rebuilds schema/data for traditional tests).
- Long-lived test runner behavior (watch mode).
## How To Run
```bash
pnpm ztd:bench
```

The command runs the Vitest suites defined under `benchmarks/ztd-bench-vs-raw`, ensuring the benchmark logic and fixtures remain inside this directory rather than depending on the playground workspace.
The report is written to `tmp/bench/report.md`. Detailed JSON diagnostics stream to `tmp/bench/log.jsonl`; keep the default quiet level for minimal output, or raise `BENCH_LOG_LEVEL` (or pass `--verbose`/`--debug`) when you want to follow the JSON lines on the console.
Running the benchmark also captures one representative Traditional SQL sequence per case and saves it under `tmp/bench/traditional-sql/` (e.g., `customer-summary.sql`, `product-ranking.sql`, `sales-summary.sql`) so you can inspect the migration, seeding, query, and cleanup statements without collecting multiple repetitions.
## Requirements
- Docker (used by `@testcontainers/postgresql`).
- Node.js 20+.
## Configuration
You can adjust the benchmark without editing code using the following environment variables (an example invocation follows the list):
- `ZTD_BENCH_RUNS` (default: `10`) - measured iterations per scenario.
- `ZTD_BENCH_WARMUP` (default: `2`) - warmup iterations per scenario.
- `ZTD_BENCH_WORKERS` (default: `4`) - parallel worker count for the parallel runs.
- `ZTD_BENCH_REPORT_PATH` - override the report output path.
- `BENCH_CONNECTION_MODELS` (default: `perWorker,caseLocal`) - comma-separated list of connection models to exercise sequentially; each model is applied to both Traditional and ZTD workflows before moving to the next, so the report can compare PID/session/worker activity metrics across `perWorker` and `caseLocal`.
- `BENCH_CONNECTION_MODEL` (default: `perWorker`) - the single connection model that both Traditional and ZTD suites use. When `BENCH_CONNECTION_MODELS` is present, this value must match the first entry. The benchmark enforces that this variable agrees with any legacy `ZTD_BENCH_CONNECTION_MODEL`/`TRADITIONAL_BENCH_CONNECTION_MODEL` settings, so both workflows always share the same connectivity.
- `ZTD_BENCH_CONNECTION_MODEL` and `TRADITIONAL_BENCH_CONNECTION_MODEL` (legacy) - these older variables now act as aliases for `BENCH_CONNECTION_MODEL`/`BENCH_CONNECTION_MODELS`. They must be either unset or set to the same normalized model (`perWorker`, `caseLocal`, or the `shared`/`case-local` aliases) and must appear inside `BENCH_CONNECTION_MODELS` if that list is used.
- `BENCH_PARALLEL_WORKER_COUNTS` (default: `4,8`) - comma-separated worker counts for the parallel suites; ensures the report covers both 4-worker and 8-worker configurations. Setting `ZTD_BENCH_WORKERS` overrides this list with a single worker count for legacy scripts.
- `BENCH_LOG_LEVEL` (`quiet`|`info`|`debug`) - defaults to `quiet`, so only the start/end summaries hit the console while detailed JSON diagnostics stream to `tmp/bench/log.jsonl`. Set it to `info` or `debug`, or pass `--verbose`/`--debug` when invoking `ts-node benchmarks/ztd-test-benchmark.ts`, to mirror those events on the console for troubleshooting.
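For example, a reduced configuration for a quick local smoke run might look like this (the values are illustrative, not recommended defaults):

```bash
ZTD_BENCH_RUNS=3 ZTD_BENCH_WARMUP=1 \
BENCH_CONNECTION_MODELS=perWorker,caseLocal \
BENCH_PARALLEL_WORKER_COUNTS=4 \
BENCH_LOG_LEVEL=info \
pnpm ztd:bench
```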
## Directory Layout
Everything the benchmark needs lives under `benchmarks/ztd-bench-vs-raw`:

- `sql/` contains the canonical query strings executed by both traditional and ZTD tests.
- `ddl/` holds the schema file that pg-testkit uses to validate and plan each rewrite.
- `tests/` hosts the Vitest suites, fixtures, and helpers that drive the runner, steady-state, and global setup flows.
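The same layout in tree form:

```
benchmarks/ztd-bench-vs-raw/
├── sql/    # canonical query strings
├── ddl/    # schema used for rewrite validation and planning
└── tests/  # Vitest suites, fixtures, and helpers
```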
Keeping the benchmark code self-contained makes it clear that this directory is the authoritative measurement surface; it does not depend on playground demos and is safe to run from the repository root.
## Reproducing Results
The benchmark uses SQL from `benchmarks/ztd-bench-vs-raw/sql`, fixtures defined under `benchmarks/ztd-bench-vs-raw/tests/support`, and schema metadata in `benchmarks/ztd-bench-vs-raw/ddl`. Run it from the repository root to ensure the benchmark runner and package dependencies resolve correctly.
## pg-testkit mode comparison
Use `pnpm ztd:bench:pg-testkit-mode` when you want to look at pg-testkit's two migration modes in isolation. The script runs every case under both the fixture-driven ZTD path and the Traditional DDL/seeding path inside pg-testkit, then writes `tmp/pg-testkit-mode-report.md` with per-case averages for duration, SQL count, DB time, migration time, and cleanup time.
The report is a lightweight complement to the full `pnpm ztd:bench` dataset and is a good place to start when you only need the driver-level comparison without the runner/parallelism instrumentation.
## Assumptions
- Traditional unit tests treat SQL generation cost as zero because SQL is provided as raw strings.
- ZTD unit tests include repository-level SQL generation plus SQL-to-ZTD rewrite costs.
- Runner-based scenarios include Vitest startup time; the in-process lower bound does not.
## Concurrency diagnostics
- The traditional parallel summary in `tmp/bench/report.md` now reports the 95th percentile of connection waiting, migration, and cleanup durations, so you can immediately see which of those steps is limiting throughput when parallel workers are added.
- The new ZTD Concurrency Diagnostics section highlights the measured parallel run with the largest suite (typically 120 tests) and the highest worker count; it surfaces the waiting p95 plus the peak `pg_stat_activity` active sessions so you can explain why adding more workers stops improving the runtime.
- A lightweight Vitest smoke test lives at `benchmarks/ztd-bench-vs-raw/tests/diagnostics/traditional-parallelism.test.ts`. It runs a barriered `pg_sleep` workload through `runTraditionalParallelismValidation` and fails if the requested number of PostgreSQL sessions never go active simultaneously (the idea is sketched after this list). You can rerun it directly with:

  ```bash
  pnpm vitest --config benchmarks/ztd-bench-vs-raw/vitest.config.ts run benchmarks/ztd-bench-vs-raw/tests/diagnostics/traditional-parallelism.test.ts
  ```

  The test runs before the full benchmark and gives CI/local runs a quick fail-fast surface if PostgreSQL cannot open the expected number of concurrent sessions.
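A minimal sketch of the barriered-`pg_sleep` idea; this is not the actual `runTraditionalParallelismValidation` implementation (the real test synchronizes sessions with a barrier, while this sketch just waits briefly), and `validateParallelSessions` is a hypothetical name:

```typescript
import { Client } from "pg";

// Sketch only: open N sessions, run pg_sleep on each concurrently, and check
// pg_stat_activity for N simultaneously active sessions.
async function validateParallelSessions(connectionString: string, workers: number) {
  const clients = await Promise.all(
    Array.from({ length: workers }, async () => {
      const c = new Client({ connectionString });
      await c.connect();
      return c;
    }),
  );
  const sleeps = clients.map((c) => c.query("SELECT pg_sleep(2)"));

  // Probe from a separate session while the sleeps are in flight.
  const probe = new Client({ connectionString });
  await probe.connect();
  await new Promise((resolve) => setTimeout(resolve, 500)); // let the sleeps start
  const { rows } = await probe.query(
    `SELECT count(*)::int AS active
       FROM pg_stat_activity
      WHERE state = 'active'
        AND query LIKE '%pg_sleep%'
        AND pid <> pg_backend_pid()`,
  );

  await Promise.all(sleeps);
  await Promise.all([...clients, probe].map((c) => c.end()));

  if (rows[0].active < workers) {
    throw new Error(`expected ${workers} active sessions, saw ${rows[0].active}`);
  }
}
```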
## AST stringify microbenchmark
- Purpose: get trustworthy μs/ns-scale measurements for the AST→SQL stringify step so the team can decide whether further optimization is needed.
- Run the dedicated script (it parses the real repository SQL, warms up the formatter, and loops `SqlFormatter.format()` repeatedly):

  ```bash
  pnpm ts-node benchmarks/ztd-bench-vs-raw/stringify-only-benchmark.ts
  ```

- Environment knobs:
  - `STRINGIFY_ITERATIONS` controls how many measured iterations run (default `100000`).
  - `STRINGIFY_WARMUP` controls the warmup iterations before timing (default `10000`).
- Output: for each repository SQL the script prints the total iteration count, total elapsed time (μs/ns), and average per-stringify time. Use these numbers to decide whether μs-level stability is adequate or more optimization is warranted (a sketch of the timing loop follows).
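The measurement loop reduces to something like this sketch; `format` stands in for the script's real `SqlFormatter.format()` call on a pre-parsed AST, and `timeStringify` is a hypothetical name:

```typescript
// Sketch of a warmup-then-measure loop using Node's monotonic nanosecond clock.
// `format` is a stand-in for calling SqlFormatter.format() on a parsed AST.
function timeStringify(format: () => string, warmup = 10_000, iterations = 100_000) {
  for (let i = 0; i < warmup; i++) format(); // warm JIT/caches before timing
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) format();
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return {
    iterations,
    totalMicroseconds: elapsedNs / 1_000,
    avgNanosecondsPerCall: elapsedNs / iterations,
  };
}
```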