§ · Worked example

Load-Testing SimpleIDGen: 50 Concurrent Users on Production

50 virtual users hammered production with zero think time, sending roughly 32,000 requests and generating 146,800 synthetic patient records, with zero errors, p99 API latency under 180 ms, and the daily-quota guardrail holding to the exact row.

This is the first stress test of the live stack (simpleidgen.com + api.simpleidgen.com), run against the real limits — nothing raised. Three phases: a read/edge flood, a generation storm, and a deliberate quota-saturation run to watch the guardrail engage.

§ · How it was run

A dependency-free closed-loop generator (50 threads, persistent keep-alive connections, back-to-back requests with no pauses) drove production directly. Generation used the MCP path (generate_people, synchronous, ≤ 100 rows/call), which exercises the same deterministic generator and the same per-account daily-quota reservation as the REST job pipeline. Every limit stayed at its production value, including the 5,000 rows/account/day quota; the only change was a temporary, secret-gated Turnstile bypass so that test accounts could be created by script. Two caveats, stated up front: the load came from a single client link, so the large file-download latencies below reflect our ~47 MB/s test connection rather than the server; and requests used HTTP/1.1 keep-alive rather than the HTTP/2 a browser negotiates.
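For readers unfamiliar with the term, a closed-loop generator is one where each worker sends its next request only after the previous one returns. The following is a minimal sketch of that pattern, not the harness used for this test: the function name `run_closed_loop` and the `send` callable are hypothetical, and the real tool's internals are not described in this write-up.

```python
import threading
import time

def run_closed_loop(send, workers=50, duration_s=60.0):
    """Run `workers` threads that call `send()` back to back until the deadline.

    `send` is any zero-argument callable that performs one request and raises
    on failure (hypothetical interface). Returns (latencies_in_seconds, errors).
    """
    deadline = time.monotonic() + duration_s
    latencies = []
    errors = 0
    lock = threading.Lock()

    def worker():
        nonlocal errors
        # Closed loop: no sleep between iterations, so think time is zero.
        while time.monotonic() < deadline:
            start = time.perf_counter()
            try:
                send()
                ok = True
            except Exception:
                ok = False
            elapsed = time.perf_counter() - start
            with lock:
                if ok:
                    latencies.append(elapsed)
                else:
                    errors += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies, errors
```

In a real run, `send` would wrap one HTTP request on a connection owned by the calling thread, so that each of the 50 workers reuses its own keep-alive connection.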

§ · Phase 1 — the read/edge flood

A launch-day read mix — homepage, product and analytics pages, free-sample downloads, and the dynamic API meta endpoints — at 50 VUs for 60 seconds.

Metric	Value
Requests	23,296
Throughput	378 req/s
Data transferred	2,882 MB (46.7 MB/s)
Success rate	100.00% (0 errors, 0 non-2xx)
Dynamic API worker latency	p50 62 ms · p95 ≤ 125 ms · p99 ≤ 180 ms

Phase 1 · p95 response latency by endpoint (ms) — 50 VUs, sustained reads

The dynamic API-worker endpoints, which execute code on every request, held p99 at or under 180 ms at 378 req/s without a single failure. Static pages are served quickly from the edge cache. The only slow tail was the sample-file downloads (CSV p95 862 ms, JSONL p95 1,844 ms), and that reflects our single test link saturating at 46.7 MB/s; the edge served all 2.8 GB without an error.
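The p50, p95, and p99 figures are percentiles over per-request latencies. This write-up does not say which percentile definition the harness used, so the sketch below assumes the nearest-rank definition; the function names `percentile` and `summarize` are hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank exact for whole-number inputs.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(latencies_ms):
    """Return the three percentiles reported in the tables above."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With this definition a handful of slow requests moves p99 long before it moves p50, which is why the download tail shows up in the p95 column while the median stays low.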

§ · Phase 2 — the generation storm

50 VUs calling generate_people (20 rows/call) across a pool of 40 accounts for 60 seconds — real WASM generation plus an atomic daily-quota reservation on every single call.

Metric	Value
Generation calls	6,590
Throughput	110 calls/s · 2,192 rows/s
Records generated	131,800 rows
Success rate	100.00% (0 errors)
Generation latency	p50 449 ms · p95 580 ms · p99 638 ms · max 781 ms
Quota atomicity	max 3,860 rows on any one account (cap 5,000) · 0 accounts over

Phase 2 · generate_people latency percentiles (ms) — 50-VU generation storm

Sustained 2,192 synthetic records per second with a tight latency band (p50 449 ms to p99 638 ms — no runaway tail) and zero errors. Each call is heavier than a read: it generates records in WASM and reserves quota in the database. With 50 VUs racing across 40 shared accounts, the atomic reservation distributed cleanly and no account was over-charged.
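The throughput and latency figures can be cross-checked against each other. In a closed loop with zero think time, Little's law says concurrency equals throughput times mean response time, so the mean latency implied by the table should land inside the reported percentile band. A small sketch of that arithmetic, using only numbers from the table (the function name is hypothetical):

```python
def implied_mean_latency_ms(concurrency, throughput_per_s):
    """Little's law for a closed loop with zero think time: N = X * R, so R = N / X."""
    return 1000.0 * concurrency / throughput_per_s

phase2_calls = 6_590      # generation calls
phase2_rows = 131_800     # records generated
rows_per_s = 2_192        # reported row throughput

duration_s = phase2_rows / rows_per_s               # about 60.1 s
calls_per_s = phase2_calls / duration_s             # about 109.6 calls/s
mean_ms = implied_mean_latency_ms(50, calls_per_s)  # about 456 ms
```

The implied mean of roughly 456 ms sits just above the reported p50 of 449 ms and well below the p95 of 580 ms, which is what a tight distribution with a short tail would produce.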

§ · Phase 2b — the guardrail under saturation

To watch the daily-quota guardrail actually engage, 50 VUs hammered just 3 accounts (a combined 15,000-row ceiling) at 100 rows/call. The quota was exhausted in about 1.5 seconds, after 150 successful calls; the rest of the run was a flood of calls the system had to reject correctly.

15,000 rows generated: exactly the 3×5,000 ceiling

0 rows of overrun · 0 accounts over the 5,000-row cap

92.7% of 2,060 calls correctly quota-blocked · 0 errors

Phase 2b · calls by outcome (count) — 50 VUs saturating a 15,000-row quota ceiling

This is the result that matters most. Under 50 VUs racing on the same three accounts — the exact race a naive counter loses — the atomic reservation capped generation at precisely 15,000 rows, not one over, and rejected the remaining 1,910 calls cleanly with zero server errors. The quota is enforced by a single conditional database write, and it holds under concurrency.
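To make "a single conditional database write" concrete, here is a minimal sketch of the pattern. It is an illustration under assumptions, not the production code: SQLite, the `quota` table, its columns, and the function names are all hypothetical, and the loop is single-threaded. The point is that the cap check and the increment live in one UPDATE statement, so there is no gap between reading the counter and writing it for a concurrent caller to slip through.

```python
import sqlite3

DAILY_CAP = 5_000  # rows/account/day, the production limit quoted above

def open_quota_db(accounts):
    """Create an in-memory table with one usage counter per account (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE quota (account TEXT PRIMARY KEY, used INTEGER NOT NULL)")
    db.executemany("INSERT INTO quota VALUES (?, 0)", [(a,) for a in accounts])
    db.commit()
    return db

def reserve(db, account, rows, cap=DAILY_CAP):
    """Reserve `rows` for `account`. The WHERE clause carries the cap check,
    so the row is updated only if the reservation still fits."""
    cur = db.execute(
        "UPDATE quota SET used = used + ? WHERE account = ? AND used + ? <= ?",
        (rows, account, rows, cap),
    )
    db.commit()
    return cur.rowcount == 1  # 0 rows changed means the call is quota-blocked

def saturate(calls=2_060, rows_per_call=100, accounts=("a", "b", "c")):
    """Replay the Phase 2b call count against three accounts, round-robin."""
    db = open_quota_db(accounts)
    accepted = sum(
        reserve(db, accounts[i % len(accounts)], rows_per_call) for i in range(calls)
    )
    total = db.execute("SELECT SUM(used) FROM quota").fetchone()[0]
    return accepted, calls - accepted, total
```

Calling `saturate()` accepts 150 calls, blocks 1,910, and leaves exactly 15,000 rows reserved, matching the counts reported for this phase. A naive version that first reads `used` and then writes the new value in a separate statement is the one that loses the race under concurrency.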

§ · What it means, and what's next

At 50 concurrent users on production, across roughly 32,000 requests and 146,800 generated records: zero errors, dynamic API latency under 180 ms at p99, 2,192 records/s sustained, and a quota guardrail that is exact under the concurrent race. The edge absorbs the read flood; the generator and the database hold the write path. The next run scales this to 200–300 VUs from a distributed source (removing the single-link caveat) and adds the async REST job pipeline (queue → R2). Every figure here is reproducible with the seeded generator; the limits shown are the live production limits, unchanged for the test.

§ · Frequently asked

Was this run against production or a staging copy?

Production — api.simpleidgen.com and simpleidgen.com — at the real limits, nothing raised. Turnstile was temporarily bypassed (via a secret-gated flag, reverted immediately after) only so accounts could be created by script; every other guardrail, including the 5,000-rows/account/day quota, stayed at its production value.

Why test the MCP path instead of the REST job pipeline?

Because it was scriptable without the emailed one-time code. The MCP generate_people tool runs the same deterministic generator and the same atomic quota reservation as the REST jobs, so it stresses the generator and database directly. The async REST pipeline (queue → R2) is the next run.

Did the quota ever get exceeded under load?

No. In the saturation run, 50 VUs racing on three accounts generated exactly 15,000 rows — the combined 3×5,000 ceiling — with zero overrun, then correctly rejected 1,910 further calls. The reservation is a single conditional database write, which stays atomic under concurrency.

Why are the sample-download latencies higher than the API latencies?

Those reflect the test's single client link saturating at ~47 MB/s while pulling 2.8 GB, not server slowness — the edge served every byte with zero errors. A distributed load source removes that ceiling; it's noted as a caveat, not a server result.