§ · Worked example

Load-Testing SimpleIDGen: 50 Concurrent Users on Production

50 virtual users, hammering production with zero think time, sent nearly 32,000 requests and generated 146,800 synthetic patient records, with zero errors, read-path API latency under 180 ms (p99), and the daily-quota guardrail holding to the exact row.

This is the first stress test of the live stack (simpleidgen.com + api.simpleidgen.com), run against the real limits — nothing raised. Three phases: a read/edge flood, a generation storm, and a deliberate quota-saturation run to watch the guardrail engage.

§ · How it was run

A dependency-free closed-loop generator (50 threads, persistent keep-alive connections, back-to-back requests with no pauses) drove production directly. Generation used the MCP path (generate_people, synchronous, ≤ 100 rows/call), which exercises the same deterministic generator and the same per-account daily-quota reservation as the REST job pipeline. Every limit stayed at its production value, including the 5,000 rows/account/day quota; the only change was a temporary, secret-gated Turnstile bypass so test accounts could be created by script (see the FAQ). Two honest caveats, stated up front: the load came from a single client link, so the large file-download latencies below reflect our ~47 MB/s test connection rather than the server; and requests used HTTP/1.1 keep-alive rather than the HTTP/2 a browser would normally negotiate.
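The load generator itself isn't published. Below is a minimal, dependency-free Python sketch of the same closed-loop shape. It assumes one persistent `http.client.HTTPSConnection` per virtual user. `closed_loop` and `make_https_sender` are hypothetical names, and the real harness may differ, for example in how it mixes endpoints.

```python
from __future__ import annotations

import http.client
import threading
import time
from typing import Callable

Sender = Callable[[], None]  # performs one request; raises on failure


def make_https_sender(host: str, path: str) -> Sender:
    """One persistent HTTP/1.1 connection per virtual user (keep-alive reuse)."""
    conn = http.client.HTTPSConnection(host, timeout=30)

    def send() -> None:
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the same socket can carry the next request
        except (OSError, http.client.HTTPException):
            conn.close()  # reset connection state; http.client reopens on next request
            raise
        if not 200 <= resp.status < 300:
            raise RuntimeError(f"non-2xx: HTTP {resp.status}")

    return send


def closed_loop(make_sender: Callable[[], Sender], vus: int, duration_s: float):
    """Run `vus` threads, each firing requests back-to-back (zero think time)
    until the deadline. Returns (latencies_ms of successes, error_count)."""
    deadline = time.monotonic() + duration_s
    latencies_ms: list[float] = []
    errors = 0
    lock = threading.Lock()  # guards the shared results only

    def virtual_user() -> None:
        nonlocal errors
        send = make_sender()  # each VU owns its own connection
        local_ms: list[float] = []
        local_errors = 0
        while time.monotonic() < deadline:
            start = time.perf_counter()
            try:
                send()
            except Exception:
                local_errors += 1
                continue
            local_ms.append((time.perf_counter() - start) * 1000.0)
        with lock:
            latencies_ms.extend(local_ms)
            errors += local_errors

    threads = [threading.Thread(target=virtual_user) for _ in range(vus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies_ms, errors
```

Against your own endpoint, a run would look like `closed_loop(lambda: make_https_sender("api.example.com", "/health"), vus=50, duration_s=60)`. The host and path are placeholders. Each VU waits for its response before sending the next request. Because of that, throughput is an output of the test (roughly VUs divided by mean latency) rather than a rate the generator imposes.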

§ · Phase 1 — the read/edge flood

A launch-day read mix — homepage, product and analytics pages, free-sample downloads, and the dynamic API meta endpoints — at 50 VUs for 60 seconds.

Metric                        Value
Requests                      23,296
Throughput                    378 req/s
Data transferred              2,882 MB (46.7 MB/s)
Success rate                  100.00% (0 errors, 0 non-2xx)
Dynamic API worker latency    p50 62 ms · p95 ≤ 125 ms · p99 ≤ 180 ms
Phase 1 · p95 response latency by endpoint (ms), 50 VUs, sustained reads
api /health 117 · api /meta/versions 121 · api /meta/openapi 125 · data-analytics hub 155 · homepage 158 · validation study 159 · person product page 173

The dynamic API-worker endpoints, which actually execute code on every request, held p99 under 180 ms with not one failure while the full mix ran at 378 req/s. Static pages are served from the edge cache and stayed under 175 ms at p95. The only slow tail was the sample-file downloads (CSV p95 862 ms, JSONL p95 1,844 ms), and that is our single test link saturating at 46.7 MB/s: the edge served all 2.8 GB without an error.
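The article doesn't say how its harness computes percentiles. As a hedged sketch, here is one common definition, nearest-rank, together with the throughput and success-rate arithmetic behind tables like the one above. `percentile` and `summarize` are hypothetical helper names. Interpolating percentile methods can give slightly different values on small samples.

```python
from __future__ import annotations

import math


def percentile(sorted_ms: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list: the smallest sample with
    at least p% of all samples at or below it. Use whole-number p (50, 95, 99)."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_ms) / 100)  # 1-based rank
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms: list[float], elapsed_s: float, errors: int = 0) -> dict:
    """Every attempt counts toward throughput; only successes carry a latency."""
    xs = sorted(latencies_ms)
    total = len(xs) + errors
    return {
        "requests": total,
        "throughput_rps": total / elapsed_s,
        "success_rate": len(xs) / total,
        "p50_ms": percentile(xs, 50),
        "p95_ms": percentile(xs, 95),
        "p99_ms": percentile(xs, 99),
        "max_ms": xs[-1],
    }
```

Under this definition, "p99 ≤ 180 ms" means at most 1% of the measured requests took longer than 180 ms.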

§ · Phase 2 — the generation storm

50 VUs calling generate_people (20 rows/call) across a pool of 40 accounts for 60 seconds — real WASM generation plus an atomic daily-quota reservation on every single call.

Metric                Value
Generation calls      6,590
Throughput            110 calls/s · 2,192 rows/s
Records generated     131,800 rows
Success rate          100.00% (0 errors)
Generation latency    p50 449 ms · p95 580 ms · p99 638 ms · max 781 ms
Quota atomicity       max 3,860 rows on any one account (cap 5,000) · 0 accounts over
Phase 2 · generate_people latency percentiles (ms), 50-VU generation storm
p50 449 · p95 580 · p99 638 · max 781

Sustained 2,192 synthetic records per second with a tight latency band (p50 449 ms to p99 638 ms, no runaway tail) and zero errors. Each call is heavier than a read: it generates records in WASM and reserves quota in the database. With 50 VUs racing across 40 shared accounts, the atomic reservation held: the busiest account reached 3,860 of its 5,000 rows, and no account was over-charged.
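The Phase 2 figures can be cross-checked with some arithmetic. The last line applies Little's law to a closed loop with zero think time: concurrency N equals throughput λ times mean latency W. The article reports only percentiles, so p50 stands in for the mean here as an approximation.

```latex
\begin{aligned}
6{,}590\ \text{calls} \times 20\ \tfrac{\text{rows}}{\text{call}} &= 131{,}800\ \text{rows} \\
\frac{2{,}192\ \text{rows/s}}{20\ \text{rows/call}} &\approx 109.6\ \text{calls/s} \approx 110\ \text{calls/s} \\
\frac{131{,}800\ \text{rows}}{40\ \text{accounts}} &= 3{,}295\ \text{rows/account (mean; observed max } 3{,}860 < 5{,}000) \\
\lambda \approx \frac{N}{W} = \frac{50}{0.449\ \text{s}} &\approx 111\ \text{calls/s}
\end{aligned}
```

The Little's-law estimate lands close to the measured 110 calls/s. That is what a closed-loop generator predicts: at fixed concurrency, throughput is set by per-call latency, so heavier calls mean fewer calls per second.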

§ · Phase 2b — the guardrail under saturation

To watch the daily-quota guardrail actually engage, 50 VUs hammered just 3 accounts (a combined 15,000-row ceiling) at 100 rows/call. The quota is exhausted in about 1.5 seconds; the rest of the run is a flood of calls the system must correctly reject.

15,000 rows generated: exactly the 3 × 5,000 ceiling
0 rows of overrun · 0 accounts over the 5,000-row cap
92.7% of 2,060 calls correctly quota-blocked · 0 errors
Phase 2b · calls by outcome (count), 50 VUs saturating a 15,000-row quota ceiling
generated 150 · quota-blocked 1,910
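The Phase 2b outcome counts are internally consistent. At 100 rows per call, the 15,000-row ceiling admits exactly 150 calls, and every other call must be blocked:

```latex
\begin{aligned}
\text{ceiling} &= 3 \times 5{,}000 = 15{,}000\ \text{rows} \\
\text{calls admitted} &= 15{,}000 / 100 = 150 \\
\text{calls blocked} &= 2{,}060 - 150 = 1{,}910 \\
\text{blocked share} &= 1{,}910 / 2{,}060 \approx 0.927 = 92.7\%
\end{aligned}
```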

This is the result that matters most. Under 50 VUs racing on the same three accounts, the atomic reservation capped generation at precisely 15,000 rows, not one over, and cleanly rejected the remaining 1,910 calls with zero server errors. This is the exact race a naive read-then-increment counter loses, because two callers can both read the same remaining headroom and both proceed. The quota is enforced by a single conditional database write, and it holds under concurrency.

§ · What it means, and what's next

At 50 concurrent users on production, across nearly 32,000 requests and 146,800 generated records: zero errors, dynamic API latency under 180 ms at p99, 2,192 records/s sustained, and a quota guardrail that is exact under the concurrent race. The edge absorbs the read flood; the generator and the database hold the write path. The next run scales this to 200–300 VUs from a distributed source (removing the single-link caveat) and adds the async REST job pipeline (queue → R2). Every figure here is reproducible with the seeded generator; the limits shown are the live production limits, unchanged for the test.

§ · Frequently asked

Q1
Was this run against production or a staging copy?

Production — api.simpleidgen.com and simpleidgen.com — at the real limits, nothing raised. Turnstile was temporarily bypassed (via a secret-gated flag, reverted immediately after) only so accounts could be created by script; every other guardrail, including the 5,000-rows/account/day quota, stayed at its production value.

Q2
Why test the MCP path instead of the REST job pipeline?

Because it was scriptable without the emailed one-time code. The MCP generate_people tool runs the same deterministic generator and the same atomic quota reservation as the REST jobs, so it stresses the generator and database directly. The async REST pipeline (queue → R2) is the next run.

Q3
Did the quota ever get exceeded under load?

No. In the saturation run, 50 VUs racing on three accounts generated exactly 15,000 rows — the combined 3×5,000 ceiling — with zero overrun, then correctly rejected 1,910 further calls. The reservation is a single conditional database write, which stays atomic under concurrency.

Q4
Why are the sample-download latencies higher?

Those reflect the test's single client link saturating at ~47 MB/s while pulling 2.8 GB, not server slowness — the edge served every byte with zero errors. A distributed load source removes that ceiling; it's noted as a caveat, not a server result.