Load-Testing SimpleIDGen: 50 Concurrent Users on Production
50 virtual users, hammering production with zero think time, sent roughly 32,000 requests and generated 146,800 synthetic patient records: zero errors, dynamic API latency under 180 ms (p99), and a daily-quota guardrail that held to the exact row.
This is the first stress test of the live stack (simpleidgen.com + api.simpleidgen.com), run against the real limits — nothing raised. Three phases: a read/edge flood, a generation storm, and a deliberate quota-saturation run to watch the guardrail engage.
§ · How it was run
A dependency-free closed-loop generator (50 threads, persistent keep-alive connections, back-to-back requests with no pauses) drove production directly. Generation used the MCP path (generate_people, synchronous, ≤ 100 rows/call), which exercises the same deterministic generator and the same per-account daily-quota reservation as the REST job pipeline. Every limit stayed at its production value, including the 5,000 rows/account/day quota; the only change was a temporary, secret-gated Turnstile bypass so that test accounts could be created by script. Two honest caveats, stated up front: the load came from a single client link, so the large file-download latencies below reflect our ~47 MB/s test connection rather than the server; and requests used HTTP/1.1 keep-alive rather than the HTTP/2 that a browser typically negotiates.
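A closed loop differs from an open loop in one way that matters: each virtual user waits for its own response before sending the next request, so concurrency is capped at the VU count. Below is a minimal stdlib-only sketch of that pattern, not the script used for this run; `run_closed_loop` and `make_get_sender` are hypothetical names, and the sender covers only plain GETs over one keep-alive HTTPS connection per thread.

```python
import http.client
import threading
import time
from typing import Callable


def run_closed_loop(send: Callable[[], bool], vus: int, duration_s: float) -> dict:
    """Run `vus` workers that each call `send` back-to-back until the deadline.

    Each worker waits for its own response before sending the next request
    (no think time), so at most `vus` requests are in flight: a closed loop.
    `send` returns True for success; a raised exception counts as an error.
    """
    latencies_s: list[float] = []
    errors = 0
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker() -> None:
        nonlocal errors
        while time.monotonic() < deadline:
            start = time.perf_counter()
            try:
                ok = send()
            except Exception:
                ok = False
            elapsed = time.perf_counter() - start
            with lock:  # shared results are updated under one lock
                latencies_s.append(elapsed)
                if not ok:
                    errors += 1

    threads = [threading.Thread(target=worker) for _ in range(vus)]
    started = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall_s = time.monotonic() - started
    return {
        "requests": len(latencies_s),
        "errors": errors,
        "throughput_rps": len(latencies_s) / wall_s if wall_s > 0 else 0.0,
        "latencies_s": latencies_s,
    }


def make_get_sender(host: str, path: str) -> Callable[[], bool]:
    """Build a `send` that reuses one keep-alive HTTPS connection per thread."""
    local = threading.local()

    def send() -> bool:
        conn = getattr(local, "conn", None)
        if conn is None:
            conn = local.conn = http.client.HTTPSConnection(host, timeout=30)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
        except Exception:
            conn.close()
            local.conn = None  # reconnect on this thread's next call
            raise
        return 200 <= resp.status < 300

    return send
```

Against a host you operate, `run_closed_loop(make_get_sender("staging.example.com", "/"), vus=50, duration_s=60)` would run 50 back-to-back loops for a minute (`staging.example.com` is a placeholder). Passing `send` in as a function keeps the loop protocol-agnostic, so a POST carrying an MCP request could be swapped in the same way.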
§ · Phase 1 — the read/edge flood
A launch-day read mix — homepage, product and analytics pages, free-sample downloads, and the dynamic API meta endpoints — at 50 VUs for 60 seconds.
| Metric | Value |
|---|---|
| Requests | 23,296 |
| Throughput | 378 req/s |
| Data transferred | 2,882 MB (46.7 MB/s) |
| Success rate | 100.00% (0 errors, 0 non-2xx) |
| Dynamic API worker latency | p50 62 ms · p95 ≤ 125 ms · p99 ≤ 180 ms |
The dynamic API-worker endpoints, which execute code on every request, held p99 under 180 ms with not one failure while the full mix ran at 378 req/s. Static pages are served from the edge cache. The only slow tail was the sample-file downloads (CSV p95 862 ms, JSONL p95 1,844 ms), and that is our single test link saturating at 46.7 MB/s: the edge served all 2,882 MB without an error.
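Percentile figures depend on how percentiles are computed, and the post does not say which definition its generator used. A minimal sketch, assuming the nearest-rank definition (the smallest sample with at least p% of samples at or below it) and throughput as requests divided by measured wall-clock time; `nearest_rank` and `summarize` are hypothetical names.

```python
import math


def nearest_rank(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile of an ascending list, for p in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms: list[float], elapsed_s: float) -> dict:
    """Request count, throughput, and p50/p95/p99/max for one phase."""
    ordered = sorted(latencies_ms)
    return {
        "requests": len(ordered),
        "throughput_rps": len(ordered) / elapsed_s,
        "p50_ms": nearest_rank(ordered, 50),
        "p95_ms": nearest_rank(ordered, 95),
        "p99_ms": nearest_rank(ordered, 99),
        "max_ms": ordered[-1],
    }
```

Dividing by measured elapsed time rather than the nominal phase length explains why the Phase 1 throughput is not simply 23,296 / 60: 23,296 requests at 378 req/s corresponds to about 61.6 s, which matches 2,882 MB at 46.7 MB/s (about 61.7 s), a little over the nominal 60 seconds.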
§ · Phase 2 — the generation storm
50 VUs calling generate_people (20 rows/call) across a pool of 40 accounts for 60 seconds — real WASM generation plus an atomic daily-quota reservation on every single call.
| Metric | Value |
|---|---|
| Generation calls | 6,590 |
| Throughput | 110 calls/s · 2,192 rows/s |
| Records generated | 131,800 rows |
| Success rate | 100.00% (0 errors) |
| Generation latency | p50 449 ms · p95 580 ms · p99 638 ms · max 781 ms |
| Quota atomicity | max 3,860 rows on any one account (cap 5,000) · 0 accounts over |
Sustained 2,192 synthetic records per second with a tight latency band (p50 449 ms to p99 638 ms, no runaway tail) and zero errors. Each call is heavier than a read: it generates records in WASM and reserves quota in the database. With 50 VUs racing across 40 shared accounts, no account was over-charged: the busiest reached 3,860 of its 5,000 rows. Since no account reached its cap in this phase, the guardrail itself is exercised in Phase 2b.
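The Quota atomicity row reduces to a per-account tally. A minimal sketch, assuming a client-side log of `(account_id, rows)` for each successful call; the post does not say whether its figure came from a client log or from the database, and `audit_quota` is a hypothetical name.

```python
from collections import defaultdict

DAILY_CAP = 5_000  # rows/account/day, the production limit used in the run


def audit_quota(successful_calls: list[tuple[str, int]], cap: int = DAILY_CAP) -> dict:
    """Sum rows per account from (account_id, rows) pairs of successful calls."""
    per_account: defaultdict[str, int] = defaultdict(int)
    for account_id, rows in successful_calls:
        per_account[account_id] += rows
    return {
        "total_rows": sum(per_account.values()),
        "accounts": len(per_account),
        "max_rows_one_account": max(per_account.values(), default=0),
        "accounts_over_cap": sum(1 for used in per_account.values() if used > cap),
    }


if __name__ == "__main__":
    # Hypothetical log: 6,590 calls of 20 rows, rotated evenly over 40 accounts.
    log = [(f"acct-{i % 40}", 20) for i in range(6_590)]
    print(audit_quota(log))
    # {'total_rows': 131800, 'accounts': 40, 'max_rows_one_account': 3300, 'accounts_over_cap': 0}
```

With strict rotation the busiest account would hold 3,300 rows (165 calls of 20); the run's 3,860 maximum simply means calls were not spread in perfect rotation. Either way, every account stayed under the 5,000-row cap.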
§ · Phase 2b — the guardrail under saturation
To watch the daily-quota guardrail actually engage, 50 VUs hammered just 3 accounts (a combined 15,000-row ceiling) at 100 rows/call. The quota is exhausted in about 1.5 seconds (150 accepted calls); the rest of the run is a flood of calls the system must correctly reject.
This is the result that matters most. With 50 VUs racing on the same three accounts, which is exactly the race a naive read-check-write counter loses, the atomic reservation capped generation at precisely 15,000 rows, not one over, and cleanly rejected the remaining 1,910 calls with zero server errors. The quota is enforced by a single conditional database write, and it holds under concurrency.
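A naive counter reads `used`, checks it in application code, then writes `used + rows`; two concurrent callers can read the same value and both pass the check. Folding the check into the UPDATE's WHERE clause removes that window. The post does not name the database or schema, so this is a minimal sketch assuming a SQLite-style store with a hypothetical `quota` table; `create_schema` and `reserve` are hypothetical names. The demo calls `reserve` sequentially, so it shows the statement's logic rather than contention; under real concurrency the guarantee comes from the database applying each single-statement UPDATE atomically (SQLite, for example, serializes writers).

```python
import sqlite3

DAILY_CAP = 5_000  # rows/account/day, the production limit used in the run


def create_schema(conn: sqlite3.Connection, accounts: list[str], day: str) -> None:
    """Hypothetical quota table: one row per account per day."""
    conn.execute(
        "CREATE TABLE quota (account_id TEXT NOT NULL, day TEXT NOT NULL,"
        " used INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (account_id, day))"
    )
    conn.executemany(
        "INSERT INTO quota (account_id, day) VALUES (?, ?)",
        [(account, day) for account in accounts],
    )
    conn.commit()


def reserve(conn: sqlite3.Connection, account_id: str, day: str, rows: int,
            cap: int = DAILY_CAP) -> bool:
    """Reserve `rows` for the account in ONE conditional UPDATE.

    The cap check lives in the WHERE clause, so check and increment happen in
    the same statement; there is no gap between reading `used` and writing it.
    """
    cur = conn.execute(
        "UPDATE quota SET used = used + ?"
        " WHERE account_id = ? AND day = ? AND used + ? <= ?",
        (rows, account_id, day, rows, cap),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the reservation was refused


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    accounts = ["acct-a", "acct-b", "acct-c"]
    create_schema(conn, accounts, "2024-01-01")
    # 2,060 attempts of 100 rows: 150 that fit plus 1,910 that must be refused.
    results = [reserve(conn, accounts[i % 3], "2024-01-01", 100) for i in range(2_060)]
    total = conn.execute("SELECT SUM(used) FROM quota").fetchone()[0]
    print(f"accepted={results.count(True)} rejected={results.count(False)} rows={total}")
    # accepted=150 rejected=1910 rows=15000
```

The 2,060-attempt figure assumes every accepted call reserved the full 100 rows, so 150 accepted calls fill the 3 × 5,000 ceiling and the remaining 1,910 are refused, mirroring the saturation run's totals.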
§ · What it means, and what's next
At 50 concurrent users on production, across roughly 32,000 requests and 146,800 generated records: zero errors, dynamic API latency under 180 ms at p99, 2,192 records/s sustained, and a quota guardrail that is exact under the concurrent race. The edge absorbs the read flood; the generator and the database hold the write path. The next run scales this to 200–300 VUs from a distributed source (removing the single-link caveat) and adds the async REST job pipeline (queue → R2). Every figure here is reproducible with the seeded generator; the limits shown are the live production limits, unchanged for the test.
§ · Frequently asked
What exactly was tested, and at what limits?
Production (api.simpleidgen.com and simpleidgen.com) at the real limits, with nothing raised. Turnstile was temporarily bypassed via a secret-gated flag, reverted immediately afterward, solely so accounts could be created by script; every other guardrail, including the 5,000-rows/account/day quota, stayed at its production value.
Why generate through MCP rather than the REST job pipeline?
Because MCP could be scripted without the emailed one-time code. The MCP generate_people tool runs the same deterministic generator and the same atomic quota reservation as the REST jobs, so it stresses the generator and the database directly. The async REST pipeline (queue → R2) is the next run.
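In MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with method `tools/call` whose params hold the tool's `name` and an `arguments` object. A minimal sketch of building those bodies for generate_people, splitting a larger row count into calls of at most 100 rows; the `count` argument name is hypothetical (the real one is in the tool's input schema), the helper names are illustrative, and the initialize handshake, transport, and authentication are omitted.

```python
import json

MAX_ROWS_PER_CALL = 100  # generate_people's per-call limit, as stated in the post


def split_rows(total: int, per_call: int = MAX_ROWS_PER_CALL) -> list[int]:
    """Split a row total into batch sizes of at most `per_call`."""
    if total < 0 or per_call <= 0:
        raise ValueError("need total >= 0 and per_call > 0")
    full, rest = divmod(total, per_call)
    return [per_call] * full + ([rest] if rest else [])


def tools_call_bodies(total_rows: int) -> list[str]:
    """One JSON-RPC 2.0 `tools/call` request body per batch."""
    return [
        json.dumps({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {
                "name": "generate_people",
                # Hypothetical argument name; check the tool's input schema.
                "arguments": {"count": rows},
            },
        })
        for request_id, rows in enumerate(split_rows(total_rows), start=1)
    ]


if __name__ == "__main__":
    for body in tools_call_bodies(250):
        print(body)
```

Because each call is synchronous and bounded at 100 rows, a request for 250 rows becomes three calls (100, 100, 50), and each one triggers its own quota reservation.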
Did the quota ever overrun under concurrency?
No. In the saturation run, 50 VUs racing on three accounts generated exactly 15,000 rows (the combined 3×5,000 ceiling) with zero overrun, then correctly rejected 1,910 further calls. The reservation is a single conditional database write, which stays atomic under concurrency.
Why were the large file downloads slow?
Those latencies reflect the test's single client link saturating at ~47 MB/s while pulling about 2.9 GB, not server slowness; the edge served every byte with zero errors. A distributed load source removes that ceiling; it is noted as a caveat, not as a server result.