SimpleIDGen
§ · Mockaroo Alternative

A free Mockaroo alternative for realistic person data

Mockaroo is a capable, flexible mock-data tool: many field types, a formula engine, and a schema designer that exports to several formats. Where it generates each column independently, SimpleIDGen draws every attribute on a record from one jointly calibrated US population, so a 44-year-old's age, BMI, A1c, and income cohere instead of colliding. It's free, with a no-login sample you can download right now.

The sample needs no signup. Generate your own: 5,000 rows/day free with an account →

§ · The difference: joint calibration

Most mock-data generators — Mockaroo among them — fill each column from its own list or formula. Age comes from one generator, weight from another, income from a third. Each field looks plausible in isolation, but the row as a whole doesn't: you get teenagers with retiree incomes, or a normal BMI sitting next to a diabetic A1c, because nothing ties the columns together.
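
To make the failure mode concrete, here is a minimal Python sketch of column-independent generation. It is not Mockaroo's implementation; the field names, the value ranges, and the `is_incoherent` rule are all assumptions chosen for illustration.

```python
import random


def independent_row(rng: random.Random) -> dict:
    """Build one row where every field comes from its own generator."""
    return {
        "age": rng.randint(0, 90),              # drawn on its own
        "income_usd": rng.randint(0, 200_000),  # drawn on its own, ignores age
    }


def is_incoherent(row: dict) -> bool:
    """Hypothetical rule: flag a minor with a six-figure income."""
    return row["age"] < 18 and row["income_usd"] > 100_000


rng = random.Random(0)
rows = [independent_row(rng) for _ in range(10_000)]
bad = [row for row in rows if is_incoherent(row)]
print(f"{len(bad)} of {len(rows)} rows pair a minor with a six-figure income")
```

Each column passes a per-field sanity check, yet a meaningful share of rows fail the cross-field rule, because nothing in `independent_row` lets income depend on age.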

SimpleIDGen takes the opposite approach. Every record is a synthetic person sampled from distributions fitted to public US references: NHANES 2017–2020, ACS 2022, CDC NDSS 2022, KFF 2023, BLS 2023, and US Census 2020. Age and sex condition the health values, and geography conditions the ZIP. Cross-field invariants are enforced: BMI follows from height and weight, insulin appears only for diagnosed diabetics, and ZIP always matches state. The result is 65 attributes per record that behave like a real cohort rather than unrelated random draws. See exactly how the calibration works →
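
The conditioning and invariants described above can be sketched in a few lines of Python. This is an illustration of the general technique, not SimpleIDGen's code: the field names, the probabilities, and the tiny `ZIP_PREFIX_RANGES` table are hypothetical placeholders rather than fitted values.

```python
import random

# Illustrative and incomplete: 3-digit ZIP prefix ranges for two states (assumption).
ZIP_PREFIX_RANGES = {"NY": (100, 149), "CA": (900, 961)}


def sample_person(rng: random.Random) -> dict:
    """Draw one record where later fields depend on earlier ones."""
    age = rng.randint(18, 90)
    state = rng.choice(sorted(ZIP_PREFIX_RANGES))
    low, high = ZIP_PREFIX_RANGES[state]
    # Geography conditions the ZIP: the prefix comes from the state's range.
    zip_code = f"{rng.randint(low, high):03d}{rng.randint(0, 99):02d}"
    height_cm = max(140.0, rng.gauss(170, 10))
    weight_kg = max(35.0, rng.gauss(80, 15))
    # BMI is derived from height and weight, never drawn independently.
    bmi = round(weight_kg / (height_cm / 100) ** 2, 1)
    # Age conditions diabetes (made-up probability); diabetes gates insulin.
    has_diabetes = rng.random() < min(0.02 + 0.003 * age, 0.35)
    on_insulin = has_diabetes and rng.random() < 0.3
    return {
        "age": age, "state": state, "zip": zip_code,
        "height_cm": height_cm, "weight_kg": weight_kg, "bmi": bmi,
        "has_diabetes": has_diabetes, "on_insulin": on_insulin,
    }


def check_invariants(p: dict) -> list[str]:
    """Return a list of violated cross-field invariants (empty if none)."""
    problems = []
    expected_bmi = p["weight_kg"] / (p["height_cm"] / 100) ** 2
    if abs(p["bmi"] - expected_bmi) > 0.1:
        problems.append("bmi does not follow from height and weight")
    if p["on_insulin"] and not p["has_diabetes"]:
        problems.append("insulin without a diabetes diagnosis")
    low, high = ZIP_PREFIX_RANGES[p["state"]]
    if not low <= int(p["zip"][:3]) <= high:
        problems.append("zip does not match state")
    return problems


rng = random.Random(1)
people = [sample_person(rng) for _ in range(500)]
assert all(check_invariants(p) == [] for p in people)
```

A checker like `check_invariants` is also a reasonable way to audit any downloaded dataset yourself, provided you map it onto the real column names.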

§ · Side by side
| Dimension | Mockaroo | SimpleIDGen |
| --- | --- | --- |
| What it is | General-purpose mock-data tool with many field types | Calibrated synthetic person dataset + API |
| How fields relate | Generated independently per column | Jointly distributed across the whole record |
| Population realism | Plausible values, not fitted to a population | Marginals fitted to NHANES, ACS, CDC, KFF, BLS, Census |
| Cross-field invariants | Manual, via formulas you write | Enforced (BMI, insulin↔diabetes, ZIP↔state) |
| Health depth | Generic fields, not clinically modeled | A1c, BMI, blood pressure, diabetes, hypertension, CKD, meds |
| Reproducibility | Fresh random data each run | Deterministic by seed: same seed, same people |
| Output | CSV, JSON, SQL, and more | CSV or JSONL, instant |
| Try before account | Browser tool, free tier | 1,000-row sample, no login required |

Qualitative comparison of the general approach; Mockaroo's exact features change over time — check their site for current details.

§ · When to use which

If you need an arbitrary schema — invoice numbers, product SKUs, free-text fields, custom column names in a shape you define — Mockaroo's flexibility is hard to beat, and its formula engine handles bespoke logic well. Reach for it when the shape of the data matters more than its statistical realism.

Reach for SimpleIDGen when you specifically need realistic people: demographics, geography, finances, and health basics that hold together under analysis. It's built for testing health and population software, seeding demos that survive a second glance, and training or benchmarking models that would otherwise learn from incoherent rows. No real PII ever enters the system — records are built from public reference distributions, not learned from real people — so the data is GDPR- and DPDP-safe for environments where production data can't be used. Inspect every field on the generator page →

§ · Frequently asked
Q1
Is SimpleIDGen really free?

Yes. The 1,000-row sample above needs no account. A free account generates up to 5,000 rows per day in CSV or JSONL. See the pricing page for the details.

Q2
How is this different from Mockaroo?

Mockaroo generates each column independently from field types you choose. SimpleIDGen samples whole people from distributions fitted to real US references, so attributes within a record are statistically consistent and cross-field invariants are enforced.

Q3
Can I define my own custom schema?

Not arbitrarily — that's where a general tool like Mockaroo shines. SimpleIDGen produces a fixed, calibrated person schema of 65 attributes. You pick how many rows and which format; the columns are designed to cohere as a population.
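
Whichever format you pick, loading it takes only the Python standard library. The sketch below is a minimal example under assumptions: the column names `age` and `state` are placeholders, not the confirmed names in the 65-attribute schema.

```python
import csv
import io
import json


def load_jsonl(text: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_csv(text: str) -> list[dict]:
    """Parse CSV with a header row; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


# Tiny inline samples with hypothetical column names.
jsonl_text = '{"age": 44, "state": "NY"}\n{"age": 61, "state": "CA"}\n'
csv_text = "age,state\n44,NY\n61,CA\n"

jsonl_rows = load_jsonl(jsonl_text)
csv_rows = load_csv(csv_text)
```

One practical difference: JSONL keeps numbers as numbers, while `csv.DictReader` returns every value as a string, so CSV consumers need to convert numeric columns themselves.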

Q4
Is the output reproducible?

Yes. Generation is deterministic by seed — the same seed always yields the same people, so test fixtures and benchmarks stay stable across runs.
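
Seeded determinism is easy to picture with a small Python sketch. `generate_people` is a hypothetical stand-in, not the SimpleIDGen API, and its income model uses made-up numbers; the point is only that all randomness flows from one seeded generator.

```python
import random


def generate_people(seed: int, n: int) -> list[dict]:
    """Hypothetical stand-in generator driven by a single seeded RNG."""
    rng = random.Random(seed)  # local instance, so global random state is untouched
    people = []
    for person_id in range(n):
        age = rng.randint(18, 90)
        # Placeholder income model that loosely depends on age (made-up numbers).
        income = int(rng.gauss(30_000 + 600 * min(age, 60), 8_000))
        people.append({"id": person_id, "age": age, "income_usd": max(income, 0)})
    return people


first = generate_people(seed=42, n=3)
second = generate_people(seed=42, n=3)
assert first == second  # same seed, same people
```

Pinning the seed in a test fixture means a failing assertion points at a code change, not at a reshuffled dataset.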

Q5
Is any of it real personal data?

No. Every record is synthetic, built from public reference distributions rather than learned from real records. There is no PII to protect, which keeps it GDPR- and DPDP-safe.