§ · Synthetic Patient Data

Free Synthetic Patient Data — as a Flat CSV

Realistic but entirely fake patient records: demographics, body measures and vitals (A1c, BMI, blood pressure, height, weight, waist), and common conditions such as diabetes and hypertension at calibrated prevalence. Delivered as plain CSV or JSONL with one flat row per patient, so there is no Java to run and no longitudinal EHR to parse (FHIR R4 is also available, opt-in). Just the columns most testing, demo, and ML work actually needs.

Free 1,000-row sample — on the Person generator →

No signup for the sample. Generate your own — 5,000 rows/day free →

§ · The gap Synthea leaves

If you need a full longitudinal patient history — encounters, claims, medication timelines — in FHIR or C-CDA, Synthea (from MITRE) is the right tool. It is clinically rich, open source, and well established. The cost is weight: you run Java, generate bundles of resources, and then flatten them into something tabular before most analytics or test fixtures can use them.

A great deal of work doesn't need any of that. A QA seed for a patient table, a demo dataset for a dashboard, or a feature matrix for an early model usually wants one row per patient with demographics, a few vitals, and a condition flag or two. SimpleIDGen produces exactly that — a calibrated, flat patient table — with no clinical-record tooling in the way.

§ · What's in each patient record

Every field is jointly calibrated to public US references, so a record's age, vitals, and conditions hang together the way they do in a real population — not as independent random columns.

Patient field group	Examples & calibration source
Demographics	Age, sex, race/ethnicity — ACS 2022 · US Census 2020
Geography	State and ZIP, with ZIP constrained to match state
Vitals & body measures	A1c, BMI, height, weight, waist — NHANES 2017–2020
Blood pressure	Systolic / diastolic — NHANES 2017–2020
Conditions	Diabetes, hypertension, CKD — CDC NDSS 2022 + NHANES, by age & sex
Insurance type	Coverage category — KFF 2023

Cross-field invariants are enforced: BMI equals weight in kg divided by (height in cm / 100)², insulin appears only for diagnosed diabetics, and ZIP matches state. Generation is deterministic by seed: the same seed yields the same patients, so your fixtures are reproducible. For the full list of 69 attributes and a published fidelity report, see the Person Profile generator →
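If you want to confirm those invariants on your own export, a check along these lines is one way to do it. This is a minimal sketch, not SimpleIDGen tooling: the column names (height_cm, weight_kg, bmi, has_diabetes, on_insulin), the 0/1 encoding, and the sample rows are all hypothetical, and it assumes height in cm and weight in kg. The third sample row is deliberately broken to show what a violation looks like.

```python
import csv
import io

# Hypothetical column names and made-up rows; check the header of your own export.
SAMPLE = """height_cm,weight_kg,bmi,has_diabetes,on_insulin
170,72.25,25.0,0,0
160,64.0,25.0,1,1
180,81.0,26.0,0,1
"""


def check_row(row, tol=0.1):
    """Return the names of the invariants this row violates."""
    problems = []
    # BMI invariant: weight (kg) / (height (cm) / 100) squared.
    height_m = float(row["height_cm"]) / 100
    expected_bmi = float(row["weight_kg"]) / height_m ** 2
    if abs(expected_bmi - float(row["bmi"])) > tol:
        problems.append("bmi")
    # Insulin invariant: insulin only for diagnosed diabetics.
    if row["on_insulin"] == "1" and row["has_diabetes"] != "1":
        problems.append("insulin_without_diabetes")
    return problems


def check_csv(text):
    """Map row index -> list of violations, skipping clean rows."""
    reader = csv.DictReader(io.StringIO(text))
    return {i: p for i, row in enumerate(reader) if (p := check_row(row))}


violations = check_csv(SAMPLE)
print(violations)
```

The ZIP-to-state check is left out because it needs a reference lookup table that this sketch does not assume you have.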

§ · Why "calibrated" matters here

Plenty of tools can emit fake patient rows. Most general-purpose generators draw each column independently (a random age, a random A1c, a random condition flag), so a 24-year-old can land with stage-3 CKD and an A1c of 11%. That noise quietly breaks anything downstream that assumes the data behaves like people: prevalence dashboards, risk models, demo charts.

SimpleIDGen fits each attribute's distribution to the published reference for the matching age and sex, then keeps the fields consistent with one another. The result reads like a real US patient population without containing a single real patient. The same calibration powers the NHANES-calibrated dataset →
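The difference between independent and age-conditioned columns is easy to see in a toy example. The sketch below is not SimpleIDGen's algorithm: the function names are hypothetical and the rates are illustrative placeholders, not values taken from NHANES or the CDC. It only shows why conditioning a flag on age changes what a downstream prevalence chart sees, and how a seeded generator stays reproducible.

```python
import random

# Illustrative placeholder rates, NOT taken from any published reference.
DIABETES_RATE_BY_AGE_BAND = {"18-44": 0.03, "45-64": 0.15, "65+": 0.25}
OVERALL_RATE = 0.12


def age_band(age):
    if age < 45:
        return "18-44"
    if age < 65:
        return "45-64"
    return "65+"


def independent_patient(rng):
    """Age and diabetes flag drawn with no relationship to each other."""
    age = rng.randint(18, 90)
    return {"age": age, "diabetes": rng.random() < OVERALL_RATE}


def conditional_patient(rng):
    """Diabetes flag drawn at a rate that depends on the patient's age band."""
    age = rng.randint(18, 90)
    rate = DIABETES_RATE_BY_AGE_BAND[age_band(age)]
    return {"age": age, "diabetes": rng.random() < rate}


def rate_under_45(patients):
    """Share of patients younger than 45 who carry the diabetes flag."""
    young = [p for p in patients if p["age"] < 45]
    return sum(p["diabetes"] for p in young) / len(young)


rng = random.Random(42)  # fixed seed, so the run is reproducible
independent = [independent_patient(rng) for _ in range(20000)]
conditional = [conditional_patient(rng) for _ in range(20000)]

print("under-45 rate, independent columns:", round(rate_under_45(independent), 3))
print("under-45 rate, age-conditioned:   ", round(rate_under_45(conditional), 3))
```

With independent columns the under-45 group carries the flag at roughly the overall rate; with conditioning it follows the rate set for its own age band.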

§ · Frequently asked

Is this real patient data?

No. Every record is synthetically generated from published reference distributions (NHANES, ACS, CDC NDSS, KFF). No real person's data ever enters the system, and no real patient's record is reproduced.

Can I use it for HIPAA-regulated or GDPR-bound work?

The data is synthetic by construction — there is no protected health information and no personal data, because nothing is learned from real records. That makes it well suited to environments where production patient data can't be used. Consult your compliance officer for your specific case.

How is it different from Synthea?

Synthea generates full longitudinal EHRs in FHIR/C-CDA and needs Java to run. SimpleIDGen gives you a flat, one-row-per-patient table of demographics, vitals, and condition flags as CSV or JSONL, with no setup. Use Synthea when you need clinical histories; use this when you need a simple calibrated patient table.

What format is it in, and what does it cost?

CSV or JSONL — one record per line, ready to load anywhere. The 1,000-row sample above needs no account. A free account generates up to 5,000 rows per UTC day; see pricing for the details.
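Loading either format takes only the Python standard library. The sketch below parses inline text so it runs as-is; the field names (age, sex, bmi) and values are hypothetical stand-ins, so substitute the header of your own download.

```python
import csv
import io
import json

# Hypothetical field names and made-up values, for illustration only.
CSV_TEXT = "age,sex,bmi\n54,F,27.3\n31,M,22.8\n"
JSONL_TEXT = (
    '{"age": 54, "sex": "F", "bmi": 27.3}\n'
    '{"age": 31, "sex": "M", "bmi": 22.8}\n'
)


def load_csv(text):
    """Parse CSV text into a list of dicts; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def load_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_rows = load_csv(CSV_TEXT)
jsonl_rows = load_jsonl(JSONL_TEXT)
print(csv_rows[0])
print(jsonl_rows[0])
```

One practical difference: CSV values come back as strings and need casting, while JSONL preserves numeric types.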