Free Synthetic Patient Data — as a Flat CSV
Realistic but entirely fake patient records: demographics, vitals — A1c, BMI, blood pressure, height, weight, waist — and the prevalence of common conditions like diabetes and hypertension, delivered as a plain CSV or JSONL. No FHIR, no Java, no longitudinal EHR to parse. Just the columns most testing, demo, and ML work actually needs.
No signup for the sample. Generate your own: 5,000 rows/day free.
If you need a full longitudinal patient history — encounters, claims, medication timelines — in FHIR or C-CDA, Synthea (from MITRE) is the right tool. It is clinically rich, open source, and well established. The cost is weight: you run Java, generate bundles of resources, and then flatten them into something tabular before most analytics or test fixtures can use them.
A great deal of work doesn't need any of that. A QA seed for a patient table, a demo dataset for a dashboard, or a feature matrix for an early model usually wants one row per patient with demographics, a few vitals, and a condition flag or two. SimpleIDGen produces exactly that — a calibrated, flat patient table — with no clinical-record tooling in the way.
Every field is jointly calibrated to public US references, so a record's age, vitals, and conditions hang together the way they do in a real population — not as independent random columns.
| Patient field group | Examples & calibration source |
|---|---|
| Demographics | Age, sex, race/ethnicity — ACS 2022 · US Census 2020 |
| Geography | State and ZIP, with ZIP constrained to match state |
| Vitals & body measures | A1c, BMI, height, weight, waist — NHANES 2017–2020 |
| Blood pressure | Systolic / diastolic — NHANES 2017–2020 |
| Conditions | Diabetes, hypertension, CKD — CDC NDSS 2022 + NHANES, by age & sex |
| Insurance type | Coverage category — KFF 2023 |
Cross-field invariants are enforced: BMI equals weight (kg) / (height (cm) / 100)², insulin appears only for diagnosed diabetics, and ZIP matches state. Generation is deterministic by seed: the same seed yields the same patients, so your fixtures are reproducible. For the full list of 65 attributes and a published fidelity report, see the Person Profile generator.
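If you want to confirm the invariants on a downloaded file, a small check like the one below is one way to do it. This is a sketch under assumptions: the column names (`patient_id`, `height_cm`, `weight_kg`, `bmi`, `diabetes`, `on_insulin`) are hypothetical and may not match the real export, and the three sample rows are hand-written for illustration, with the last one deliberately broken. They are not SimpleIDGen output.

```python
import csv
import io

# Hand-written rows with HYPOTHETICAL column names; not real SimpleIDGen output.
# Row p3 deliberately breaks the insulin rule so the check has something to find.
SAMPLE_CSV = """patient_id,height_cm,weight_kg,bmi,diabetes,on_insulin
p1,170,72.25,25.0,0,0
p2,160,64.0,25.0,1,1
p3,180,81.0,25.0,0,1
"""


def check_row(row, bmi_tolerance=0.1):
    """Return the names of the invariants that one patient row violates."""
    problems = []
    # BMI = weight (kg) / height (m)^2, with height stored in centimetres.
    height_m = float(row["height_cm"]) / 100
    expected_bmi = float(row["weight_kg"]) / height_m ** 2
    if abs(expected_bmi - float(row["bmi"])) > bmi_tolerance:
        problems.append("bmi")
    # Insulin should only appear for patients flagged as diabetic.
    if row["on_insulin"] == "1" and row["diabetes"] != "1":
        problems.append("insulin_without_diabetes")
    return problems


def find_violations(csv_text):
    """Map patient_id to its list of violations, skipping clean rows."""
    violations = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = check_row(row)
        if problems:
            violations[row["patient_id"]] = problems
    return violations


print(find_violations(SAMPLE_CSV))
```

The BMI tolerance allows for rounding in the exported value. A ZIP-to-state check would follow the same pattern but needs a lookup table, so it is left out here.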
Plenty of tools can emit fake patient rows, and most general-purpose generators produce each column independently — a random age, a random A1c, a random condition flag — so a 24-year-old can land with stage-3 CKD and an A1c of 11. That noise quietly breaks anything downstream that assumes the data behaves like people: prevalence dashboards, risk models, demo charts.
SimpleIDGen fits each attribute's distribution to the published reference for the matching age and sex, then keeps the fields consistent with one another. The result reads like a real US patient population without containing a single real patient. The same calibration powers the NHANES-calibrated dataset.
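To make the difference from independent columns concrete, here is a minimal sketch of conditional sampling: draw age first, then draw the condition flag and A1c given what has already been drawn. It is not SimpleIDGen's actual method. The function names are hypothetical, and every rate and parameter is a made-up placeholder rather than a value from NHANES or CDC. A private seeded RNG gives the same-seed, same-patients behaviour.

```python
import random

# ILLUSTRATIVE placeholder rates by age band; NOT the NHANES or CDC figures.
DIABETES_RATE_BY_AGE_BAND = {"18-44": 0.03, "45-64": 0.15, "65+": 0.27}


def age_band(age):
    """Bucket an age in years into one of the bands used above."""
    if age < 45:
        return "18-44"
    if age < 65:
        return "45-64"
    return "65+"


def generate_patient(rng):
    """Draw age first, then draw the dependent fields conditioned on it."""
    age = rng.randint(18, 90)
    has_diabetes = rng.random() < DIABETES_RATE_BY_AGE_BAND[age_band(age)]
    # A1c is centred higher for diabetics (placeholder means and spread).
    a1c_mean = 7.5 if has_diabetes else 5.4
    a1c = round(rng.gauss(a1c_mean, 0.4), 1)
    return {"age": age, "diabetes": has_diabetes, "a1c": a1c}


def generate_patients(seed, n):
    """Same seed and n always give the same list of patients."""
    rng = random.Random(seed)  # private RNG, independent of global state
    return [generate_patient(rng) for _ in range(n)]


first = generate_patients(seed=42, n=5)
second = generate_patients(seed=42, n=5)
print(first == second)
```

An independent-column generator would draw `age`, `diabetes`, and `a1c` from three unrelated distributions, which is how a young patient ends up with an implausible combination.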
No, none of the records are real. Every record is synthetically generated from published reference distributions (NHANES, ACS, CDC NDSS, KFF). No real person's data ever enters the system, and no real patient's record is reproduced.
The data is synthetic by construction — there is no protected health information and no personal data, because nothing is learned from real records. That makes it well suited to environments where production patient data can't be used. Consult your compliance officer for your specific case.
Synthea generates full longitudinal EHRs in FHIR/C-CDA and needs Java to run. SimpleIDGen gives you a flat, one-row-per-patient table of demographics, vitals, and condition prevalences as CSV or JSONL — instantly, with no setup. Use Synthea when you need clinical histories; use this when you need a simple calibrated patient table.
Output is CSV or JSONL, one record per line, ready to load anywhere. The 1,000-row sample above needs no account. A free account generates up to 5,000 rows per UTC day; see pricing for the details.
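Either format loads with the Python standard library alone. The sketch below uses two hand-written records with hypothetical column names (`patient_id`, `age`, `bmi`), so adjust them to the header of the file you actually download. The main practical difference is that CSV values arrive as strings and need casting, while JSONL keeps its types.

```python
import csv
import io
import json

# Hand-written examples with HYPOTHETICAL column names; not real exported data.
CSV_TEXT = "patient_id,age,bmi\np1,54,27.3\np2,31,22.8\n"
JSONL_TEXT = (
    '{"patient_id": "p1", "age": 54, "bmi": 27.3}\n'
    '{"patient_id": "p2", "age": 31, "bmi": 22.8}\n'
)


def load_csv(text):
    """Parse CSV text; every value is a string, so cast the numeric columns."""
    return [
        {"patient_id": r["patient_id"], "age": int(r["age"]), "bmi": float(r["bmi"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def load_jsonl(text):
    """Parse JSONL text: each non-empty line is one JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(load_csv(CSV_TEXT) == load_jsonl(JSONL_TEXT))
```

For a file on disk, replace `io.StringIO(text)` with an open file handle and iterate over the file's lines in the JSONL case.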