§ · Calibrated Reference

NHANES-Calibrated Synthetic Data

Synthetic person records whose health distributions are fitted to the CDC's NHANES 2017–2020 cycle: A1c, BMI, blood pressure, and diabetes and hypertension prevalence by age and sex. Your test data behaves like a real US population without containing a single real person.

Free 1,000-row sample — on the Person generator →

No signup. Generate your own — 5,000 rows/day free →

Need a bigger cut or a custom population, such as a specific age range or condition mix? Email me and I'll generate it for you, free.

§ · What "NHANES-calibrated" means

Each health attribute is fitted to published NHANES 2017–2020 estimates (and CDC NDSS for diabetes) and drawn conditioned on the rest of the record, not in isolation. A 55-year-old man's A1c, BMI, and blood pressure come from the distributions NHANES measured for that age and sex, and they move together: blood pressure tracks his hypertension status, and A1c tracks his diabetes status and BMI. Cross-field invariants also hold: BMI = weight (kg) / (height (cm) / 100)², insulin appears only for diagnosed diabetics, and ZIP matches state.
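The cross-field invariants above can be spot-checked on any generated record. The sketch below is illustrative only: the field names (`height_cm`, `weight_kg`, `bmi`, `has_diabetes`, `on_insulin`) and the rounding tolerance are assumptions, not the generator's documented schema, and the ZIP-to-state check is left out because it needs a lookup table.

```python
def check_invariants(record, bmi_tolerance=0.1):
    """Return the names of violated invariants for one record (empty list if none).

    Field names are hypothetical; adjust them to the columns in your export.
    """
    problems = []

    # BMI must equal weight (kg) divided by height (m) squared, within rounding.
    expected_bmi = record["weight_kg"] / (record["height_cm"] / 100) ** 2
    if abs(record["bmi"] - expected_bmi) > bmi_tolerance:
        problems.append("bmi")

    # Insulin may only appear on records with diagnosed diabetes.
    if record["on_insulin"] and not record["has_diabetes"]:
        problems.append("insulin")

    return problems


# A consistent record: 80 kg at 175 cm gives a BMI of about 26.1.
ok = {"height_cm": 175.0, "weight_kg": 80.0, "bmi": 26.1,
      "has_diabetes": False, "on_insulin": False}

# The same record with a wrong BMI and insulin but no diabetes.
bad = dict(ok, bmi=31.0, on_insulin=True)

ok_problems = check_invariants(ok)
bad_problems = check_invariants(bad)
```

Running a check like this over a whole sample is a cheap way to confirm that a downstream transformation has not broken the relationships between fields.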

It is not real NHANES data. NHANES supplies the target distributions; SimpleIDGen generates fresh synthetic people that match them. No NHANES respondent's record is ever reproduced — so there is no protected health information to safeguard.

§ · Calibrated health attributes

Pull synthetic A1c data, synthetic blood-pressure data, BMI, and condition prevalence (diabetes, hypertension, CKD) — each fitted to NHANES by age and sex, and drawn so they cohere across the record rather than independently.

Attribute	Calibrated to
A1c (HbA1c)	NHANES 2017–2020 glycohemoglobin, by age × sex
BMI · height · weight · waist	NHANES 2017–2020 body measures
Systolic / diastolic blood pressure	NHANES 2017–2020 blood pressure
Diabetes status & prevalence	CDC NDSS 2022 + NHANES 2017–2020
Hypertension	NHANES 2017–2020
Demographics (age, sex, race, geography)	ACS 2022 · US Census 2020
Insurance type	KFF 2023

Want the evidence? The Person Profile page shows an independent uniqueness & randomness audit and a full validation against real NHANES microdata on every build — with a machine-readable engine self-report at /engine-report.json.
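If you would rather spot-check a sample yourself, one rough approach is to compute condition prevalence by age band and sex and compare it with published estimates. The sketch below assumes a JSONL export with hypothetical `age`, `sex`, and `has_diabetes` fields, and the five inline rows are made up for illustration; they are not generator output.

```python
import io
import json
from collections import defaultdict


def prevalence_by_group(lines, flag="has_diabetes", band_width=20):
    """Share of rows with `flag` set, keyed by (age band, sex).

    `lines` is any iterable of JSONL strings, such as an open file.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        low = (row["age"] // band_width) * band_width
        key = (f"{low}-{low + band_width - 1}", row["sex"])
        counts[key][0] += 1 if row[flag] else 0
        counts[key][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


# Made-up rows standing in for a downloaded sample file.
sample = io.StringIO(
    '{"age": 45, "sex": "M", "has_diabetes": false}\n'
    '{"age": 52, "sex": "M", "has_diabetes": true}\n'
    '{"age": 67, "sex": "F", "has_diabetes": true}\n'
    '{"age": 61, "sex": "F", "has_diabetes": false}\n'
    '{"age": 33, "sex": "F", "has_diabetes": false}\n'
)

result = prevalence_by_group(sample)
```

With a real export, replace `sample` with `open("your_file.jsonl")`. Small groups give noisy rates, so widen the age bands or generate more rows before drawing conclusions.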

§ · Frequently asked

Is this real NHANES data?

No. It is entirely synthetic. NHANES provides the target distributions; we generate fake people that match them. No real respondent's record is reproduced — so there is no PII to protect.

How is it calibrated?

Attributes are generated by dependency-ordered conditional sampling: each attribute is drawn conditioned on those generated before it. As a result, marginals match NHANES 2017–2020 (and CDC NDSS) by age and sex, and attributes co-vary (blood pressure tracks hypertension; A1c tracks diabetes and BMI). Cross-field invariants hold (BMI, insulin↔diabetes, ZIP↔state), and output is deterministic by seed. See the validation vs real NHANES report →
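To make the idea concrete, here is a minimal sketch of dependency-ordered conditional sampling with a seeded generator. It is not SimpleIDGen's engine: the function name, field names, dependency order, and every number in it are placeholders chosen for illustration, not NHANES estimates.

```python
import random


def generate_person(seed):
    """Draw one toy record; each attribute is conditioned on earlier ones.

    All probabilities, means, and spreads are made-up placeholders.
    """
    rng = random.Random(seed)  # the same seed always yields the same record

    # 1. Demographics come first; everything else depends on them.
    age = rng.randint(20, 79)
    sex = rng.choice(["F", "M"])

    # 2. BMI (floored at a plausible minimum).
    bmi = max(15.0, rng.gauss(29.0, 6.0))

    # 3. Diabetes status depends on age and BMI.
    p_diabetes = min(0.9, 0.02 + 0.004 * (age - 20) + 0.01 * max(0.0, bmi - 25.0))
    has_diabetes = rng.random() < p_diabetes

    # 4. A1c depends on diabetes status and BMI.
    a1c_mean = (7.4 if has_diabetes else 5.4) + 0.01 * (bmi - 29.0)
    a1c = rng.gauss(a1c_mean, 0.9 if has_diabetes else 0.3)

    # 5. Hypertension depends on age; systolic pressure depends on it and on sex.
    has_hypertension = rng.random() < min(0.9, 0.05 + 0.01 * (age - 20))
    systolic_mean = (138.0 if has_hypertension else 118.0) + (2.0 if sex == "M" else 0.0)
    systolic = rng.gauss(systolic_mean, 12.0)

    return {
        "age": age,
        "sex": sex,
        "bmi": round(bmi, 1),
        "has_diabetes": has_diabetes,
        "a1c": round(a1c, 1),
        "has_hypertension": has_hypertension,
        "systolic": round(systolic),
    }


first = generate_person(42)
second = generate_person(42)  # identical to `first` because the seed matches
```

Because each draw uses values sampled earlier, correlations between fields arise from the sampling order itself, and a fixed seed makes a test fixture reproducible across runs.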

Is it safe for testing, ML, and demos?

Yes. No real PII ever enters the system: the data is built from public reference distributions, not learned from real records. That makes it GDPR- and DPDP-safe for environments where production data can't be used.

What does it cost?

Free. The 1,000-row sample above needs no account; a free account generates up to 5,000 rows per day in CSV or JSONL.