Synthetic Data Guides
A small, factual library on synthetic data: what the term means, how it differs from anonymized records, and how SimpleIDGen builds fake-but-calibrated person profiles from public US reference distributions. Each guide is short and plain-English; the generator and reference pages let you download data right away.
No signup is needed for the sample. Generate your own: 5,000 rows per day, free.
Start here if synthetic data is new to you. These guides cover the concept itself and the distinction people most often get wrong — synthetic versus anonymized.
| Guide | What it covers |
|---|---|
| What is synthetic data? → | A definition, how synthetic records are generated from reference distributions rather than copied from real people, and the common uses: testing, demos, machine-learning fixtures, and teaching. |
| Synthetic vs. anonymized data → | Why anonymization starts from real records — and carries some re-identification risk — while synthetic data contains no real individual at all. When each approach fits, and why synthetic is the safer default for shared environments. |
When you want the data itself, these pages produce or document it. Every record is entirely fake, deterministic by seed (the same seed reproduces the same records), and calibrated to published US references: NHANES 2017–2020, ACS 2022, CDC NDSS, KFF, and the US Census.
| Page | What it is |
|---|---|
| Person Profile generator → | The core product. 65 jointly-distributed attributes per record — identity, geography, finances, behavior, vitals, and health conditions — with cross-field invariants enforced. CSV or JSONL, instantly, no code or setup. |
| NHANES-calibrated data → | How the health attributes — A1c, BMI, blood pressure, and diabetes and hypertension prevalence — are fitted to NHANES by age and sex, with a fidelity report comparing the generated distributions against their targets. |
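To make "deterministic by seed" and "fidelity report" concrete, here is a minimal Python sketch. It is not SimpleIDGen's implementation: the function names `generate` and `fidelity_report`, the group labels, and every prevalence number are assumptions made up for illustration, not NHANES figures.

```python
import random

# Hypothetical target prevalence by (age band, sex).
# Placeholder numbers for illustration only, not NHANES values.
TARGET_PREVALENCE = {
    ("18-39", "F"): 0.10,
    ("18-39", "M"): 0.15,
    ("40-64", "F"): 0.30,
    ("40-64", "M"): 0.35,
}


def generate(n_rows, seed):
    """Return n_rows fake records; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private RNG, so output depends only on the seed
    groups = sorted(TARGET_PREVALENCE)
    rows = []
    for _ in range(n_rows):
        age_band, sex = rng.choice(groups)
        # Draw the condition flag at the target rate for this group.
        has_condition = rng.random() < TARGET_PREVALENCE[(age_band, sex)]
        rows.append({"age_band": age_band, "sex": sex, "hypertension": has_condition})
    return rows


def fidelity_report(rows):
    """Compare the generated prevalence in each group against its target."""
    report = {}
    for group, target in TARGET_PREVALENCE.items():
        members = [r for r in rows if (r["age_band"], r["sex"]) == group]
        if not members:
            continue  # nothing generated for this group, so nothing to compare
        observed = sum(r["hypertension"] for r in members) / len(members)
        report[group] = {
            "target": target,
            "observed": round(observed, 4),
            "gap": round(abs(observed - target), 4),
        }
    return report


rows = generate(20_000, seed=42)
assert rows == generate(20_000, seed=42)  # same seed, same data

for group, stats in fidelity_report(rows).items():
    print(group, stats)
```

The gap shrinks as the row count grows, which is why a report like this is most informative on larger samples.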
No real personal data ever enters the system. Records are built from published reference distributions, not learned from real individuals, which keeps the output GDPR- and DPDP-safe for environments where production data can't be used. The records are simple, population-shaped tables of demographics, vitals, and conditions, not full longitudinal medical records, so they stay easy to load and reason about. The free tier covers most testing and demo work; see pricing for the daily limit.
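As a rough illustration of building a record from reference distributions instead of real individuals, the Python sketch below takes only a small parameter table as input. The table values, field names, and the helpers `make_record` and `bmi_is_consistent` are hypothetical, and the normal distributions are a simplifying assumption, not a description of how SimpleIDGen models these attributes.

```python
import random

# Hypothetical reference parameters as (mean, standard deviation).
# Illustrative placeholders, not published figures.
REFERENCE = {
    "F": {"height_cm": (162.0, 7.0), "bmi": (29.0, 6.0)},
    "M": {"height_cm": (176.0, 7.5), "bmi": (29.0, 5.5)},
}
SEX_WEIGHTS = {"F": 0.51, "M": 0.49}  # hypothetical population shares


def make_record(rng):
    """Build one fake record from the reference table alone."""
    sex = rng.choices(list(SEX_WEIGHTS), weights=list(SEX_WEIGHTS.values()))[0]
    mean_h, sd_h = REFERENCE[sex]["height_cm"]
    mean_b, sd_b = REFERENCE[sex]["bmi"]
    height_cm = rng.gauss(mean_h, sd_h)
    bmi = max(15.0, rng.gauss(mean_b, sd_b))  # clamp implausibly low draws
    # Derive weight from height and BMI so the three fields always agree.
    weight_kg = bmi * (height_cm / 100) ** 2
    return {
        "sex": sex,
        "height_cm": round(height_cm, 1),
        "weight_kg": round(weight_kg, 1),
        "bmi": round(bmi, 1),
    }


def bmi_is_consistent(record, tolerance=0.2):
    """Cross-field check: BMI must match weight / height^2 up to rounding."""
    implied = record["weight_kg"] / (record["height_cm"] / 100) ** 2
    return abs(implied - record["bmi"]) < tolerance


rng = random.Random(2024)
records = [make_record(rng) for _ in range(1000)]
assert all(bmi_is_consistent(r) for r in records)
print(records[0])
```

Deriving one field from the others, instead of sampling all three independently, is one simple way to keep a cross-field invariant true by construction.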
The guides are plain-English explainers of the concepts. The generator and reference pages — the Person Profile generator and the NHANES reference — produce and document the actual data.
Yes, you can try it for free. The 1,000-row sample above needs no account. A free account generates up to 5,000 rows per UTC day, in CSV or JSONL. See pricing for the details.
No, you do not need to understand the calibration to use the generator. It runs underneath: you choose a row count and a seed, then download a file. The guides explain the reasoning, and the introduction assumes no prior knowledge.
No, synthetic data is not the same as anonymized data, and the distinction matters. Anonymized data is derived from real records and can sometimes be re-identified; synthetic data contains no real person. The comparison guide walks through when each is appropriate.