SimpleIDGen
§ · Reference Library

Synthetic Data Guides

A small, factual library on synthetic data: what the term means, how it differs from anonymized records, and how SimpleIDGen builds fake-but-calibrated person profiles from public US reference distributions. Each guide is short and written in plain English; the generator page lets you download data right away, and the reference page documents how that data is calibrated.

The 1,000-row sample needs no signup. To generate your own data, a free account allows up to 5,000 rows per UTC day.

§ · Explainers

Start here if synthetic data is new to you. These guides cover the concept itself and the distinction people most often get wrong — synthetic versus anonymized.

Guide → What it covers
What is synthetic data? → A definition, how synthetic records are generated from reference distributions rather than copied from real people, and the common uses: testing, demos, machine-learning fixtures, and teaching.
Synthetic vs. anonymized data → Why anonymization starts from real records, and so carries some re-identification risk, while synthetic data contains no real individual at all. When each approach fits, and why synthetic is the safer default for shared environments.

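As a rough illustration of records being generated from reference distributions rather than copied from real people, the Python sketch below draws each field from a distribution. The category weights and the height parameters are placeholder numbers invented for the example, not SimpleIDGen's calibrated values, and the names `REGION_WEIGHTS`, `HEIGHT_CM`, and `draw_record` are hypothetical.

```python
import random

# Placeholder reference distributions: illustrative numbers only,
# NOT real census or NHANES figures.
REGION_WEIGHTS = {"Northeast": 0.2, "Midwest": 0.2, "South": 0.4, "West": 0.2}
HEIGHT_CM = {"mean": 170.0, "sd": 10.0}


def draw_record(rng: random.Random) -> dict:
    """Build one fake record purely by sampling; no real person is read or copied."""
    # Categorical field: weighted draw over the reference categories.
    region = rng.choices(list(REGION_WEIGHTS), weights=list(REGION_WEIGHTS.values()), k=1)[0]
    # Numeric field: draw from a normal approximation of the reference distribution.
    height = rng.gauss(HEIGHT_CM["mean"], HEIGHT_CM["sd"])
    return {"region": region, "height_cm": round(height, 1)}


rng = random.Random(7)
rows = [draw_record(rng) for _ in range(5)]
for row in rows:
    print(row)
```

Sampling each field on its own like this ignores correlations between fields (for example, between age and blood pressure), which is why the generator pages describe their attributes as jointly distributed rather than independent.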
§ · Generators & reference

When you want the data itself, these pages produce or document it. Every record is entirely fake, deterministic by seed (the same seed reproduces the same records), and calibrated to published US references: NHANES 2017–2020, ACS 2022, CDC NDSS, KFF, and the US Census.
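Deterministic by seed means that re-running generation with the same seed and row count gives back the same records. A minimal Python sketch of that property, assuming a private `random.Random` instance per run; the `generate` function, the field names, and the value ranges are hypothetical and are not SimpleIDGen's schema. Each record is written as one JSON object per line, which is the JSONL layout the generator offers.

```python
import json
import random


def generate(n_rows: int, seed: int) -> list[str]:
    """Return n_rows fake records as JSONL lines; the seed fixes every value."""
    rng = random.Random(seed)  # private RNG, independent of the global random state
    lines = []
    for i in range(n_rows):
        record = {
            "id": i,
            "age": rng.randint(18, 85),  # hypothetical range
            "sex": rng.choice(["F", "M"]),
        }
        lines.append(json.dumps(record))  # one JSON object per line = JSONL
    return lines


first = generate(3, seed=42)
second = generate(3, seed=42)
print(first == second)  # True: same seed, same rows
```

Using a dedicated `random.Random(seed)` instead of the module-level `random` functions keeps other code that draws random numbers from shifting the sequence between runs.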

Page → What it is
Person Profile generator → The core product: 65 jointly distributed attributes per record (identity, geography, finances, behavior, vitals, and health conditions), with cross-field invariants enforced so related fields stay consistent with each other. Output is CSV or JSONL, available instantly with no code or setup.
NHANES-calibrated data → How the health attributes — A1c, BMI, blood pressure, and diabetes and hypertension prevalence — are fitted to NHANES by age and sex, with a fidelity report comparing the generated distributions against their targets.
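The table above mentions three ideas: health attributes fitted by age and sex, cross-field invariants, and a fidelity report against targets. A minimal Python sketch of how they can fit together, assuming a simple normal model per stratum. The stratum labels, target values, the 6.5 threshold, and the names `TARGETS`, `draw`, and `fidelity_report` are all hypothetical; SimpleIDGen's actual model and report format are not described on this page.

```python
import random
from statistics import mean

# Hypothetical per-stratum targets for A1c (%), keyed by (age band, sex).
# Placeholder numbers for illustration, NOT NHANES estimates.
TARGETS = {
    ("18-39", "F"): {"a1c_mean": 5.3, "a1c_sd": 0.4},
    ("18-39", "M"): {"a1c_mean": 5.4, "a1c_sd": 0.4},
    ("60+", "F"): {"a1c_mean": 5.9, "a1c_sd": 0.7},
    ("60+", "M"): {"a1c_mean": 6.0, "a1c_sd": 0.7},
}
DIABETES_A1C = 6.5  # illustrative threshold used only for the invariant below


def draw(rng: random.Random, stratum: tuple[str, str]) -> dict:
    """Sample one record's A1c from its stratum's target, then derive dependent fields."""
    t = TARGETS[stratum]
    a1c = rng.gauss(t["a1c_mean"], t["a1c_sd"])
    # Cross-field invariant: the flag is computed from A1c rather than sampled
    # separately, so the two fields can never contradict each other.
    return {"stratum": stratum, "a1c": a1c, "diabetes": a1c >= DIABETES_A1C}


def fidelity_report(rows: list[dict]) -> dict:
    """Compare the generated mean A1c in each stratum with its target."""
    report = {}
    for stratum, t in TARGETS.items():
        got = mean(r["a1c"] for r in rows if r["stratum"] == stratum)
        report[stratum] = (t["a1c_mean"], round(got, 2), round(got - t["a1c_mean"], 2))
    return report


rng = random.Random(2024)
rows = [draw(rng, s) for s in TARGETS for _ in range(5_000)]
report = fidelity_report(rows)
for stratum, (target, got, diff) in report.items():
    print(stratum, "target", target, "generated", got, "diff", diff)
```

Deriving a dependent field from the value it depends on, instead of sampling both, is one simple way to guarantee an invariant; prevalence per stratum could be checked against a target in the same loop as the mean.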

No real personal data ever enters the system. Records are built from published reference distributions rather than learned from real individuals, which keeps the output GDPR- and DPDP-safe for environments where production data can't be used. The output is a simple, population-shaped table of demographics, vitals, and conditions, not a set of full longitudinal medical records, so it stays easy to load and reason about. The free tier (5,000 rows per UTC day) covers most testing and demo work; see pricing for details.

§ · Frequently asked
Q1
What is the difference between the guides and the generator pages?

The guides are plain-English explainers of the concepts. The generator and reference pages — the Person Profile generator and the NHANES reference — produce and document the actual data.

Q2
Is the data free?

Yes. The 1,000-row sample above needs no account. A free account generates up to 5,000 rows per UTC day, in CSV or JSONL. See pricing for the details.

Q3
Do I need a statistics background to use these?

No. The calibration runs behind the scenes: you choose a row count and a seed, then download a file. The guides explain the reasoning, and the introductory guide, What is synthetic data?, assumes no prior knowledge.

Q4
Is synthetic data the same as anonymized data?

No, and the distinction matters. Anonymized data is derived from real records and can sometimes be re-identified; synthetic data contains no real person. The comparison guide walks through when each is appropriate.
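To see how data derived from real records can be re-identified, here is a toy linkage sketch in Python: names are stripped from the "anonymized" table, but the combination of ZIP code, birth year, and sex still matches exactly one person in a separate named list. All rows and names are invented for the example, and `link` is a hypothetical helper.

```python
# Toy linkage illustration. Every row and name below is invented.
anonymized = [
    # Names removed, but quasi-identifiers (ZIP, birth year, sex) remain.
    {"zip": "02139", "birth_year": 1971, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "60614", "birth_year": 1985, "sex": "M", "diagnosis": "none"},
]
public_roster = [
    # A separate, named dataset that shares the same quasi-identifiers.
    {"name": "Alice Example", "zip": "02139", "birth_year": 1971, "sex": "F"},
    {"name": "Bob Example", "zip": "60614", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "zip": "60614", "birth_year": 1990, "sex": "F"},
]

QUASI = ("zip", "birth_year", "sex")


def link(anon_rows: list[dict], named_rows: list[dict]) -> list[tuple[str, str]]:
    """Return (name, diagnosis) pairs whose quasi-identifiers match exactly one named person."""
    matches = []
    for a in anon_rows:
        key = tuple(a[q] for q in QUASI)
        hits = [n for n in named_rows if tuple(n[q] for q in QUASI) == key]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], a["diagnosis"]))
    return matches


print(link(anonymized, public_roster))
# prints [('Alice Example', 'hypertension'), ('Bob Example', 'none')]
```

Running the same join against fully synthetic rows can still produce coincidental matches, but the attached attributes were never measured on that person, so nothing real about them is disclosed.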