Synthetic T2DM Patient Data Generator
Synthetic type-2-diabetes patient records — biologically staged, with calibrated A1c, complications (CKD, neuropathy, retinopathy, nephropathy), insulin use and mortality. A severity dial from newly-diagnosed to advanced.
A T2DM cohort generator built on the person engine. It returns synthetic diabetic patient records: the full 68-attribute person plus the T2DM staging block (biological and clinical stage, detection lag, A1c, microvascular complications, insulin). The diabetic cohort is shifted to match the real US diabetic population (older, ~85% diagnosed / 15% undiagnosed). A `stage` dial (1–5) concentrates severity from newly-diagnosed to advanced; complications, A1c, insulin use and mortality all rise monotonically with stage. Calibrated to CDC NDSS, ADA Standards of Care, USRDS and NHANES. Deterministic by seed. Async bulk generation: submit a job, then download JSONL, CSV or FHIR R4.
Calibrated to CDC NDSS & ADA Standards of Care; built on the person engine.
Need more than the free tier — a bigger one-off dataset, or a specific stage mix? Email me — free, by request.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| clientId | string | required | — | Your account's client ID (from /v1/auth/register or /v1/auth/me). |
| count | integer | optional | 10000 | Number of diabetic patient records to generate. Range: 1–1,000,000 (free tier: 5,000 rows per UTC day). |
| seed | integer | optional | (derived from job_id) | RNG seed for reproducibility. The same seed, count and stage produce byte-identical records. |
| stage | integer | optional | (natural severity mix) | T2DM severity dial, 1–5 (1 = newly-diagnosed, 5 = advanced). Concentrates A1c, complications, insulin use and mortality at the chosen stage. Omit for the natural severity spread. |
| cohort | string | optional | diabetic | "diabetic" (default) forces a real diabetic-population cohort (older, ~85% diagnosed / 15% undiagnosed). "general" returns the general adult population, with the T2DM staging block attached to its diabetics. |
| formats | array | optional | — | Extra output formats. jsonl + csv are always produced; pass ["fhir"] to also emit FHIR R4 bulk NDJSON (Patient / Condition / Observation / MedicationRequest / Coverage / Encounter). |
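The parameter rules above can be checked client-side before a job is submitted. The sketch below is a hypothetical helper (`build_t2dm_payload` is not part of any published SDK) that assembles the JSON body for POST /v1/datasets/t2dm and rejects out-of-range values. It checks only the documented 1–1,000,000 range for `count`, not the 5,000-row free-tier daily quota.

```python
from typing import Optional

# Hypothetical helper, not an official SDK function. Limits mirror the parameter table.
ALLOWED_COHORTS = {"diabetic", "general"}
ALLOWED_EXTRA_FORMATS = {"fhir"}  # jsonl + csv are always produced


def build_t2dm_payload(
    client_id: str,
    count: int = 10_000,
    seed: Optional[int] = None,
    stage: Optional[int] = None,
    cohort: str = "diabetic",
    formats: Optional[list] = None,
) -> dict:
    """Return a JSON-ready body for POST /v1/datasets/t2dm, or raise ValueError."""
    if not client_id:
        raise ValueError("clientId is required")
    if not 1 <= count <= 1_000_000:
        raise ValueError("count must be between 1 and 1,000,000")
    if stage is not None and not 1 <= stage <= 5:
        raise ValueError("stage must be 1-5; omit it for the natural severity mix")
    if cohort not in ALLOWED_COHORTS:
        raise ValueError(f"cohort must be one of {sorted(ALLOWED_COHORTS)}")

    payload: dict = {"clientId": client_id, "count": count, "cohort": cohort}
    # Optional fields are sent only when set, so server-side defaults apply otherwise.
    if seed is not None:
        payload["seed"] = seed
    if stage is not None:
        payload["stage"] = stage
    if formats:
        unknown = set(formats) - ALLOWED_EXTRA_FORMATS
        if unknown:
            raise ValueError(f"unsupported extra formats: {sorted(unknown)}")
        payload["formats"] = list(formats)
    return payload
```

For example, `build_t2dm_payload("<your client id>", count=5000, seed=42, stage=3)` yields a body with `clientId`, `count`, `cohort`, `seed` and `stage`; leaving `seed` and `stage` out lets the server derive the seed and use the natural severity mix.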
Example record
```json
{
  "id": "7Z9K2M4QXB8D6FJ3PNR5VWT1AC",
  "given_name": "Robert", "family_name": "Hayes",
  "age": 67, "sex_at_birth": "male",
  "race": "white", "ethnicity": "non_hispanic",
  "state": "OH", "urbanicity": "suburban", "insurance_type": "medicare",
  "height_cm": 177.4, "weight_kg": 99.8, "bmi": 31.7, "waist_circumference_cm": 113.2,
  "diabetes_status": "diagnosed_t2dm", "a1c_value": 7.4, "diabetes_duration_years": 8.0,
  "on_insulin": false, "number_of_prescriptions": 6,
  "hypertension_status": "diagnosed", "systolic_bp_avg": 134.0, "diastolic_bp_avg": 81.3,
  "ckd_status": "stage_1_2", "hyperlipidemia_status": "diagnosed",
  "t2dm_biological_stage": 3, "t2dm_clinical_stage": 2, "t2dm_detection_lag_years": 4.5,
  "t2dm_neuropathy": true, "t2dm_retinopathy": false, "t2dm_nephropathy": false
  // ... 40+ more attributes
}
```
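Two relationships in the example record can be checked mechanically: BMI should equal weight_kg / (height_cm / 100)², and clinical stage should never exceed biological stage (the validation below reports this invariant). A minimal sketch, assuming records are parsed into dicts with the field names shown above; `check_record` is a hypothetical name, and the 0.1 BMI tolerance is an assumption meant to absorb rounding of the displayed values.

```python
def check_record(rec: dict, bmi_tol: float = 0.1) -> list:
    """Return a list of consistency problems found in one T2DM record."""
    problems = []
    # BMI = weight (kg) / height (m)^2; displayed values are rounded, hence the tolerance.
    bmi = rec["weight_kg"] / (rec["height_cm"] / 100) ** 2
    if abs(bmi - rec["bmi"]) > bmi_tol:
        problems.append(f"bmi {rec['bmi']} vs computed {bmi:.1f}")
    # Clinical stage may lag biological stage but should never exceed it.
    if rec["t2dm_clinical_stage"] > rec["t2dm_biological_stage"]:
        problems.append("clinical stage exceeds biological stage")
    return problems


example = {
    "height_cm": 177.4, "weight_kg": 99.8, "bmi": 31.7,
    "t2dm_biological_stage": 3, "t2dm_clinical_stage": 2,
}
print(check_record(example))  # []
```

The example record passes: 99.8 / 1.774² ≈ 31.71, which matches the stored 31.7.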
Call it
```shell
# 1. Register once: returns your clientId and sets a session cookie
curl -sS -c cookies.txt -X POST https://api.simpleidgen.com/v1/auth/register \
  -H 'Content-Type: application/json' \
  -d '{"name":"You","email":"you@company.com","password":"your-password"}'

# 2. Submit a T2DM job (stage 3 concentrates mid-severity diabetics)
curl -sS -b cookies.txt -X POST https://api.simpleidgen.com/v1/datasets/t2dm \
  -H 'Content-Type: application/json' \
  -d '{"clientId":"<your client id>","count":100000,"seed":42,"stage":3}'

# 3. Poll status, then download the JSONL once completed
curl -sS -b cookies.txt https://api.simpleidgen.com/v1/datasets/<job_id>
```

```javascript
// After registering or logging in (session cookie set), submit a T2DM job.
// Uses top-level await, so run it as an ES module.
const res = await fetch('https://api.simpleidgen.com/v1/datasets/t2dm', {
  method: 'POST',
  credentials: 'include', // send the session cookie
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ clientId: '<your client id>', count: 100000, seed: 42, stage: 3 }),
});
if (!res.ok) throw new Error(`Job submission failed: HTTP ${res.status}`);
const { jobId, statusUrl } = await res.json();
```

```python
import requests

s = requests.Session()  # keeps the session cookie between calls
s.post('https://api.simpleidgen.com/v1/auth/login', json={'email': 'you@company.com', 'password': '...'}).raise_for_status()
resp = s.post('https://api.simpleidgen.com/v1/datasets/t2dm', json={'clientId': '<your client id>', 'count': 100000, 'seed': 42, 'stage': 3})
resp.raise_for_status()
job = resp.json()
print(job['jobId'], job['statusUrl'])
```

Generation requires a free account: sign-up takes about 10 seconds and gives you a client ID and an API session.
Once you are signed in, you can also generate datasets and download CSV or JSONL from your profile.
POST /v1/datasets/t2dm
Async — submit a job, poll /v1/datasets/{job_id}, then download JSONL / CSV / FHIR.
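Because jobs are asynchronous, a client has to poll until the job finishes. A minimal polling sketch under stated assumptions: the status JSON is assumed to carry a `status` field, and `"completed"` and `"failed"` are assumed to be its terminal values; neither is documented here. `wait_for_job` is a hypothetical name, and the status fetch is passed in as a callable so the loop can be tested without network access.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict:
    """Call get_status() until the job reaches a terminal state or the timeout passes.

    ASSUMPTION: the status payload has a "status" field whose terminal values
    are "completed" and "failed". Adjust to the real response shape.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"job failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        time.sleep(interval_s)
```

With the `requests` session `s` and `job` from the Python example, a call might look like `wait_for_job(lambda: s.get(f"https://api.simpleidgen.com/v1/datasets/{job['jobId']}").json())`, after which the JSONL / CSV / FHIR outputs can be downloaded.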
Data Quality Validation vs CDC NDSS / ADA / USRDS / NHANES — GOLD
An independent data-quality validation of the synthetic T2DM cohort against CDC NDSS, ADA Standards of Care, USRDS and real NHANES microdata. It verifies the severity dial (every clinical axis rises monotonically with stage), patient-level consistency across the dial, and diabetic-cohort complication co-occurrence against specific cited benchmarks. Verdict: GOLD — engine v1.0.46; generator logic byte-identical through the current build.
Severity dial (`stage` = 1–5)
Generate the cohort at each stage (1 = newly-diagnosed, 5 = advanced). Every clinical axis rises monotonically — A1c, complications, insulin use and mortality track real diabetes progression.
| Axis | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Monotonic |
|---|---|---|---|---|---|---|
| biological stage (mean) | 1.45 | 2.23 | 2.82 | 3.32 | 4.04 | ✓ |
| A1c (mean %) | 6.99 | 8.01 | 8.78 | 9.23 | 9.35 | ✓ |
| CKD — any (%) | 37.6 | 52.3 | 63.3 | 72.0 | 82.2 | ✓ |
| neuropathy (%) | 22.1 | 33.7 | 42.8 | 51.2 | 62.4 | ✓ |
| retinopathy (%) | 22.5 | 32.6 | 42.0 | 49.5 | 60.1 | ✓ |
| nephropathy (%) | 20.1 | 29.9 | 37.5 | 45.0 | 57.0 | ✓ |
| on insulin (%) | 17.8 | 26.6 | 35.9 | 44.9 | 56.9 | ✓ |
| mortality (%) | 8.7 | 10.3 | 11.9 | 14.1 | 17.0 | ✓ |
| prescriptions (mean) | 6.1 | 6.7 | 7.2 | 7.7 | 8.3 | ✓ |
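The monotonic column can be reproduced from the stage values themselves: an axis is monotone here if each stage's value is strictly greater than the previous one. The sketch copies the table values; the same `strictly_increasing` helper (a hypothetical name) applies to stage summaries computed from your own downloads.

```python
# Stage 1..5 values copied from the severity-dial table.
STAGE_TABLE = {
    "biological stage (mean)": [1.45, 2.23, 2.82, 3.32, 4.04],
    "A1c (mean %)": [6.99, 8.01, 8.78, 9.23, 9.35],
    "CKD any (%)": [37.6, 52.3, 63.3, 72.0, 82.2],
    "neuropathy (%)": [22.1, 33.7, 42.8, 51.2, 62.4],
    "retinopathy (%)": [22.5, 32.6, 42.0, 49.5, 60.1],
    "nephropathy (%)": [20.1, 29.9, 37.5, 45.0, 57.0],
    "on insulin (%)": [17.8, 26.6, 35.9, 44.9, 56.9],
    "mortality (%)": [8.7, 10.3, 11.9, 14.1, 17.0],
    "prescriptions (mean)": [6.1, 6.7, 7.2, 7.7, 8.3],
}


def strictly_increasing(values: list) -> bool:
    """True if every value is greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for axis, values in STAGE_TABLE.items():
        verdict = "monotone" if strictly_increasing(values) else "NOT monotone"
        print(f"{axis:24} {verdict}")
```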
Patient-pinned: the same 5,000 patients regenerated across stages stay byte-identical in identity (id / name / DOB / sex), with 0 / 5,000 per-row stage regressions and clinical stage ≤ biological stage at every stage — the dial moves disease severity, not the person.
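The patient-pinned check can be sketched over your own stage runs. Assumptions, all hypothetical: each run is generated with the same seed and count, `runs[k]` holds the records for stage k + 1 with rows in the same order, identity is compared on `id`, `given_name`, `family_name` and `sex_at_birth` (a DOB field is not shown in the example record, so it is left out), and a "regression" means a patient's `t2dm_biological_stage` drops while the dial goes up.

```python
IDENTITY_FIELDS = ("id", "given_name", "family_name", "sex_at_birth")


def check_pinned(runs: list) -> dict:
    """runs[k] is the list of records generated at stage k + 1, rows in the same order."""
    if len({len(rows) for rows in runs}) != 1:
        raise ValueError("every stage run must contain the same number of records")
    identity_mismatches = stage_regressions = clinical_above_biological = 0
    for patient in zip(*runs):  # one tuple per patient: its record at each stage
        first = patient[0]
        if any(r[f] != first[f] for r in patient for f in IDENTITY_FIELDS):
            identity_mismatches += 1
        bio = [r["t2dm_biological_stage"] for r in patient]
        # A regression: biological stage drops while the dial goes up.
        if any(later < earlier for earlier, later in zip(bio, bio[1:])):
            stage_regressions += 1
        if any(r["t2dm_clinical_stage"] > r["t2dm_biological_stage"] for r in patient):
            clinical_above_biological += 1
    return {
        "identity_mismatches": identity_mismatches,
        "stage_regressions": stage_regressions,
        "clinical_above_biological": clinical_above_biological,
    }
```

A result of all zeros over 5,000 patients would correspond to the figures reported above.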
Diabetic-cohort co-occurrence (natural mix, n=20,000)
Complication and comorbidity prevalence in the natural diabetic cohort, each against a specific cited US benchmark (±4 pp tolerance on the prevalence anchors).
| Measure | Synthetic | Benchmark | Source | Result |
|---|---|---|---|---|
| Mean A1c (diabetics) | 7.26% | 7.1–7.31% | Kim, Diabetes Care 2021; Inoue, JAMA 2025 | ✓ |
| A1c <7% / >9% | 56.1% / 12.2% | 50–57% / ~13% | Kim, Diabetes Care 2021 (NHANES) | ✓ |
| CKD, any | 40.7% | 37–41% | CDC 2024 | ✓ |
| Retinopathy (age ≥40) | 25.4% | 26.4–28.5% | NHANES ’05–08; CDC VEHSS 2021 | ✓ |
| Peripheral neuropathy | 24.9% | 27–28.5% | Gregg, Diabetes Care 2004 (NHANES) | ✓ |
| Nephropathy | 22.2% | 24.2% | meta-analysis, PMC11419527 | ✓ |
| On insulin (diabetics) | 20.9% | 24–28% | NIDDK, Diabetes in America | ✓ |
| Detection lag (onset to diagnosis) | 4.8 yr | 4–7 yr | Harris, Diabetes Care 1992; UKPDS | ✓ |
| Hypertension | 80.5% | 67.6% | NHANES ’17–18, PMC7803033 | WIP |
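One reading of the ±4 pp rule: a synthetic prevalence passes if its distance to the benchmark range is at most 4 percentage points. That reading is an assumption (the exact rule is not spelled out here), as is treating "~13%" and the single-value benchmarks as zero-width ranges. Applied to the prevalence rows of the table, it passes everything except hypertension, which matches the Result column.

```python
TOLERANCE_PP = 4.0

# measure: (synthetic %, benchmark low %, benchmark high %), copied from the table.
ANCHORS = {
    "A1c <7%": (56.1, 50.0, 57.0),
    "A1c >9%": (12.2, 13.0, 13.0),
    "CKD, any": (40.7, 37.0, 41.0),
    "Retinopathy (age >=40)": (25.4, 26.4, 28.5),
    "Peripheral neuropathy": (24.9, 27.0, 28.5),
    "Nephropathy": (22.2, 24.2, 24.2),
    "On insulin": (20.9, 24.0, 28.0),
    "Hypertension": (80.5, 67.6, 67.6),
}


def gap_pp(value: float, low: float, high: float) -> float:
    """Distance in percentage points from value to the interval [low, high]."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def within_tolerance(value: float, low: float, high: float, tol: float = TOLERANCE_PP) -> bool:
    return gap_pp(value, low, high) <= tol


if __name__ == "__main__":
    for name, (value, low, high) in ANCHORS.items():
        gap = gap_pp(value, low, high)
        verdict = "pass" if gap <= TOLERANCE_PP else "outside tolerance"
        print(f"{name:24} gap {gap:4.1f} pp  {verdict}")
```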
WIP: Hypertension runs above the cited NHANES figure. The synthetic cohort flags HTN at the ACC/AHA ≥130/80 threshold (at which its general-population rate validates at 49.2% against the 47.7% ≥130/80 benchmark), whereas the cited 67.6% diabetic figure uses a stricter threshold, so the two are not directly comparable. BP-threshold alignment is queued for the next engine version.
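The threshold effect can be illustrated with blood-pressure readings alone. In this sketch 130/80 is the ACC/AHA 2017 cutoff the engine uses; 140/90 (the older JNC 7 cutoff) stands in for "a stricter threshold" purely as an assumption, since the cited figure's exact definition is not given here. Definitions that also count people on antihypertensive medication are not modelled, and `htn_flag` / `htn_rate` are hypothetical names.

```python
ACC_AHA_2017 = (130, 80)
STRICTER_EXAMPLE = (140, 90)  # ASSUMPTION: illustrative stricter cutoff only


def htn_flag(sbp: float, dbp: float, cutoff: tuple) -> bool:
    """Flag hypertension if either systolic or diastolic meets its cutoff."""
    sbp_cut, dbp_cut = cutoff
    return sbp >= sbp_cut or dbp >= dbp_cut


def htn_rate(readings: list, cutoff: tuple) -> float:
    """Percentage of (sbp, dbp) readings flagged at the given cutoff."""
    if not readings:
        raise ValueError("need at least one reading")
    flagged = sum(htn_flag(s, d, cutoff) for s, d in readings)
    return 100.0 * flagged / len(readings)


# The example record's averages (134.0 / 81.3) clear 130/80 but not 140/90.
print(htn_flag(134.0, 81.3, ACC_AHA_2017), htn_flag(134.0, 81.3, STRICTER_EXAMPLE))  # True False
```

The same readings can therefore produce noticeably different prevalences depending on the cutoff, which is why the 80.5% and 67.6% figures are not compared directly.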
CKD stage → mortality
Mortality rises with kidney-disease stage within the diabetic cohort — the dose-response a real dataset shows.
| CKD stage | none | stage 1–2 | stage 3–4 | stage 5 |
|---|---|---|---|---|
| mortality (%) | 6.34 | 12.74 | 16.27 | 16.91 |
Mortality at stage 5 is 2.67× that with no CKD (16.91% vs 6.34%); the gradient is monotone and clinically coherent.
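The 2.67× figure is a crude ratio of the percentages in the table, 16.91 / 6.34. A short sketch computing each stage's ratio against the no-CKD group and confirming the gradient is monotone (no age adjustment is applied):

```python
# Mortality (%) by CKD stage, copied from the table above.
MORTALITY_PCT = {"none": 6.34, "stage 1-2": 12.74, "stage 3-4": 16.27, "stage 5": 16.91}

baseline = MORTALITY_PCT["none"]
# Crude ratio of each stage's mortality to the no-CKD group.
ratios = {stage: pct / baseline for stage, pct in MORTALITY_PCT.items()}

values = list(MORTALITY_PCT.values())
monotone = all(a < b for a, b in zip(values, values[1:]))

for stage, ratio in ratios.items():
    print(f"{stage:10} {ratio:.2f}x")
print("monotone:", monotone)
```

This prints ratios of 1.00, 2.01, 2.57 and 2.67, followed by `monotone: True`.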
Method: an independent biostatistician-grade validation of the diabetic cohort + the 1→5 stage dial (200K main cohort, 20K natural diabetic, 5K×5 stage cohorts) against CDC NDSS, ADA Standards of Care, USRDS and NHANES. Full report retained in the project’s validation records.