Synthetic Data Generator Engine v0.5

Person Profile (Advanced) — engine v0.5

Synthetic person records with 65 jointly-distributed attributes — demographics, health, behavioral, financial — calibrated against public US reference data.

Demographic-first generator. Returns synthetic person records with 65 jointly-distributed attributes across 9 domains: identity, geography, social, financial, behavioral, health basics, health conditions, healthcare utilization, and medications. Each marginal distribution cites a public source (ACS 2022, NHANES 2017-2020, CDC NDSS, KFF 2023, MEPS 2022, BLS 2023, USPS L005 2024).

Cross-field invariants are enforced: BMI = weight_kg / (height_cm / 100)², ZIP matches state per USPS SCF ranges, and on_insulin is true only for diabetic records. Output is deterministic by seed. Three locales are supported: en-US at full fidelity, and en-GB / en-IN with native identity attributes and en-US health data as a fallback, disclosed via locale_data_source. Cohort engines (T2DM, Rx pharmacy, and future ones) wrap this generator with longitudinal records and are versioned separately.
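The BMI and insulin invariants can be spot-checked on the client side. The following is a minimal sketch, not the engine's own validation: the function name check_invariants, the tolerance of 0.1 (to absorb rounding of the published values), and the assumption that "diagnosed_t2dm" is the only diabetic status are all illustrative. The full set of diabetes_status values is not listed here, so pass your own set if it differs.

```python
def check_invariants(record, diabetic_statuses=("diagnosed_t2dm",), tol=0.1):
    """Return a list of human-readable invariant violations for one record.

    Assumptions (not confirmed by the API docs): values are rounded to one
    decimal, and diabetic_statuses lists every status that counts as diabetic.
    """
    problems = []

    # BMI invariant: bmi = weight_kg / (height_cm / 100)^2
    expected_bmi = record["weight_kg"] / (record["height_cm"] / 100) ** 2
    if abs(expected_bmi - record["bmi"]) > tol:
        problems.append(
            f"bmi {record['bmi']} does not match expected {expected_bmi:.2f}"
        )

    # Insulin invariant: on_insulin may only be true for diabetic records
    if record["on_insulin"] and record["diabetes_status"] not in diabetic_statuses:
        problems.append("on_insulin is true for a non-diabetic record")

    return problems


# Values taken from the example output on this page
sample = {
    "height_cm": 171.8,
    "weight_kg": 76.4,
    "bmi": 25.9,
    "diabetes_status": "diagnosed_t2dm",
    "on_insulin": False,
}
print(check_invariants(sample))
```

An empty list means both invariants hold for that record. The ZIP-to-state check is omitted because it needs the USPS SCF range table.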

Parameters
Name | Type | Required | Default | Description
count | integer | optional | 1 | Number of person records to return. Range: 1–100.
seed | integer | optional | (time-derived, non-deterministic) | RNG seed for reproducibility. Same seed + same params = byte-identical records.
locale | string | optional | en-US | Locale: en-US, en-GB, en-IN. Health attributes use en-US fallback for en-GB / en-IN (Phase 2 will add native data).
idFormat | string | optional | ulid | ID format: ulid, uuidv7, uuid, nanoid, cuid2.
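The documented ranges and allowed values can be validated before a request is sent. This is a rough sketch under stated assumptions: build_params is a hypothetical helper, and because this page does not say whether parameters travel in the query string or in a JSON body, the function only builds and validates a plain dict.

```python
ALLOWED_LOCALES = {"en-US", "en-GB", "en-IN"}
ALLOWED_ID_FORMATS = {"ulid", "uuidv7", "uuid", "nanoid", "cuid2"}


def build_params(count=1, seed=None, locale="en-US", id_format="ulid"):
    """Validate the four documented parameters and return them as a dict.

    Hypothetical helper: how the dict is sent (query string or body) is
    left to the caller, since the docs do not specify it here.
    """
    if not isinstance(count, int) or not 1 <= count <= 100:
        raise ValueError("count must be an integer in the range 1-100")
    if locale not in ALLOWED_LOCALES:
        raise ValueError(f"locale must be one of {sorted(ALLOWED_LOCALES)}")
    if id_format not in ALLOWED_ID_FORMATS:
        raise ValueError(f"idFormat must be one of {sorted(ALLOWED_ID_FORMATS)}")

    params = {"count": count, "locale": locale, "idFormat": id_format}
    # Omit seed when unset so the server falls back to its time-derived default
    if seed is not None:
        if not isinstance(seed, int):
            raise ValueError("seed must be an integer")
        params["seed"] = seed
    return params


print(build_params(count=5, seed=42))
```

Passing the same seed and the same remaining parameters is what the docs describe as producing byte-identical records, so keeping the dict construction in one place makes reproducible runs easier to audit.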
Example output
{
  "id": "64PG6RYQXXD7XFEKZJ6AW616M7",
  "given_name": "Elizabeth", "family_name": "Robinson",
  "age": 31, "sex_at_birth": "female",
  "race": "white", "ethnicity": "hispanic",
  "locale": "en-US", "country": "US", "state": "IL", "urbanicity": "suburban",
  "education": "some_college", "insurance_type": "marketplace",
  "height_cm": 171.8, "weight_kg": 76.4, "bmi": 25.9, "waist_circumference_cm": 89.1,
  "diabetes_status": "diagnosed_t2dm", "family_history_diabetes": true,
  "visits_past_year": 7, "number_of_prescriptions": 1, "on_insulin": false
  // ... 49 more attributes
}
API call
cURL:
curl -s 'https://api.simpleidgen.com/v1/mock/person'
JavaScript:
const res = await fetch('https://api.simpleidgen.com/v1/mock/person');
const data = await res.json();
console.log(data.data);
Python:
import requests
resp = requests.get('https://api.simpleidgen.com/v1/mock/person')
print(resp.json())
Endpoint
POST /v1/mock/person

Multiple datasets — 10 × 200K records

Variance evidence: 10 independent regenerations, 45 pairwise comparisons. Each ~200K-row dataset is generated with a different base seed.
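The figure of 45 comparisons follows from choosing unordered pairs out of 10 datasets: 10 × 9 / 2 = 45. A small sketch of that enumeration is shown here; the metric used (difference in mean age) and the toy data are purely illustrative, since this page does not state which statistics are compared.

```python
from itertools import combinations


def pairwise_compare(datasets, metric):
    """Apply metric to every unordered pair of datasets, keyed by label pair."""
    return {
        (a, b): metric(datasets[a], datasets[b])
        for a, b in combinations(sorted(datasets), 2)
    }


def mean_age_gap(rows_a, rows_b):
    """Illustrative metric (an assumption): absolute difference in mean age."""
    mean_a = sum(r["age"] for r in rows_a) / len(rows_a)
    mean_b = sum(r["age"] for r in rows_b) / len(rows_b)
    return abs(mean_a - mean_b)


# Ten tiny stand-in datasets labelled 0..9 (toy data, not real output)
toy = {i: [{"age": 30 + i}, {"age": 40 + i}] for i in range(10)}
results = pairwise_compare(toy, mean_age_gap)
print(len(results))  # 45 unordered pairs from 10 datasets
```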


Single large dataset — 1 × 2M records

Scale evidence: 2M-row dataset generated in ~60s via the async /v1/datasets/person endpoint. JSONL streamed to S3 via multipart upload.
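Because the export is JSONL (one JSON object per line), a consumer can process it record by record without loading all 2M rows into memory. A minimal sketch of that reading pattern follows, using an in-memory stream as a stand-in for the downloaded file; iter_jsonl is a hypothetical helper, and the two sample rows are made up for illustration.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Stand-in for an opened export file (illustrative rows, not real output)
fake_file = io.StringIO('{"age": 31, "state": "IL"}\n{"age": 54, "state": "TX"}\n')
records = list(iter_jsonl(fake_file))
print(len(records), records[0]["state"])
```

With a real file, the same generator works on the object returned by open(path, encoding="utf-8"), so memory use stays roughly constant regardless of row count.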
