Synthea Fhir Data in BigQuery

by brady_bastian·Mar 16, 2026·2 points·0 comments

AI Analysis

●● Solid · Niche Gem · Solve My Problem

~90x query cost reduction by flattening 459 nested FHIR fields to ~15 columns.

Strengths
  • Wire-level FHIR normalization with pre-extracted IDs cuts query costs ~90x
  • Column descriptions sourced from FHIR R4 OpenAPI spec, not hand-written
  • Weekly updates keep synthetic patient data fresh for testing
Weaknesses
  • BigQuery-only limits adoption for teams on Snowflake or Redshift
  • Synthetic data means edge cases may not match real EHR complexity
Target Audience

Healthcare developers, FHIR engineers, data analysts

Similar To

CMS SynPUF · MIMIC-III · OHDSI datasets

Post Description

We generated ~1,100 synthetic patients with Synthea, processed the FHIR R4 output through our normalization engine (Forge), and published it as a free public dataset on BigQuery Analytics Hub.

8 resource types: Patient, Encounter, Observation, Condition, Procedure, Immunization, MedicationRequest, DiagnosticReport.

The raw Synthea output has 459 nested fields per resource, urn:uuid: references, and no column descriptions. We flatten it to clean views with ~15 columns each, pre-extracted IDs, and descriptions sourced from the FHIR R4 OpenAPI spec. Example:
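For readers new to FHIR, the flattening step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Forge's actual implementation. It covers only one resource type (DiagnosticReport), and every column and helper name other than `report_name` and `patient_id` is hypothetical. The description lookup assumes a standard OpenAPI 3 layout (`components.schemas.<Type>.properties`), not the structure of any particular published FHIR spec file.

```python
from typing import Any, Dict, Optional

URN_PREFIX = "urn:uuid:"


def strip_urn(reference: Optional[str]) -> Optional[str]:
    """Return the bare ID from 'urn:uuid:<id>'; pass other values through."""
    if reference and reference.startswith(URN_PREFIX):
        return reference[len(URN_PREFIX):]
    return reference


def flatten_diagnostic_report(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a few nested FHIR R4 DiagnosticReport fields into one flat row.

    Only report_name and patient_id come from the post; the other column
    names are hypothetical.
    """
    code = resource.get("code") or {}
    codings = code.get("coding") or []
    first = codings[0] if codings else {}
    return {
        "report_id": resource.get("id"),
        "report_name": code.get("text") or first.get("display"),
        "report_code": first.get("code"),
        "status": resource.get("status"),
        # References are resolved once here, so queries never rebuild them.
        "patient_id": strip_urn((resource.get("subject") or {}).get("reference")),
        "encounter_id": strip_urn((resource.get("encounter") or {}).get("reference")),
        "effective_at": resource.get("effectiveDateTime"),
    }


def column_descriptions(spec: Dict[str, Any], resource_type: str,
                        source_fields: Dict[str, str]) -> Dict[str, str]:
    """Look up each flat column's description in an OpenAPI 3 document.

    Assumes the usual components.schemas.<Type>.properties layout;
    columns whose source field has no description get an empty string.
    """
    props = spec["components"]["schemas"][resource_type]["properties"]
    return {col: props.get(field, {}).get("description", "")
            for col, field in source_fields.items()}


# Tiny hypothetical inputs; the code and description strings are illustrative.
SAMPLE_REPORT = {
    "resourceType": "DiagnosticReport",
    "id": "r1",
    "status": "final",
    "code": {"coding": [{"code": "12345-6", "display": "Example panel"}]},
    "subject": {"reference": "urn:uuid:abc-123"},
    "encounter": {"reference": "urn:uuid:enc-9"},
    "effectiveDateTime": "2020-01-01T00:00:00Z",
}
SAMPLE_SPEC = {"components": {"schemas": {"DiagnosticReport": {"properties": {
    "status": {"description": "The status of the diagnostic report."},
}}}}}

print(flatten_diagnostic_report(SAMPLE_REPORT))
print(column_descriptions(SAMPLE_SPEC, "DiagnosticReport",
                          {"status": "status", "effective_at": "effectiveDateTime"}))
```

The design choice illustrated here is to resolve `urn:uuid:` references once at flatten time, so every downstream consumer sees a plain ID column.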

  -- Raw FHIR:
  SELECT id, code.text
  FROM diagnostic_report
  WHERE subject.reference = CONCAT("urn:uuid:", patient_id)

  -- Forge view:
  SELECT report_name, patient_id
  FROM v_diagnostic_report

Data scanned per query drops ~90x (450 MB → 5 MB).
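You can reproduce the change in query shape locally with Python's built-in `sqlite3` module. This is a hypothetical, heavily simplified stand-in. The `subject_reference` and `code_text` columns replace the nested BigQuery fields, and the view only mimics the idea of `v_diagnostic_report`. The sketch shows that a view with a pre-extracted ID lets callers filter with plain equality instead of rebuilding the `urn:uuid:` string. It does not attempt to reproduce the ~90x reduction in data scanned, which depends on BigQuery's storage and billing, not SQLite.

```python
import sqlite3
from typing import List

# Hypothetical, simplified raw table: the patient reference keeps its
# "urn:uuid:" prefix, as in the raw Synthea output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnostic_report (
    id TEXT PRIMARY KEY,
    code_text TEXT,
    subject_reference TEXT
);
INSERT INTO diagnostic_report VALUES
    ('r1', 'Lipid panel', 'urn:uuid:p1'),
    ('r2', 'Complete blood count', 'urn:uuid:p2'),
    ('r3', 'Metabolic panel', 'urn:uuid:p1');

-- The view strips the prefix once; callers filter on a bare patient ID.
CREATE VIEW v_diagnostic_report AS
SELECT
    id AS report_id,
    code_text AS report_name,
    CASE WHEN subject_reference LIKE 'urn:uuid:%'
         THEN substr(subject_reference, length('urn:uuid:') + 1)
         ELSE subject_reference
    END AS patient_id
FROM diagnostic_report;
""")


def reports_raw(patient_id: str) -> List[str]:
    # Raw style: rebuild the full reference string for every lookup.
    rows = conn.execute(
        "SELECT code_text FROM diagnostic_report "
        "WHERE subject_reference = 'urn:uuid:' || ? ORDER BY id",
        (patient_id,),
    )
    return [name for (name,) in rows]


def reports_view(patient_id: str) -> List[str]:
    # View style: plain equality on the pre-extracted ID.
    rows = conn.execute(
        "SELECT report_name FROM v_diagnostic_report "
        "WHERE patient_id = ? ORDER BY report_id",
        (patient_id,),
    )
    return [name for (name,) in rows]


print(reports_raw("p1"))   # ['Lipid panel', 'Metabolic panel']
print(reports_view("p1"))  # ['Lipid panel', 'Metabolic panel']
```

Both functions return the same rows. Because this sketch strips the prefix once inside the view, joins between flattened resources could also use plain equality on ID columns.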

Free to subscribe: https://console.cloud.google.com/bigquery/analytics-hub/exch...

Updated weekly. Useful if you're building anything against FHIR data and want a realistic test dataset without standing up your own Synthea pipeline.

Happy to answer questions about the normalization approach or FHIR data modeling tradeoffs.

Similar Projects

Health · ●● Solid

Avalon - Synthetic FHIR R4 patient data as OMOP CDM 5.4 views

Free OMOP CDM views on BigQuery when healthcare researchers need PHI-free test data.

Solve My Problem · Big Brain
brady_bastian
102mo ago