Abstract

BACKGROUND: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models, which do not allow for expert verification or intervention. We propose a highly available method that generates synthetic data from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention.

METHODS: Our approach ties together two established tools in medical informatics: OMOP as a data standard for electronic health records and Synthea as a data synthesis method. For this study, data pipelines were built that extract data from OMOP, convert them into time series format, learn temporal rules using two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts.
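To make the pipeline idea concrete, the following is a minimal sketch of the simplest of the listed learners, a first-order Markov chain, applied to toy patient event sequences and mapped into a Synthea-style state graph. It is an illustration under stated assumptions, not the authors' implementation: the event names, the helper functions `learn_markov_chain`, `to_synthea_like_module`, and `graph_complexity`, and the use of generic `Simple` states are all hypothetical, and the complexity measure (state and edge counts) is only one plausible reading of "complexity". Real Synthea modules would use typed states with clinical codes, and the input would come from OMOP tables rather than hard-coded lists.

```python
from collections import Counter, defaultdict


def learn_markov_chain(sequences):
    """Estimate first-order transition probabilities from event sequences.

    Each sequence is wrapped in artificial "Initial" and "Terminal" markers
    so that entry and exit probabilities are learned as well.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        path = ["Initial", *seq, "Terminal"]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(succ.values()) for nxt, n in succ.items()}
        for state, succ in counts.items()
    }


def to_synthea_like_module(name, transitions):
    """Map transition probabilities to a Synthea-style module dictionary.

    Assumption: every learned event becomes a generic "Simple" state with a
    "distributed_transition" list; this is a simplification for illustration.
    """
    states = {"Terminal": {"type": "Terminal"}}
    for state, successors in transitions.items():
        states[state] = {
            "type": "Initial" if state == "Initial" else "Simple",
            "distributed_transition": [
                {"transition": nxt, "distribution": p}
                for nxt, p in sorted(successors.items())
            ],
        }
    return {"name": name, "states": states}


def graph_complexity(module):
    """Return (number of states, number of transitions) as a crude size measure."""
    n_states = len(module["states"])
    n_edges = sum(
        len(s.get("distributed_transition", [])) for s in module["states"].values()
    )
    return n_states, n_edges


# Hypothetical toy data: one ordered list of events per patient.
sequences = [
    ["hypertension", "ace_inhibitor"],
    ["hypertension", "ace_inhibitor", "cough"],
    ["hypertension", "beta_blocker"],
    ["diabetes", "metformin"],
]

transitions = learn_markov_chain(sequences)
module = to_synthea_like_module("toy_markov_module", transitions)
n_states, n_edges = graph_complexity(module)
```

Because every observed transition becomes an edge, a Markov chain learned this way grows with the number of distinct event pairs in the data, which is consistent with the large graphs reported for this learner.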

RESULTS: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem at hand.

CONCLUSION: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.

Original language: English
Article number: 136
Journal: BMC Medical Research Methodology
Volume: 24
Issue number: 1
Pages (from - to): 136
DOIs
Publication status: Published - 22.06.2024

Strategic Research Areas and Centers

  • Profile area: Zentrum für Bevölkerungsmedizin und Versorgungsforschung (ZBV)

DFG Subject Classification

  • 2.22-02 Public Health, Healthcare Research, Social and Occupational Medicine

Fingerprint

Explore the research topics of "Learning debiased graph representations from the OMOP common data model for synthetic data generation". Together they form a unique fingerprint.