Abstract
BACKGROUND: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models that do not allow for expert verification or intervention. We propose a highly available method that enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention.
METHODS: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthesis method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules using two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts.
RESULTS: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem at hand.
CONCLUSION: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.
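As a rough illustration of the simplest step described in METHODS, the sketch below estimates first-order Markov transition probabilities from event sequences and maps them to a graph shaped like a Synthea module. It is not the authors' pipeline: the function names, the toy sequences, and the module name are hypothetical, and the output is a simplified dictionary that borrows Synthea's `states` and `distributed_transition` keys without being validated against Synthea itself.

```python
import json
from collections import Counter, defaultdict


def learn_transitions(sequences):
    """Estimate first-order Markov transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every observed (current event, next event) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize the counts per source state so each row sums to 1.
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def to_module(name, transitions):
    """Map transition probabilities to a simplified Synthea-style module dict."""
    targets = {t for probs in transitions.values() for t in probs}
    states = {}
    for state in sorted(set(transitions) | targets):
        probs = transitions.get(state)
        if not probs:
            # States without outgoing transitions end the generated record.
            states[state] = {"type": "Terminal"}
            continue
        states[state] = {
            "type": "Initial" if state == "Initial" else "Simple",
            "distributed_transition": [
                {"transition": nxt, "distribution": p}
                for nxt, p in sorted(probs.items())
            ],
        }
    return {"name": name, "states": states}


# Hypothetical toy sequences standing in for time series extracted from OMOP.
sequences = [
    ["Initial", "Hypertension", "Statin"],
    ["Initial", "Hypertension", "Diabetes"],
    ["Initial", "Diabetes", "Statin"],
    ["Initial", "Hypertension", "Statin"],
]

transitions = learn_transitions(sequences)
module = to_module("toy_module", transitions)
print(json.dumps(module, indent=2))
```

Because every distinct event becomes its own state, and every observed pair its own edge, a graph built this way grows with the event vocabulary, which is consistent with the reported observation that the Markov chain yields extremely large graphs.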
| Original language | English |
|---|---|
| Article number | 136 |
| Journal | BMC Medical Research Methodology |
| Volume | 24 |
| Issue number | 1 |
| Pages (from-to) | 136 |
| Publication status | Published - 22.06.2024 |
Funding
| Funders | Funder number |
|---|---|
| Ministry of Science, ICT and Future Planning | |
| Hessian Office of Health and Care | |
| British Institute of Persian Studies | |
| Universitätsklinikum Frankfurt | |
| Leibniz-Institute for Prevention Research and Epidemiology | |
| Deutsche Forschungsgemeinschaft | |
| University Hospital Hamburg-Eppendorf | |
| Deutsches Forschungszentrum für Künstliche Intelligenz | |
| German Ministry of Health | |
| Institut für Medizininformatik | 17475 |
Research Areas and Centers
- Research Area: Center for Population Medicine and Public Health (ZBV)
DFG Research Classification Scheme
- 2.22-02 Public Health, Healthcare Research, Social and Occupational Medicine