TY - JOUR
T1 - Population Bias in Polygenic Risk Prediction Models for Coronary Artery Disease
AU - Gola, Damian
AU - Erdmann, Jeanette
AU - Läll, Kristi
AU - Mägi, Reedik
AU - Müller-Myhsok, Bertram
AU - Schunkert, Heribert
AU - König, Inke R.
PY - 2020
Y1 - 2020
N2 - Background: Individual risk prediction based on genome-wide polygenic risk scores (PRSs) using millions of genetic variants has attracted much attention. It is under debate whether PRS models can be applied, without loss of precision, to populations of similar ethnic but different geographic background from the one on which the scores were trained. Here, we examine how PRSs trained in population-specific but European data sets perform in other European subpopulations in distinguishing between coronary artery disease patients and healthy individuals. Methods: We use data from the UK and Estonian biobanks (UKB, EB) as well as case-control data from the German population (DE) to develop and evaluate PRSs in the same and different populations. Results: PRSs have the highest performance in their corresponding population testing data sets, whereas their performance drops significantly when applied to testing data sets from different European populations. Models trained on DE data yielded areas under the curve in independent testing sets of DE: 0.6752, EB: 0.6156, and UKB: 0.5989; trained on EB and tested on EB: 0.6565, DE: 0.5407, and UKB: 0.6043; trained on UKB and tested on UKB: 0.6133, DE: 0.5143, and EB: 0.6049. Conclusions: This result has a direct impact on the clinical usability of risk prediction models using PRSs: a population effect must be kept in mind when applying risk estimation models that are based on additional genetic information, even for individuals from different European populations of the same ethnicity.
UR - http://www.scopus.com/inward/record.url?scp=85097963821&partnerID=8YFLogxK
U2 - 10.1161/CIRCGEN.120.002932
DO - 10.1161/CIRCGEN.120.002932
M3 - Journal articles
C2 - 33170024
AN - SCOPUS:85097963821
SN - 2574-8300
SP - 569
EP - 575
JO - Circulation: Genomic and Precision Medicine
JF - Circulation: Genomic and Precision Medicine
M1 - 002932
ER -