Statistical learning approaches in the genetic epidemiology of complex diseases

Anne-Laure Boulesteix*, Marvin N. Wright, Sabine Hoffmann, Inke R. König

*Corresponding author for this work
3 Citations (Scopus)


In this paper, we give an overview of methodological issues related to the use of statistical learning approaches when analyzing high-dimensional genetic data. The focus is set on regression models and machine learning algorithms taking genetic variables as input and returning a classification or a prediction for the target variable of interest; for example, the present or future disease status, or the future course of a disease. After briefly explaining the basic motivation and principle of these methods, we review different procedures that can be used to evaluate the accuracy of the obtained models and discuss common flaws that may lead to over-optimistic conclusions with respect to their prediction performance and usefulness.

Original language: English
Journal: Human Genetics
Issue number: 1
Pages (from-to): 73-84
Number of pages: 12
Publication status: Published - 01.01.2020

Research Areas and Centers

  • Academic Focus: Center for Brain, Behavior and Metabolism (CBBM)

