Risk estimation using probability machines

Abhijit Dasgupta*, Silke Szymczak, Jason H. Moore, Joan E. Bailey-Wilson, James D. Malley

*Corresponding author for this work

Abstract

Background: Logistic regression has been the de facto, and often the only, model used to describe and analyze relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given the predictors, as well as predictor effect size estimates in the form of conditional odds ratios.

Results: We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimates and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of the counterfactual arguments central to the interpretation of such estimates. We show that, if the data-generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and for interactions. We also propose a method that uses learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.

Conclusions: The models we propose make no assumptions about the data structure and capture the patterns in the data by specifying only the predictors involved, not any particular model structure. They therefore do not run the same risks of model mis-specification, and the resulting estimation biases, as a logistic model. This methodology, which we call a "risk machine", inherits the properties of the statistical machine from which it is derived.
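The counterfactual idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: in place of a random forest it uses a simple k-nearest-neighbour average of the 0/1 outcome as a stand-in nonparametric regression machine, and the logistic data-generating model, its coefficients, the sample sizes, and the names `simulate`, `knn_probability`, and `risk_difference` are all assumptions made for illustration. Each subject's risk is predicted twice, once with the binary predictor set to 1 and once with it set to 0, and the average difference serves as the effect size estimate.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def true_probability(x1, x2):
    # Assumed (illustrative) logistic model: logit P(y=1) = -1 + 1.0*x1 + 0.5*x2
    return 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * x1 + 0.5 * x2)))


def simulate(n):
    # Draw n rows (x1, x2, y) with binary x1, standard normal x2, binary y.
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        x2 = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < true_probability(x1, x2) else 0
        rows.append((x1, x2, y))
    return rows


def knn_probability(train, x1, x2, k=50):
    # Nonparametric regression on the 0/1 outcome: average y over the k
    # training points that share x1 and lie closest in x2.
    # This is a stand-in for a random forest run in regression mode.
    same_arm = [(abs(t2 - x2), y) for (t1, t2, y) in train if t1 == x1]
    nearest = sorted(same_arm)[:k]
    return sum(y for _, y in nearest) / len(nearest)


train = simulate(1500)
evaluation = train[:300]

# Counterfactual predictions: the same x2, with x1 forced to 1 and then to 0.
risk_exposed = [knn_probability(train, 1, x2) for (_, x2, _) in evaluation]
risk_unexposed = [knn_probability(train, 0, x2) for (_, x2, _) in evaluation]

risk_difference = sum(
    a - b for a, b in zip(risk_exposed, risk_unexposed)
) / len(evaluation)

# Reference value computed from the assumed true model on the same points.
true_difference = sum(
    true_probability(1, x2) - true_probability(0, x2) for (_, x2, _) in evaluation
) / len(evaluation)

print("estimated average risk difference:", round(risk_difference, 3))
print("true average risk difference:     ", round(true_difference, 3))
```

The machine is never told that the data are logistic; it only receives the predictors. Swapping `knn_probability` for any other consistent regression machine leaves the counterfactual step unchanged.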

Original language: English
Article number: 2
Journal: BioData Mining
Volume: 7
Issue number: 1
Publication status: Published, 1 March 2014

UN SDGs

This output contributes to the following Sustainable Development Goal(s)

  1. SDG 3 – Good Health and Well-being
