TY - JOUR
T1 - Survival analysis for lung cancer patients
T2 - A comparison of Cox regression and machine learning models
AU - Germer, Sebastian
AU - Rudolph, Christiane
AU - Labohm, Louisa
AU - Katalinic, Alexander
AU - Rath, Natalie
AU - Rausch, Katharina
AU - Holleczek, Bernd
AU - Handels, Heinz
AU - AI-CARE Working Group
N1 - Copyright © 2024. Published by Elsevier B.V.
PY - 2024/8/26
Y1 - 2024/8/26
N2 - INTRODUCTION: Survival analysis based on cancer registry data is of paramount importance for monitoring the effectiveness of health care. As new methods arise, the compendium of statistical tools applicable to cancer registry data grows. In recent years, machine learning approaches for survival analysis have been developed. The aim of this study is to compare the model performance of the well-established Cox regression and novel machine learning approaches on a previously unused dataset. MATERIAL AND METHODS: The study is based on lung cancer data from the Schleswig-Holstein Cancer Registry. Four survival analysis models are compared: Cox Proportional Hazard Regression (CoxPH) as the most commonly used statistical model, as well as Random Survival Forests (RSF) and two neural network architectures based on the DeepSurv and TabNet approaches. The models are evaluated using the concordance index (C-I), the Brier score and the AUC-ROC score. In addition, to gain more insight into the decision process of the models, we identified the features that have a higher impact on patient survival using permutation feature importance scores and SHAP values. RESULTS: Using a dataset including the cancer stage established by the Union for International Cancer Control (UICC), the best-performing model is the CoxPH (C-I: 0.698±0.005), while using a dataset which includes the tumor size, lymph node and metastasis status (TNM), the RSF is the best-performing model (C-I: 0.703±0.004). The explainability metrics show that the models rely primarily on the combined UICC stage and the metastasis status, which is consistent with other studies. DISCUSSION: The studied methods are highly relevant for epidemiological researchers to create more accurate survival models, which can help physicians make informed decisions about appropriate therapies and management of patients with lung cancer, ultimately improving survival and quality of life.
AB - INTRODUCTION: Survival analysis based on cancer registry data is of paramount importance for monitoring the effectiveness of health care. As new methods arise, the compendium of statistical tools applicable to cancer registry data grows. In recent years, machine learning approaches for survival analysis have been developed. The aim of this study is to compare the model performance of the well-established Cox regression and novel machine learning approaches on a previously unused dataset. MATERIAL AND METHODS: The study is based on lung cancer data from the Schleswig-Holstein Cancer Registry. Four survival analysis models are compared: Cox Proportional Hazard Regression (CoxPH) as the most commonly used statistical model, as well as Random Survival Forests (RSF) and two neural network architectures based on the DeepSurv and TabNet approaches. The models are evaluated using the concordance index (C-I), the Brier score and the AUC-ROC score. In addition, to gain more insight into the decision process of the models, we identified the features that have a higher impact on patient survival using permutation feature importance scores and SHAP values. RESULTS: Using a dataset including the cancer stage established by the Union for International Cancer Control (UICC), the best-performing model is the CoxPH (C-I: 0.698±0.005), while using a dataset which includes the tumor size, lymph node and metastasis status (TNM), the RSF is the best-performing model (C-I: 0.703±0.004). The explainability metrics show that the models rely primarily on the combined UICC stage and the metastasis status, which is consistent with other studies. DISCUSSION: The studied methods are highly relevant for epidemiological researchers to create more accurate survival models, which can help physicians make informed decisions about appropriate therapies and management of patients with lung cancer, ultimately improving survival and quality of life.
UR - http://www.scopus.com/inward/record.url?scp=85202302909&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2024.105607
DO - 10.1016/j.ijmedinf.2024.105607
M3 - Journal articles
C2 - 39208536
SN - 1386-5056
VL - 191
SP - 105607
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 105607
ER -