TY - JOUR
T1 - Overcoming data scarcity in biomedical imaging with a foundational multi-task model
AU - Schäfer, Raphael
AU - Nicke, Till
AU - Höfener, Henning
AU - Lange, Annkristin
AU - Merhof, Dorit
AU - Feuerhake, Friedrich
AU - Schulz, Volkmar
AU - Lotz, Johannes
AU - Kiessling, Fabian
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/7
Y1 - 2024/7
N2 - Foundational models, pretrained on a large scale, have demonstrated substantial success across non-medical domains. However, training these models typically requires large, comprehensive datasets, which contrasts with the smaller and more specialized datasets common in biomedical imaging. Here we propose a multi-task learning strategy that decouples the number of training tasks from memory requirements. We trained a universal biomedical pretrained model (UMedPT) on a multi-task database including tomographic, microscopic and X-ray images, with various labeling strategies such as classification, segmentation and object detection. The UMedPT foundational model outperformed ImageNet pretraining and previous state-of-the-art models. For classification tasks related to the pretraining database, it maintained its performance with only 1% of the original training data and without fine-tuning. For out-of-domain tasks it required only 50% of the original training data. In an external independent validation, imaging features extracted using UMedPT proved to set a new standard for cross-center transferability.
UR - https://www.scopus.com/pages/publications/85198950562
U2 - 10.1038/s43588-024-00662-z
DO - 10.1038/s43588-024-00662-z
M3 - Journal article
AN - SCOPUS:85198950562
SN - 2662-8457
VL - 4
SP - 495
EP - 509
JO - Nature Computational Science
JF - Nature Computational Science
IS - 7
ER -