TY - JOUR
T1 - Automation Bias in Mammography
T2 - The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance
AU - Dratsch, Thomas
AU - Chen, Xue
AU - Mehrizi, Mohammad Rezazade
AU - Kloeckner, Roman
AU - Mähringer-Kunz, Aline
AU - Püsken, Michael
AU - Baeßler, Bettina
AU - Sauer, Stephanie
AU - Maintz, David
AU - dos Santos, Daniel Pinto
N1 - Publisher Copyright:
© RSNA, 2023.
PY - 2023/5
Y1 - 2023/5
N2 - Background: Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications regarding artificial intelligence (AI)–aided mammography reading are unknown. Purpose: To determine how automation bias can affect inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an AI system. Materials and Methods: In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms, with the correct BI-RADS category suggested by the AI system. The second was a set of 40 mammograms in which an incorrect BI-RADS category was suggested for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA with post hoc tests, and Kruskal-Wallis tests with the Dunn post hoc test. Results: The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P <.001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P <.001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P =.003; r = 0.97) radiologists was significantly affected by the correctness of the AI-predicted BI-RADS category. 
Inexperienced radiologists were significantly more likely to follow the suggestions of the purported AI when it incorrectly suggested a higher BI-RADS category than the actual ground truth compared with both moderately (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P =.044; r = 0.46) and very (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P =.009; r = 0.65) experienced readers. Conclusion: The results show that inexperienced, moderately experienced, and very experienced radiologists reading mammograms are prone to automation bias when supported by an AI-based system. This and other effects of human-machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI.
UR - http://www.scopus.com/inward/record.url?scp=85159779314&partnerID=8YFLogxK
U2 - 10.1148/radiol.222176
DO - 10.1148/radiol.222176
M3 - Journal articles
C2 - 37129490
AN - SCOPUS:85159779314
SN - 0033-8419
VL - 307
JO - Radiology
JF - Radiology
IS - 4
M1 - e222176
ER -