
Using ChatGPT-4 for Lay Summarization in Prostate Cancer Research to Advance Patient-Centered Communication: Large-Scale Generative AI Performance Evaluation

Emily Rinderknecht*, Simon U. Engelmann, Veronika Saberi, Clemens Kirschner, Anton P. Kravchuk, Anna Schmelzer, Johannes Breyer, Christopher Goßler, Roman Mayr, Christian Gilfrich, Maximilian Burger, Dominik von Winning, Hendrik Borgmann, Christian Wülfing, Axel S. Merseburger, Maximilian Haas, Matthias May

*Corresponding author for this work

Abstract

Background: The increasing volume and complexity of biomedical literature pose challenges for making scientific knowledge accessible to lay audiences. Lay summaries, now widely encouraged or required by journals, aim to bridge this gap by promoting health literacy, patient engagement, and public trust. However, many are written by scientists without formal training in plain-language communication, often resulting in limited clarity, readability, and consistency. Generative large language models such as ChatGPT-4 offer a scalable opportunity to support lay summary creation, though their effectiveness within specific clinical domains has not been systematically evaluated at scale.
Objective: This study aimed to assess ChatGPT-4’s performance in generating lay summaries for prostate cancer studies. A secondary objective was to evaluate how prompt design influences summary quality, aiming to provide practical guidance for the use of generative artificial intelligence (AI) in scientific publishing.
Methods: A total of 204 consecutive articles on prostate cancer were extracted from a high-ranking oncology journal mandating lay summaries. Each abstract was processed with ChatGPT-4 using 2 prompts: a simple prompt based on the journal’s guidelines and an extended prompt refined to improve readability. AI-generated and original summaries were evaluated using 3 criteria: readability (Flesch-Kincaid Reading Ease [FKRE]), factual accuracy (5-point Likert scale, blinded rating by 2 clinical experts), and compliance with word count instructions (120‐150 words). Summaries were classified as high-quality as a composite outcome if they met all 3 benchmarks: FKRE >30, accuracy ≥4 from both raters, and word count within range. Statistical comparisons used Wilcoxon signed-rank and paired 2-tailed t tests (P<.05).
Results: ChatGPT-4-generated lay summaries showed an improvement in readability compared to human-written versions, with the extended prompt achieving higher scores than the simple prompt (median FKRE: extended prompt 47, IQR 42-56; simple prompt 36, IQR 29-43; original 20, IQR 9.5‐29; P<.001).

original 5, IQR 4-5; P<.001) in this dataset. Compliance with word count instructions was greater for both AI-generated summaries than for the originals (summaries with a wrong number of words: extended prompt 39 [19%], simple prompt 40 [20%], original 140 [69%]; P<.001). Between the simple and extended prompts, there were no significant differences in accuracy (P=.53) or word count compliance (P=.87). The proportion rated as high-quality was 79.4% for the extended prompt, 54.9% for the simple prompt, and 5.4% for original summaries (P<.001).
Conclusions: With optimized prompting, ChatGPT-4 produced lay summaries that, on average, scored higher than author-written versions in readability, factual accuracy, and structural compliance within our dataset. These results support integrating generative AI into editorial workflows to improve science communication for nonexpert audiences. Limitations include the focus on a single clinical domain and journal and the absence of layperson evaluation.
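
The composite high-quality criterion described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that FKRE refers to the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), and it uses a rough vowel-group heuristic for syllable counting; dedicated readability tools will give somewhat different scores. The function names `count_syllables`, `flesch_reading_ease`, and `is_high_quality` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count vowel groups,
    dropping a trailing silent 'e' unless the word ends in 'le'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def is_high_quality(fkre: float, rating_a: int, rating_b: int,
                    word_count: int) -> bool:
    """Composite outcome from the abstract: FKRE > 30, accuracy >= 4
    from both raters, and word count within 120-150."""
    return (fkre > 30
            and rating_a >= 4 and rating_b >= 4
            and 120 <= word_count <= 150)
```

A summary with FKRE 47, ratings of 5 and 4, and 130 words would count as high-quality under this sketch; the same summary with 151 words, or with a rating of 3 from either rater, would not.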

Original language: English
Article number: e76598
Journal: Journal of Medical Internet Research
Volume: 27
DOIs
Publication status: Published - 2025

UN SDGs

This output contributes to the following Sustainable Development Goal(s)

  1. SDG 3 – Good Health and Well-being

Strategic research areas and centers

  • Profile area: Lübeck Integrated Oncology Network (LION)
  • Centers: Zentrum für Künstliche Intelligenz Lübeck (ZKIL)

DFG subject classification

  • 2.22-23 Reproductive Medicine, Urology
  • 4.43-04 Artificial Intelligence and Machine Learning Methods

KDSF classification of research fields

  • 073 - Artificial Intelligence and Big Data
