
How accurate are large language models for spirometry interpretation?

Presented by
Dr Ece Ocak, İzmir City Hospital, Turkey
Conference
ERS 2025
Large language models, a form of artificial intelligence (AI), could help interpret spirometry results in children, but their current accuracy remains limited.

For this study, 2 trained pulmonologists and 3 large language models (ChatGPT 4o, DeepSeek R1, and Claude 3.5 Sonnet) assessed 100 randomly selected spirometry tests from children with various underlying conditions, following the ERS/ATS 2019 guidelines. The large language models received no task-specific training, and all were queried using standardised prompts. Their agreement with the pulmonologists' interpretations was assessed for test acceptability, clinical usability, and pulmonary function categorisation.
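
Categorising a spirometry test means assigning it to a pattern such as normal, obstructive, restrictive, or mixed. As a rough illustration only, the sketch below uses a common simplified scheme based on lower limits of normal (LLN): a low FEV1/FVC ratio indicates obstruction, and a low FVC with a preserved ratio indicates a restrictive pattern. This is an assumption for illustration, not the study's prompts or the full guideline algorithm. The `Spirometry` class and `classify_pattern` function are hypothetical names. Spirometry alone can only suggest restriction; confirming it requires lung volume measurement.

```python
from dataclasses import dataclass


@dataclass
class Spirometry:
    # Hypothetical container for one test; LLN values would come from reference equations.
    fev1_fvc: float      # measured FEV1/FVC ratio
    fev1_fvc_lln: float  # lower limit of normal for the ratio
    fvc: float           # measured FVC (litres)
    fvc_lln: float       # lower limit of normal for FVC (litres)


def classify_pattern(s: Spirometry) -> str:
    """Simplified LLN-based pattern label (illustrative assumption, not a clinical tool)."""
    low_ratio = s.fev1_fvc < s.fev1_fvc_lln
    low_fvc = s.fvc < s.fvc_lln
    if low_ratio and low_fvc:
        return "mixed"        # both criteria met; true mixed disease needs lung volumes
    if low_ratio:
        return "obstructive"  # reduced ratio, FVC within normal limits
    if low_fvc:
        return "restrictive"  # preserved ratio, reduced FVC: only suggests restriction
    return "normal"
```

Under this scheme, the restrictive and mixed labels both depend on judging FVC against its reference value, not on the FEV1/FVC ratio alone.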

Dr Ece Ocak (İzmir City Hospital, Turkey) presented the results [1]. For pulmonary function categorisation, ChatGPT 4o achieved the highest agreement with the trained pulmonologists (Cohen's kappa=0.515; moderate agreement), followed by DeepSeek R1 and Claude 3.5 Sonnet (both showing low agreement). However, "none of the large language models achieved substantial agreement for test acceptability or clinical usability," said Dr Ocak. Furthermore, all models showed good accuracy for normal and obstructive spirometry patterns but poor accuracy for restrictive and mixed patterns.
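
Cohen's kappa measures agreement between two raters after correcting for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of matching labels and p_e is the proportion expected if each rater labelled independently at their own category frequencies. The sketch below computes it for two label lists. It also maps the result to the Landis and Koch bands, assuming these are the bands behind the "moderate" and "substantial" labels. The labels in the usage example are made up and are not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label using the Landis and Koch (1977) bands (assumed convention)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical example: a pulmonologist versus a model on 8 tests.
pulmonologist = ["normal", "normal", "obstructive", "restrictive",
                 "mixed", "normal", "obstructive", "normal"]
model = ["normal", "normal", "obstructive", "normal",
         "obstructive", "normal", "obstructive", "restrictive"]
k = cohens_kappa(pulmonologist, model)
print(round(k, 3), landis_koch(k))
```

In this toy example, the two raters agree on 5 of 8 tests (62.5%), yet kappa is only about 0.41 once chance agreement is removed. This is why kappa, not raw percentage agreement, is the usual measure for comparisons like this one.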

“Large language models may serve as a second reader to reduce variability,” concluded Dr Ocak. “Expanding training with larger and more diverse patient datasets could enhance diagnostic accuracy and support timely, standardised decision-making. Future research should address recognition of complex spirometry patterns and assess integration into real-world clinical workflows.”

  1. Ocak E, et al. Evaluation of artificial intelligence accuracy in interpreting pulmonary function tests: A comparison of ChatGPT 4o, DeepSeek R1, and Claude 3.5 Sonnet. ERS Congress, 27 September–1 October 2025, Amsterdam, the Netherlands.

Copyright ©2025 Medicom Publishing Group


