Is ChatGPT helpful in diagnosing polyneuropathies?

Presented by
Dr Alberto De Lorenzo, University of Milan, Italy
Conference
EAN 2025
In a comparative analysis with neurologists, GPT-4o showed promise as a tool to support the diagnosis of polyneuropathy, improving the accuracy of non-specialists and guiding confirmatory testing. Supervised integration could help bridge expertise gaps in neurological care, especially in more rural areas.

A lack of clinicians frequently leads to misdiagnosis of neuropathies and delayed treatment, especially in rural and underserved settings, noted Dr Alberto De Lorenzo (University of Milan, Italy) [1]. As AI shows promise in supporting clinicians in both diagnosis and management decisions, Dr De Lorenzo and colleagues compared GPT-4o’s performance with that of specialised and non-specialised neurologists in diagnosing real-life cases of polyneuropathy and in guiding confirmatory testing [1]. Standardised clinical summaries of 100 polyneuropathy cases were presented to GPT-4o to generate a leading diagnosis, two differential diagnoses, and a confirmatory test. The same 100 cases were reviewed by 36 neurologists from 10 countries, both specialists and non-specialists, who were allowed to revise their findings after reviewing GPT-4o’s output.

Compared with specialised neurologists, GPT-4o’s leading diagnosis was less accurate (65.5% vs 73.9%; P=0.024). However, GPT-4o outperformed non-specialists (65.5% vs 54.4%; P=0.007). When differential diagnoses were also considered, GPT-4o again performed less well than specialists (82.0% vs 88.1%; P=0.042) but better than non-specialists (82.0% vs 68.5%; P<0.001). Dr De Lorenzo added that specialists revised their diagnosis in only 9% of cases after reviewing GPT-4o’s output, with a slight, non-significant improvement in accuracy from 73.9% to 75.0% (P=0.069), whereas non-specialists revised their diagnosis in 21% of cases, significantly improving the accuracy of their leading diagnosis and differential diagnosis from 54.4% to 57.0% (P=0.007). The most common GPT-4o errors were over-reliance on laboratory findings or patient history (38%), reasonable but incorrect responses (22%), overlooking clinical information (16%), vague conclusions (16%), and limited internal knowledge (9%).

“This may be a first step towards managing and collecting evidence for the integration of these models in real-life clinical care,” Dr De Lorenzo concluded, “especially benefitting non-specialised clinicians when faced with a complex case of peripheral neuropathy.”

  1. De Lorenzo A, et al. Chat-GPT-4o in diagnosis and management of real-life polyneuropathy cases: Comparative analysis with neurologists. OPR-023, EAN Congress 2025, 21-24 June 2025, Helsinki, Finland.
Medical writing support was provided by Michiel Tent.

Copyright ©2025 Medicom Medical Publishers
