https://doi.org/10.55788/c6bd4c96
Dr Cem Şimşek (Hacettepe University, Turkey) and his research team designed a specialty-specific large language model (LLM) to perform clinical tasks in gastroenterology [1]. Their proof-of-concept study compared this gastroenterology-specific model, named GastroGPT, against 3 general-purpose, state-of-the-art LLMs: ChatGPT-4, Google Bard, and Anthropic’s Claude. An expert panel of reviewers from various sub-specialities across Europe compared the models' performance on 7 clinical tasks across a range of simulated patient cases. “The clinical tasks included assessment, collecting additional history, recommending diagnostic tests and treatment, patient education, planning follow-up visitations, and referring patients to specialists,” clarified Dr Şimşek. The cases varied in complexity, rarity, and setting/urgency. The expert panel rated the accuracy, relevance, alignment with clinical guidelines, usability, interpretability, and potential clinical impact of the models’ output on a 10-point Likert scale.
The experts completed 480 evaluations. Across the 10 cases evaluated, GastroGPT had a higher overall score across tasks than each of the other models (8.05 vs 4.95, 5.63, and 6.92; P<0.001 for all). GastroGPT performed significantly better than all general models with respect to ‘overall evaluation’, ‘additional history’, and ‘referrals’; scored better than ChatGPT and Google Bard on ‘assessment’, ‘treatment’, and ‘patient education’; and scored better than ChatGPT on ‘recommended diagnostic tests’ (see Figure). Furthermore, GastroGPT was more consistent than the other models across different cases, clinical tasks, complexity levels, and rarity (Levene’s test P<0.001). Finally, the panel’s ratings had a Cronbach’s alpha of 0.76, which is conventionally read as acceptable internal consistency.
Figure: Clinical task outcomes for GastroGPT and general LLMs [1]
LLM, large language model.
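For readers unfamiliar with the reliability statistic reported above, the following is a minimal sketch of how a Cronbach’s alpha can be computed from a table of panel ratings. It is not the study’s analysis code: the function name cronbach_alpha, the layout of one row per rated output and one column per rater, and the example scores are all assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of scores.

    ratings: list of rows, one row per rated output (hypothetical layout),
    each row holding one score per rater.
    """
    n_rows = len(ratings)
    n_raters = len(ratings[0])
    if n_rows < 2 or n_raters < 2:
        raise ValueError("need at least 2 rated outputs and 2 raters")

    # Sample variance of each rater's scores across all rated outputs.
    rater_vars = [
        variance([row[j] for row in ratings]) for j in range(n_raters)
    ]
    # Sample variance of the summed score given to each rated output.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")

    return n_raters / (n_raters - 1) * (1 - sum(rater_vars) / total_var)


# Illustrative scores only (not study data): 5 outputs rated by 3 raters
# on a 10-point scale.
example = [
    [8, 7, 8],
    [5, 6, 5],
    [9, 9, 8],
    [4, 5, 5],
    [7, 6, 7],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 mean the raters rank the outputs more similarly. Values of 0.7 or higher are commonly treated as acceptable.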
GastroGPT outperformed general-purpose LLMs across key clinical tasks, indicating that speciality-specific LLMs have potential in medical practice. However, this approach must be tested across specialities and compared with physicians’ real-world evaluations.
1. Şimşek C, et al. GastroGPT: first specialty-specific AI language model outperforms general models across key clinical tasks. LB16, UEG Week 2023, 14–17 October, Copenhagen, Denmark.
Copyright ©2023 Medicom Medical Publishers