Scopus:
Evaluation of Advanced Artificial Intelligence Algorithms’ Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models

dc.contributor.author Koyun, M.
dc.contributor.author Taskent, I.
dc.date.accessioned 2025-02-10T14:28:25Z
dc.date.available 2025-02-10T14:28:25Z
dc.date.issued 2025
dc.description.abstract Background/Objectives: Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, and early, accurate diagnosis is critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS on diffusion-weighted imaging (DWI). Methods: DWI images from a total of 110 cases (AIS group: n = 55; healthy controls: n = 55) were provided to the AI models using standardized prompts. The models’ responses were compared with the radiologists’ gold-standard evaluations, and performance metrics, including sensitivity, specificity, and diagnostic accuracy, were calculated. Results: Both models showed high sensitivity for AIS detection (ChatGPT-4o: 100%; Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o had significantly lower specificity (3.6%) than Claude 3.5 Sonnet (74.5%). Agreement with the radiologists was poor for ChatGPT-4o (κ = 0.036; 95% CI: −0.013, 0.085) but good for Claude 3.5 Sonnet (κ = 0.691; 95% CI: 0.558, 0.824). For hemispheric localization of AIS, Claude 3.5 Sonnet was more accurate (67.2%) than ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet was more accurate (30.9%) than ChatGPT-4o (7.3%); these differences were statistically significant (p < 0.05). Conclusions: This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared with ChatGPT-4o in identifying AIS on DWI. Despite Claude 3.5 Sonnet’s advantages, both models showed notable limitations in accuracy, indicating that further development is needed before they are fully applicable in clinical practice. These findings underline the potential of AI tools in radiological diagnostics while also pointing to their current limitations.
dc.identifier 10.3390/jcm14020571
dc.identifier.doi 10.3390/jcm14020571
dc.identifier.issn 2077-0383
dc.identifier.issue 2
dc.identifier.scopus 2-s2.0-85215805522
dc.identifier.uri https://hdl.handle.net/20.500.12597/34066
dc.identifier.volume 14
dc.language.iso en
dc.publisher Multidisciplinary Digital Publishing Institute (MDPI)
dc.relation.ispartof Journal of Clinical Medicine
dc.relation.ispartofseries Journal of Clinical Medicine
dc.rights info:eu-repo/semantics/openAccess
dc.subject Acute ischemic stroke, artificial intelligence, ChatGPT, Claude, magnetic resonance imaging, radiology
dc.title Evaluation of Advanced Artificial Intelligence Algorithms’ Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models
dc.type article
dspace.entity.type Scopus
local.indexed.at Scopus
oaire.citation.issue 2
oaire.citation.volume 14
person.affiliation.name Kastamonu Training and Research Hospital
person.affiliation.name Kastamonu University
person.identifier.orcid 0000-0002-9811-4385
person.identifier.orcid 0000-0001-6278-7863
person.identifier.scopus-author-id 59501465900
person.identifier.scopus-author-id 57192662849
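The abstract in this record reports sensitivity, specificity, and Cohen's κ for each model against the radiologists' reading. The minimal Python sketch below shows how these standard metrics follow from a 2×2 table of model output versus reference standard. The per-model counts are not given in this record; they are back-calculated from the reported percentages under the assumption of 55 AIS cases and 55 controls, so treat them as illustrative only. The names `Confusion`, `diagnostic_metrics`, and `MODELS` are hypothetical, and the sketch does not attempt to reproduce the paper's confidence intervals or significance tests.

```python
"""Diagnostic metrics from a 2x2 table (illustrative sketch, hypothetical counts)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # AIS cases the model labelled as stroke
    fn: int  # AIS cases the model labelled as normal
    tn: int  # controls the model labelled as normal
    fp: int  # controls the model labelled as stroke


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    n = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    # Cohen's kappa against the reference standard:
    # p_o is the observed agreement, p_e the agreement expected by chance
    # from the marginal totals of the model labels and reference labels.
    p_o = accuracy
    p_e = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}


# Hypothetical counts back-calculated from the abstract's percentages,
# assuming 55 AIS cases and 55 controls; not taken from the paper's tables.
MODELS = {
    "ChatGPT-4o": Confusion(tp=55, fn=0, tn=2, fp=53),
    "Claude 3.5 Sonnet": Confusion(tp=52, fn=3, tn=41, fp=14),
}

if __name__ == "__main__":
    for name, table in MODELS.items():
        m = diagnostic_metrics(table)
        print(f"{name}: sensitivity {m['sensitivity']:.1%}, "
              f"specificity {m['specificity']:.1%}, "
              f"accuracy {m['accuracy']:.1%}, kappa {m['kappa']:.3f}")
```

Run as a script, this prints κ = 0.036 for ChatGPT-4o and κ = 0.691 for Claude 3.5 Sonnet, matching the abstract, which suggests the reconstructed counts are at least consistent with the reported figures. The overall accuracy values it prints (51.8% and 84.5%) are derived here from those assumed counts and are not quoted in the abstract.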

Files