Publication:
Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models

dc.contributor.author: Koyun, Mustafa
dc.contributor.author: Taskent, Ismail
dc.date.accessioned: 2026-01-04T21:14:41Z
dc.date.issued: 2024-12-26
dc.description.abstract: Background/Objectives: Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, with early and accurate diagnosis being critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). Methods: The DWI images of a total of 110 cases (AIS group: n=55, healthy controls: n=55) were provided to the AI models via standardized prompts. Their responses were compared to the gold-standard evaluations by radiologists, and performance metrics, including sensitivity, specificity, positive predictive value, negative predictive value, diagnostic accuracy, and inter-model agreement levels, were calculated. Results: Both models exhibited high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated significantly lower specificity (3.6%) compared to Claude 3.5 Sonnet (74.5%). Agreement with radiologists was poor for ChatGPT-4o (κ=0.036) but good for Claude 3.5 Sonnet (κ=0.691). In terms of hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) showed greater accuracy than ChatGPT-4o (7.2%), with these differences being statistically significant (p<0.05). Conclusions: This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite its advantages, both models demonstrated notable limitations in accuracy, emphasizing the need for further development before achieving full clinical applicability. These findings underline the potential of AI tools in radiological diagnostics while acknowledging their current limitations.
dc.description.uri: https://doi.org/10.20944/preprints202412.2282.v1
dc.description.uri: https://doi.org/10.3390/jcm14020571
dc.description.uri: http://dx.doi.org/10.3390/jcm14020571
dc.identifier.doi: 10.20944/preprints202412.2282.v1
dc.identifier.eissn: 2077-0383
dc.identifier.openaire: doi_dedup___::62db39034268be1e087da5b4d322bb5a
dc.identifier.orcid: 0000-0002-9811-4385
dc.identifier.orcid: 0000-0001-6278-7863
dc.identifier.startpage: 571
dc.identifier.uri: https://hdl.handle.net/20.500.12597/42339
dc.identifier.volume: 14
dc.publisher: MDPI AG
dc.relation.ispartof: Journal of Clinical Medicine
dc.rights: OPEN
dc.subject: Article
dc.title: Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models
dc.type: Article
dspace.entity.type: Publication
local.api.response: {"authors":[{"fullName":"Mustafa Koyun","name":"Mustafa","surname":"Koyun","rank":1,"pid":{"id":{"scheme":"orcid","value":"0000-0002-9811-4385"},"provenance":null}},{"fullName":"Ismail Taskent","name":"Ismail","surname":"Taskent","rank":2,"pid":{"id":{"scheme":"orcid","value":"0000-0001-6278-7863"},"provenance":null}}],"openAccessColor":"gold","publiclyFunded":false,"type":"publication","language":{"code":"und","label":"Undetermined"},"countries":null,"subjects":[{"subject":{"scheme":"keyword","value":"Article"},"provenance":null}],"mainTitle":"Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models","subTitle":null,"descriptions":["<jats:p>Background/Objectives: Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, with early and accurate diagnosis being critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). Methods: The DWI images of a total of 110 cases (AIS group: n=55, healthy controls: n=55) were provided to the AI models via standardized prompts. Their responses were compared to the gold-standard evaluations by radiologists, and performance metrics, including sensitivity, specificity, positive predictive value, negative predictive value, diagnostic accuracy, and inter-model agreement levels, were calculated. Results: Both models exhibited high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated significantly lower specificity (3.6%) compared to Claude 3.5 Sonnet (74.5%). Agreement with radiologists was poor for ChatGPT-4o (κ=0.036) but good for Claude 3.5 Sonnet (κ=0.691). In terms of hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) showed greater accuracy than ChatGPT-4o (7.2%), with these differences being statistically significant (p&lt;0.05). Conclusions: This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite its advantages, both models demonstrated notable limitations in accuracy, emphasizing the need for further development before achieving full clinical applicability. These findings underline the potential of AI tools in radiological diagnostics while acknowledging their current limitations.</jats:p>"],"publicationDate":"2024-12-26","publisher":"MDPI AG","embargoEndDate":null,"sources":["Crossref","J Clin Med"],"formats":null,"contributors":null,"coverages":null,"bestAccessRight":{"code":"c_abf2","label":"OPEN","scheme":"http://vocabularies.coar-repositories.org/documentation/access_rights/"},"container":{"name":"Journal of Clinical Medicine","issnPrinted":null,"issnOnline":"2077-0383","issnLinking":null,"ep":null,"iss":null,"sp":"571","vol":"14","edition":null,"conferencePlace":null,"conferenceDate":null},"documentationUrls":null,"codeRepositoryUrl":null,"programmingLanguage":null,"contactPeople":null,"contactGroups":null,"tools":null,"size":null,"version":null,"geoLocations":null,"id":"doi_dedup___::62db39034268be1e087da5b4d322bb5a","originalIds":["10.20944/preprints202412.2282.v1","50|doiboost____|62db39034268be1e087da5b4d322bb5a","jcm14020571","10.3390/jcm14020571","50|doiboost____|cffc4ae56e22c337d6b1a031e3be1837","50|od_______267::ea9cff6d47c1980440edbd0b28018ed6","oai:pubmedcentral.nih.gov:11765597"],"pids":[{"scheme":"doi","value":"10.20944/preprints202412.2282.v1"},{"scheme":"doi","value":"10.3390/jcm14020571"}],"dateOfCollection":null,"lastUpdateTimeStamp":null,"indicators":{"citationImpact":{"citationCount":10,"influence":2.9602008e-9,"popularity":1.022959e-8,"impulse":10,"citationClass":"C5","influenceClass":"C5","impulseClass":"C4","popularityClass":"C4"}},"instances":[{"pids":[{"scheme":"doi","value":"10.20944/preprints202412.2282.v1"}],"license":"CC BY","type":"Article","urls":["https://doi.org/10.20944/preprints202412.2282.v1"],"publicationDate":"2024-12-26","refereed":"peerReviewed"},{"pids":[{"scheme":"doi","value":"10.3390/jcm14020571"}],"license":"CC BY","type":"Article","urls":["https://doi.org/10.3390/jcm14020571"],"publicationDate":"2025-01-17","refereed":"peerReviewed"},{"alternateIdentifiers":[{"scheme":"doi","value":"10.3390/jcm14020571"}],"license":"CC BY","type":"Other literature type","urls":["http://dx.doi.org/10.3390/jcm14020571"],"publicationDate":"2025-01-17","refereed":"nonPeerReviewed"}],"isGreen":true,"isInDiamondJournal":false}
local.import.sourceOpenAire
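
The abstract reports sensitivity, specificity, and Cohen's kappa for each model but not the underlying case counts. As a rough consistency check, the sketch below back-calculates a 2x2 confusion matrix from the reported percentages and the stated group sizes (55 AIS, 55 controls), then recomputes the metrics. The counts are inferred by rounding and are an assumption, not figures taken from the paper; the names `ConfusionMatrix` and `counts_from_rates` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # AIS cases the model labelled AIS
    fn: int  # AIS cases the model missed
    tn: int  # controls the model labelled normal
    fp: int  # controls the model labelled AIS

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def cohen_kappa(self) -> float:
        # Observed agreement minus chance agreement, scaled by the
        # maximum possible improvement over chance.
        n = self.total
        p_observed = (self.tp + self.tn) / n
        p_both_pos = ((self.tp + self.fp) / n) * ((self.tp + self.fn) / n)
        p_both_neg = ((self.tn + self.fn) / n) * ((self.tn + self.fp) / n)
        p_expected = p_both_pos + p_both_neg
        return (p_observed - p_expected) / (1 - p_expected)


def counts_from_rates(sens_pct: float, spec_pct: float,
                      n_pos: int = 55, n_neg: int = 55) -> ConfusionMatrix:
    """Infer integer counts from reported percentages (assumption: the
    percentages were computed over all 55 cases in each group)."""
    tp = round(sens_pct / 100 * n_pos)
    tn = round(spec_pct / 100 * n_neg)
    return ConfusionMatrix(tp=tp, fn=n_pos - tp, tn=tn, fp=n_neg - tn)


if __name__ == "__main__":
    reported = {
        "ChatGPT-4o": (100.0, 3.6),
        "Claude 3.5 Sonnet": (94.5, 74.5),
    }
    for name, (sens, spec) in reported.items():
        cm = counts_from_rates(sens, spec)
        print(name, cm,
              f"accuracy={cm.accuracy():.3f}",
              f"PPV={cm.ppv():.3f}",
              f"NPV={cm.npv():.3f}",
              f"kappa={cm.cohen_kappa():.3f}")
```

Under these inferred counts the recomputed kappa values round to 0.036 and 0.691, which agrees with the figures in the abstract. This suggests the reported kappa is consistent with a simple two-class agreement between each model and the radiologists, though the paper itself should be consulted for the exact method.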
