Publication:
Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions

dc.contributor.author: Dal, İsmail
dc.date.accessioned: 2026-01-04T22:02:08Z
dc.date.issued: 2025-05-30
dc.description.abstract: Aims: This study aims to evaluate the usefulness and reliability of artificial intelligence (AI) applications in thoracic surgery internship education and exam preparation. Methods: Claude Sonnet 3.7 AI was provided with core topics covered in the 5th-year thoracic surgery internship and was instructed to generate a 20-question multiple-choice exam, including an answer key. Four thoracic surgery specialists assessed the AI-generated questions using the Delphi panel method, classifying them as correct, minor error, or major error. Major errors included the absence of the correct answer among choices, incorrect AI-marked answers, or contradictions with established medical knowledge. A second exam was manually created by a thoracic surgery specialist and evaluated using the same methodology. Seven volunteer 5th-year medical students completed both exams, and the correlation between their scores was statistically analyzed. Results: Among AI-generated questions, 8 (40%) contained major errors, while 1 (5%) had a minor error. The expert-generated exam had a perfect accuracy rate, whereas the AI-generated exam had significantly lower accuracy (p=0.001). Median scores were 75 (67-100) for the AI exam and 85 (70-95) for the expert exam. No significant correlation was found between students’ scores (r=0.042, p=0.929). Conclusion: AI-generated questions had a high error rate (40% major, 5% minor), making them unreliable for unsupervised use in medical education. While AI may provide partial benefits under expert supervision, it currently lacks the accuracy required for independent implementation in thoracic surgery education.
dc.description.uri: https://doi.org/10.32322/jhsm.1660603
dc.description.uri: https://dergipark.org.tr/tr/pub/jhsm/issue/92158/1660603
dc.identifier.doi: 10.32322/jhsm.1660603
dc.identifier.eissn: 2636-8579
dc.identifier.endpage: 528
dc.identifier.openaire: doi_dedup___::e57f6e135955fe3d6c89cd98d2709fba
dc.identifier.orcid: 0000-0002-5118-0780
dc.identifier.startpage: 524
dc.identifier.uri: https://hdl.handle.net/20.500.12597/42698
dc.identifier.volume: 8
dc.publisher: Journal of Health Sciences and Medicine
dc.relation.ispartof: Journal of Health Sciences and Medicine
dc.rights: OPEN
dc.subject: Artificial intelligence
dc.subject: thoracic surgery education
dc.subject: multiple choice tests
dc.subject: delphi technique
dc.subject: Thoracic Surgery
dc.subject: Göğüs Cerrahisi
dc.subject: Yapay Zekâ
dc.subject: Göğüs Cerrahisi Eğitimi
dc.subject: Çoktan Seçmeli Testler
dc.subject: Delphi Tekniği.
dc.title: Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions
dc.type: Article
dspace.entity.type: Publication
local.api.response: {"authors":[{"fullName":"İsmail Dal","name":"İsmail","surname":"Dal","rank":1,"pid":{"id":{"scheme":"orcid_pending","value":"0000-0002-5118-0780"},"provenance":null}}],"openAccessColor":"gold","publiclyFunded":false,"type":"publication","language":{"code":"und","label":"Undetermined"},"countries":null,"subjects":[{"subject":{"scheme":"keyword","value":"Artificial ıntelligence;thoracic surgery education;multiple choice tests;delphi technique"},"provenance":null},{"subject":{"scheme":"keyword","value":"Thoracic Surgery"},"provenance":null},{"subject":{"scheme":"keyword","value":"Göğüs Cerrahisi"},"provenance":null},{"subject":{"scheme":"keyword","value":"Yapay Zekâ;Göğüs Cerrahisi Eğitimi;Çoktan Seçmeli Testler;Delphi Tekniği."},"provenance":null}],"mainTitle":"Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions","subTitle":null,"descriptions":["<jats:p xml:lang=\"en\">Aims: This study aims to evaluate the usefulness and reliability of artificial intelligence (AI) applications in thoracic surgery internship education and exam preparation. Methods: Claude Sonnet 3.7 AI was provided with core topics covered in the 5th-year thoracic surgery internship and was instructed to generate a 20-question multiple-choice exam, including an answer key. Four thoracic surgery specialists assessed the AI-generated questions using the Delphi panel method, classifying them as correct, minor error, or major error. Major errors included the absence of the correct answer among choices, incorrect AI-marked answers, or contradictions with established medical knowledge. A second exam was manually created by a thoracic surgery specialist and evaluated using the same methodology. Seven volunteer 5th-year medical students completed both exams, and the correlation between their scores was statistically analyzed. Results: Among AI-generated questions, 8 (40%) contained major errors, while 1 (5%) had a minor error. The expert-generated exam had a perfect accuracy rate, whereas the AI-generated exam had significantly lower accuracy (p=0.001). Median scores were 75 (67-100) for the AI exam and 85 (70-95) for the expert exam. No significant correlation was found between students’ scores (r=0.042, p=0.929). Conclusion: AI-generated questions had a high error rate (40% major, 5% minor), making them unreliable for unsupervised use in medical education. While AI may provide partial benefits under expert supervision, it currently lacks the accuracy required for independent implementation in thoracic surgery education.</jats:p>"],"publicationDate":"2025-05-30","publisher":"Journal of Health Sciences and Medicine","embargoEndDate":null,"sources":["Crossref","Volume: 8, Issue: 3524-528","2636-8579","Journal of Health Sciences and Medicine"],"formats":["application/pdf"],"contributors":null,"coverages":null,"bestAccessRight":{"code":"c_abf2","label":"OPEN","scheme":"http://vocabularies.coar-repositories.org/documentation/access_rights/"},"container":{"name":"Journal of Health Sciences and Medicine","issnPrinted":null,"issnOnline":"2636-8579","issnLinking":null,"ep":"528","iss":null,"sp":"524","vol":"8","edition":null,"conferencePlace":null,"conferenceDate":null},"documentationUrls":null,"codeRepositoryUrl":null,"programmingLanguage":null,"contactPeople":null,"contactGroups":null,"tools":null,"size":null,"version":null,"geoLocations":null,"id":"doi_dedup___::e57f6e135955fe3d6c89cd98d2709fba","originalIds":["10.32322/jhsm.1660603","50|doiboost____|e57f6e135955fe3d6c89cd98d2709fba","oai:dergipark.org.tr:article/1660603","50|tubitakulakb::ccc356cc76c5f07ca85f95926deb27f2"],"pids":[{"scheme":"doi","value":"10.32322/jhsm.1660603"}],"dateOfCollection":null,"lastUpdateTimeStamp":null,"indicators":{"citationImpact":{"citationCount":0,"influence":2.5349236e-9,"popularity":2.8669784e-9,"impulse":0,"citationClass":"C5","influenceClass":"C5","impulseClass":"C5","popularityClass":"C5"}},"instances":[{"pids":[{"scheme":"doi","value":"10.32322/jhsm.1660603"}],"type":"Article","urls":["https://doi.org/10.32322/jhsm.1660603"],"publicationDate":"2025-05-30","refereed":"peerReviewed"},{"alternateIdentifiers":[{"scheme":"doi","value":"10.32322/jhsm.1660603"}],"type":"Article","urls":["https://dergipark.org.tr/tr/pub/jhsm/issue/92158/1660603"],"publicationDate":"2025-03-18","refereed":"nonPeerReviewed"}],"isGreen":false,"isInDiamondJournal":false}
local.import.source: OpenAire
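
Note on the reported accuracy comparison: the abstract gives p=0.001 for the difference in accuracy between the AI-generated and expert-generated exams, but does not name the statistical test. The sketch below is one plausible way to check that figure, under two assumptions that are not stated in the record: that the test was a two-sided Fisher exact test, and that the 8 major and 1 minor errors were pooled into 9 flawed questions out of 20 (against 0 of 20 for the expert exam). The function name `fisher_exact_two_sided` is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# ASSUMPTION: major and minor errors pooled (8 + 1 = 9 flawed AI questions).
# Rows: AI exam, expert exam. Columns: flawed, correct.
p_value = fisher_exact_two_sided(9, 11, 0, 20)
print(f"p = {p_value:.4f}")
```

Under these assumptions the result rounds to 0.001, which is consistent with the reported value but does not confirm which test the author actually used.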
