Publication:
Predicting the predisposition to colorectal cancer based on SNP profiles of immune phenotypes using supervised learning models

dc.contributor.author: Cakmak, Ali
dc.contributor.author: Ayaz, Huzeyfe
dc.contributor.author: Arıkan, Soykan
dc.contributor.author: Ibrahimzada, Ali R.
dc.contributor.author: Demirkol, Şeyda
dc.contributor.author: Sönmez, Dilara
dc.contributor.author: Hakan, Mehmet T.
dc.contributor.author: Sürmen, Saime T.
dc.contributor.author: Horozoğlu, Cem
dc.contributor.author: Doğan, Mehmet B.
dc.contributor.author: Küçükhüseyin, Özlem
dc.contributor.author: Cacına, Canan
dc.contributor.author: Kıran, Bayram
dc.contributor.author: Zeybek, Ümit
dc.contributor.author: Baysan, Mehmet
dc.contributor.author: Yaylım, İlhan
dc.date.accessioned: 2026-01-05T23:03:44Z
dc.date.issued: 2022-11-11
dc.description.abstract: This study explores the machine learning-based assessment of predisposition to colorectal cancer based on single nucleotide polymorphisms (SNP). Such a computational approach may be used as a risk indicator and an auxiliary diagnosis method that complements the traditional methods such as biopsy and CT scan. Moreover, it may be used to develop a low-cost screening test for the early detection of colorectal cancers to improve public health. We employ several supervised classification algorithms. Besides, we apply data imputation to fill in the missing genotype values. The employed dataset includes SNPs observed in particular colorectal cancer-associated genomic loci that are located within DNA regions of 11 selected genes obtained from 115 individuals. We make the following observations: (i) random forest-based classifier using one-hot encoding and K-nearest neighbor (KNN)-based imputation performs the best among the studied classifiers with an F1 score of 89% and area under the curve (AUC) score of 0.96. (ii) One-hot encoding together with K-nearest neighbor-based data imputation increases the F1 scores by around 26% in comparison to the baseline approach which does not employ them. (iii) The proposed model outperforms a commonly employed state-of-the-art approach, ColonFlag, under all evaluated settings by up to 24% in terms of the AUC score. Based on the high accuracy of the constructed predictive models, the studied 11 genes may be considered a gene panel candidate for colon cancer risk screening.
dc.description.uri: https://doi.org/10.1007/s11517-022-02707-9
dc.description.uri: https://pubmed.ncbi.nlm.nih.gov/36357628
dc.description.uri: https://hdl.handle.net/20.500.12445/2709
dc.identifier.doi: 10.1007/s11517-022-02707-9
dc.identifier.eissn: 1741-0444
dc.identifier.endpage: 258
dc.identifier.issn: 0140-0118
dc.identifier.openaire: doi_dedup___::6265a033d7eff3c20f53d823b21affe4
dc.identifier.orcid: 0000-0002-1382-6130
dc.identifier.orcid: 0000-0002-0585-2400
dc.identifier.orcid: 0000-0002-3797-818x
dc.identifier.orcid: 0000-0002-7748-0757
dc.identifier.orcid: 0000-0001-7359-2965
dc.identifier.pubmed: 36357628
dc.identifier.scopus: 2-s2.0-85141696341
dc.identifier.startpage: 243
dc.identifier.uri: https://hdl.handle.net/20.500.12597/43542
dc.identifier.volume: 61
dc.identifier.wos: 000881669900001
dc.language.iso: eng
dc.publisher: Springer Science and Business Media LLC
dc.relation.ispartof: Medical & Biological Engineering & Computing
dc.rights: OPEN
dc.subject: Colorectal Cancer
dc.subject: Genotype
dc.subject: Classification
dc.subject: Machine Learning
dc.subject: Phenotype
dc.subject: Colonic Neoplasms
dc.subject: Humans
dc.subject: Supervised Machine Learning
dc.subject: Cancer Screening
dc.subject: Immune Checkpoints
dc.subject: Algorithms
dc.subject.sdg: 3. Good health
dc.title: Predicting the predisposition to colorectal cancer based on SNP profiles of immune phenotypes using supervised learning models
dc.type: Article
dspace.entity.type: Publication
local.import.source: OpenAire
local.indexed.at: WOS
local.indexed.at: Scopus
local.indexed.at: PubMed
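
Illustrative sketch (not part of the record): the abstract names three building blocks, KNN-based imputation of missing genotypes, one-hot encoding of SNP genotypes, and evaluation by F1 score. The Python below is a minimal standard-library sketch of those steps under stated assumptions. It is not the authors' implementation: the function names, the mismatch-based distance, the majority-vote fill rule, and the toy genotypes are all hypothetical, and the random forest classifier itself is omitted (in practice a library implementation would typically be used).

```python
from collections import Counter

MISSING = None  # assumed marker for an unobserved genotype


def mismatch_distance(a, b):
    """Fraction of differing genotypes over SNPs observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return 1.0  # no overlapping SNPs: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(samples, k=3):
    """Fill each missing genotype with the majority genotype among the
    k nearest samples that have that SNP observed (assumed fill rule)."""
    imputed = [list(row) for row in samples]  # leave the input untouched
    for i, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = sorted(
                (mismatch_distance(row, other), idx)
                for idx, other in enumerate(samples)
                if idx != i and other[j] is not MISSING
            )
            nearest = [samples[idx][j] for _, idx in donors[:k]]
            if nearest:
                imputed[i][j] = Counter(nearest).most_common(1)[0][0]
    return imputed


def one_hot_encode(samples):
    """Expand each SNP column into one 0/1 column per observed genotype."""
    n_snps = len(samples[0])
    categories = [
        sorted({row[j] for row in samples if row[j] is not MISSING})
        for j in range(n_snps)
    ]
    encoded = []
    for row in samples:
        features = []
        for j, cats in enumerate(categories):
            features.extend(1 if row[j] == c else 0 for c in cats)
        encoded.append(features)
    return encoded, categories


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy genotypes (hypothetical): rows are individuals, columns are SNPs.
genotypes = [
    ["AA", "CT", "GG"],
    ["AA", "CT", None],
    ["AG", "TT", "GA"],
    ["AA", "CC", "GG"],
]
filled = knn_impute(genotypes, k=2)
features, categories = one_hot_encode(filled)
print(filled[1])
print(features[1])
```

The encoded rows in `features` are what a classifier would consume; imputing before encoding keeps each SNP's binary columns mutually exclusive, which is one plausible reading of how the two steps are combined.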
