Artificial intelligence in colonoscopy: from detection to diagnosis
Abstract
This study reviews the recent progress of artificial intelligence for colonoscopy from detection to diagnosis. The source of data was 27 original studies in PubMed. The search terms were “colonoscopy” (title) and “deep learning” (abstract). The eligibility criteria were: (1) the dependent variable of gastrointestinal disease; (2) the interventions of deep learning for classification, detection and/or segmentation for colonoscopy; (3) the outcomes of accuracy, sensitivity, specificity, area under the curve (AUC), precision, F1, intersection over union (IOU), Dice and/or inference frames per second (FPS); (4) the publication year of 2021 or later; (5) the publication language of English. Based on the results of this study, different deep learning methods would be appropriate for different tasks for colonoscopy, e.g., Efficientnet with neural architecture search (AUC 99.8%) in the case of classification, You Only Look Once with the instance tracking head (F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. In conclusion, artificial intelligence provides an effective, non-invasive decision support system for colonoscopy from detection to diagnosis.
INTRODUCTION
Gastrointestinal disease (GID) is a major contributor to the global burden of disease [1-6]. One popular definition of GID is “the disease of the gastrointestinal tract including the esophagus, liver, stomach, small and large intestines, gallbladder and pancreas” [1]. GID causes 8 million deaths worldwide each year [2] and cost 120 billion dollars in the United States in 2018 [3]. GID arises from various factors, including poor health behavior, unhealthy bowel habits, excessive anti-diarrheal/antacid medication and pregnancy [6]. Colonoscopy is usually considered the most effective approach for the diagnosis of GID [7-12]. Based on micro-simulation, the incremental cost-effectiveness ratio of computed tomography colonography every 5 years for those aged 50–75 years was minimal ($1,092) with respect to a yearly fecal immunochemical test [7]. Likewise, according to cohort simulation, the incremental cost-effectiveness ratio of organized colonoscopy once in a lifetime for those aged 55–64 years was $6,500 (below the accepted willingness-to-pay threshold) with respect to no screening [8]. Moreover, artificial intelligence is expected to aid colonoscopy effectively [12]. The performance of colonoscopy varies depending on tumor size and screening conditions such as screen shaking and fluid injection, and artificial intelligence would be an invaluable decision support system for this problem [12].
Based on the Merriam-Webster dictionary, artificial intelligence can be defined as “the capability of a machine to imitate intelligent human behavior”. An artificial neural network, a popular artificial intelligence approach, consists of information units (so-called “neurons”) connected by weighted links. It usually includes one input layer, one, two or three intermediate layers, and one output layer. An artificial neural network with many intermediate layers is called a deep neural network or deep learning [13-15]. Various deep learning models have been developed for various forms of data. For example, the convolutional neural network is designed for extracting the global information of image data. A kernel slides across the input data, calculating the maximum/average of the input elements it covers (“max/average pooling”) or the dot product of its own elements and their input counterparts (“convolution”). These operations extract characteristic features of the input data, e.g., the form of a tumor vs. that of a normal cell [16]. On the other hand, the recurrent neural network is designed for extracting the local information of sequence data. The current output is derived in a repetitive (or “recurrent”) pattern from the current input and the previous hidden state (the memory of the network on what happened in all previous periods) [17].
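To make the kernel operations above concrete, the following is a minimal NumPy sketch of a single convolution pass and a 2 × 2 max-pooling pass. It is an illustration only, not code from any of the reviewed studies; the averaging kernel is a hypothetical choice for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the dot product at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(image, size=2):
    """2 x 2 max pooling: keep the maximum of each non-overlapping window."""
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i * size:(i + 1) * size,
                              j * size:(j + 1) * size].max()
    return out

image = np.array([[1., 2., 3., 0.],
                  [4., 5., 6., 1.],
                  [7., 8., 9., 2.],
                  [1., 0., 1., 3.]])
kernel = np.ones((3, 3)) / 9.0    # a simple averaging kernel for illustration
features = conv2d(image, kernel)  # 2 x 2 feature map
pooled = max_pool2d(image)        # 2 x 2 pooled map
```

In a trained network the kernel weights are learned rather than fixed, and many such kernels run in parallel to extract many features.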
Unet is currently a common convolutional neural network for colonoscopy. Its “U-shaped” encoder-decoder structure is designed to combine the strengths of a contracting path for down-sampling input image tiles (i.e., extracting global information) and an expanding path for up-sampling output segmentation maps (i.e., recovering local information). The contracting path consists of the repeated application of two 3 × 3 convolutional layers (each followed by a rectified linear unit), with a 2 × 2 max-pooling layer after each pair for down-sampling. Here, 3 × 3 (or 2 × 2) denotes the size of the convolutional (or max-pooling) kernel. The expanding path consists of (1) up-sampling/de-convolution by a 2 × 2 convolutional layer, (2) the concatenation (copy and crop) of the corresponding feature maps from the contracting path and (3) the repeated application of two 3 × 3 convolutional layers (each followed by a rectified linear unit). Its overlap-tile strategy also allows seamless segmentation of arbitrarily large images [18]. Efficientnet is another popular convolutional neural network for colonoscopy at this point, finding the optimal balance of network depth, width and resolution with neural architecture search [19,20]. There has been a rapid expansion of literature on the application of artificial intelligence for colonoscopy, and this study reviews its recent progress from detection to diagnosis.
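The U-shaped bookkeeping above can be illustrated by tracing the spatial size of the feature maps through the network. The sketch below assumes the unpadded 3 × 3 convolutions of the original Unet design (each shrinks the map by 2 pixels per side pair); many later colonoscopy variants use padded convolutions that preserve size, so this is an illustration of the original architecture only.

```python
def unet_spatial_trace(size=572, depth=4):
    """Trace feature-map spatial size through a Unet with unpadded 3x3
    convolutions (each pair shrinks the map by 4 pixels), 2x2 max
    pooling on the way down and 2x2 up-convolution on the way up."""
    sizes = [size]
    for _ in range(depth):       # contracting path
        size -= 4                # two 3x3 valid convolutions
        sizes.append(size)
        size //= 2               # 2x2 max pooling halves the map
        sizes.append(size)
    size -= 4                    # bottleneck double convolution
    sizes.append(size)
    for _ in range(depth):       # expanding path
        size *= 2                # 2x2 up-convolution doubles the map
        size -= 4                # two 3x3 valid convolutions
        sizes.append(size)
    return sizes

trace = unet_spatial_trace()     # 572 -> ... -> 388, as in the original design
```

The mismatch between the 572-pixel input and the 388-pixel output is exactly why the copy-and-crop concatenation and the overlap-tile strategy are needed.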
METHODS
Figure 1 shows the flow diagram of this study. The source of data was 27 original studies in PubMed [21-47]. The search terms were “colonoscopy” (title) and “deep learning” (abstract). The eligibility criteria were: (1) the dependent variable of GID; (2) the interventions of deep learning for classification, detection and/or segmentation for colonoscopy; (3) the outcomes of accuracy, sensitivity, specificity, area under the curve (AUC), precision, F1, intersection over union (IOU), Dice and/or inference frames per second (FPS); (4) the publication year of 2021 or later; (5) the publication language of English.
RESULTS
Review summary
A summary of the review is shown in Tables 1–3 for classification, detection and segmentation. The tables have four summary measures, i.e., sample size, deep learning methods, performance measures compared to baseline models and tasks for colonoscopy. Based on the results of this review, different deep learning methods would be appropriate for different tasks for colonoscopy, e.g., Efficientnet (AUC 99.8%) in the case of classification, You Only Look Once with the instance tracking head (ITH; F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. However, artificial intelligence is a data-driven method, and more study with more external data is needed for greater external validity.
Classification
The review of major studies regarding deep learning classification for colonoscopy is given in this section. The task of deep learning classification for colonoscopy centered on the states of the polyp, the colon and Crohn’s disease. Here, the sample size varied from 99 to 56,872, while Bidirectional Encoder Representations from Transformers (BERT), Efficientnet, fuzzy inference, region-based convolutional neural network (R-CNN), Resnet and its Inception/Xception ensemble were common approaches. The ranges of their performance indicators were 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity and 71.0–99.8% for the AUC. Among these approaches, Efficientnet registered the best performance with an AUC of 99.8% [37]. The aim of that study was to develop and validate deep learning classification models for colonoscopy on six states of the colon, i.e., advanced tubular adenocarcinoma, tubular adenoma, traditional serrated adenoma, sessile serrated adenoma, hyperplastic polyp and non-specific change. Data came from 1,865 images from 703 patients who had colonoscopy at a general hospital in a metropolitan area around Seoul during 2017–2019. The 1,865 images were split into training, validation and test sets with an 80:10:10 ratio (1,484:173:208 images). A major criterion for the test of the trained and validated models was the AUC. Efficientnet-B7 and Densenet-161 (baseline) were trained, validated, tested and compared. Based on the results of that study, the AUCs of Efficientnet were generally higher than those of Densenet: 99.7% vs. 100.0% for advanced tubular adenocarcinoma, 99.7% vs. 99.5% for tubular adenoma, 100.0% vs. 99.9% for traditional serrated adenoma, 99.5% vs. 99.3% for sessile serrated adenoma, 99.5% vs. 99.1% for hyperplastic polyp, 99.7% vs. 99.5% for non-specific change, and 99.8% vs. 99.5% on average. The sensitivity of Efficientnet was superior to that of Densenet as well, i.e., 98.5% vs. 97.1%.
According to the findings of Gradient-Weighted Class Activation Mapping, both networks focused more on epithelial lesions than on their stromal counterparts.
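The AUC used as the main test criterion throughout this section can be computed directly from predicted scores and true labels via its rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half. A minimal sketch, independent of any particular model:

```python
def auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation: fraction of
    positive-negative pairs in which the positive scores higher,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranker scores 1.0, a random one about 0.5, and a perfectly inverted one 0.0; this is why AUCs near 99% in the reviewed studies indicate near-perfect separation of classes.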
Detection
The review of major studies regarding detection for colonoscopy is presented in this section. The emphasis of deep learning detection for colonoscopy was on the object of the polyp. In this task, the sample size varied from 700 to 37,899, whereas Single Shot Detector, Unet, You Only Look Once, and their variations such as Generative Adversarial Network data augmentation were popular choices. The ranges of their performance measures were 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, and 66–180 for FPS. Among these choices, You Only Look Once with the ITH gave the best performance with an F1 of 96.3% [39]. The purpose of that study was to develop and validate deep learning detection models for colonoscopy on the object of the polyp. The data source was 14,202 images from one private and three public sources including CVC-ClinicDB, CVC-VideoClinicDB and ETIS-LARIB. The major criteria for the test of the trained and validated models were F1 and FPS. You Only Look Once with the ITH and its Single Shot Detector counterpart (baseline) were trained, validated, tested and compared. Here, the ITH was introduced to improve performance by tracking the embeddings extracted for the regions of interest across three consecutive images while conducting detection tasks. Based on the findings of that study, the former model outperformed its baseline counterpart in both accuracy and speed, i.e., F1 96.3% vs. 93.8% and FPS 66 vs. 43.
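Precision, recall and F1 for detection depend on how predicted boxes are matched to ground truth. A common convention, sketched below as an illustration (not the matching protocol of the reviewed study), is greedy one-to-one matching at an IOU threshold of 0.5:

```python
def box_iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_f1(preds, truths, thresh=0.5):
    """Greedy one-to-one matching: a prediction is a true positive
    if it overlaps a not-yet-matched truth box with IOU >= thresh."""
    matched = set()
    tp = 0
    for p in preds:
        for i, t in enumerate(truths):
            if i not in matched and box_iou(p, t) >= thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Because F1 is the harmonic mean of precision and recall, it penalizes a detector that finds every polyp but raises many false alarms as much as one that is precise but misses lesions.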
Segmentation
The review of important studies regarding segmentation for colonoscopy is reported in this section. The focus of deep learning segmentation for colonoscopy was on the lesion of the polyp as well. In this area, the smallest (or biggest) sample size was 1,000 (or 777,627), and Unet and its extensions including dense-dilation-residual blocks were the usual models. The ranges of their performance scores were 75.1–97.3% for Dice and 112–182 for FPS. Among these models, Unet with dense-dilation-residual blocks presented the best performance with a Dice of 97.3% [42]. That study sought to develop and validate deep learning segmentation models for colonoscopy on the lesion of the polyp. The data origin was 1,612 images from two public sources, Kvasir-SEG and CVC-ClinicDB. The 1,612 images were split into training, validation and test sets with a 70:10:20 ratio (1,144:164:312 images). The major criteria for the test of the trained and validated models were Dice and FPS. Unet (baseline) and its various extensions (e.g., dense-dilation-residual blocks in this case) were trained, validated, tested and compared. The study made a unique contribution, given that previous studies employed only one or two of the dense, dilation and residual blocks. Unet with dense-dilation-residual blocks (so-called Nnet) surpassed Unet (baseline) and its previous extensions, e.g., Dice 97.3% vs. 91.6% (Unet).
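The Dice and IOU scores used throughout this section are overlap measures between the predicted and ground-truth masks: IOU is intersection over union, and Dice is twice the intersection over the sum of the two mask sizes. A minimal NumPy sketch for binary masks:

```python
import numpy as np

def iou_and_dice(pred, target):
    """IOU = |A n B| / |A u B|; Dice = 2|A n B| / (|A| + |B|)
    for binary segmentation masks A (prediction) and B (truth)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0      # both masks empty: perfect
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```

Note that Dice is always at least as large as IOU for the same masks (Dice = 2·IOU / (1 + IOU)), which is one reason the Dice ranges reported above sit higher than the IOU ranges.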
DISCUSSION
This study reviewed the recent progress of artificial intelligence for colonoscopy from detection to diagnosis. Different deep learning methods were found to be appropriate for different tasks for colonoscopy, e.g., Efficientnet with neural architecture search (AUC 99.8%) in the case of classification, You Only Look Once with the ITH (F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. It should be noted, however, that this study focused on performance outcomes and ignored data characteristics such as their categories and structures. The selection of major studies based on performance results can be biased for this reason. It will be important for future research to give full consideration to this issue.
Indeed, little examination has been done and more investigation is needed on reinforcement learning for colonoscopy. Reinforcement learning is an artificial intelligence approach with the following components: the environment presents a series of rewards; an agent takes a series of actions to maximize the cumulative reward in response; and the environment moves to the next period with given transition probabilities [48]. Reinforcement learning is known for its revolutionary idea of temporal difference learning: artificial intelligence (e.g., Alpha-Go) begins like a human player, taking a series of actions to maximize the cumulative reward (e.g., the chance of victory) from the limited information available in limited periods; then it goes far beyond the best human player with the absolute power of big data absorbing all human play to date [49]. It is reinforcement learning (or temporal difference learning) that encapsulates the crucial qualities of artificial intelligence as “being similar to but superior to human intelligence” [49]. However, little literature is available and more research is to be done on reinforcement learning for colonoscopy. In particular, more effort is essential for data collection and standardization in this direction. Reinforcement learning requires the collection and standardization of massive high-quality data with respect to its major components, i.e., rewards, actions and transition probabilities. But such endeavors have been very limited for colonoscopy because of ethical concerns and little interest in this issue. Overcoming this challenge is expected to be a major breakthrough for the application of artificial intelligence for colonoscopy.
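The temporal difference idea above can be made concrete with tabular Q-learning on a toy problem (a generic illustration, not a colonoscopy application): after each action, the agent nudges its action-value estimate toward the observed reward plus the discounted value of the next state.

```python
import random

# Tiny deterministic chain: states 0, 1, 2; actions 0 = left, 1 = right.
# Reaching state 2 yields reward 1 and ends the episode.
def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(2, state + 1)
    reward = 1.0 if nxt == 2 else 0.0
    return nxt, reward, nxt == 2

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(3)]          # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection (ties go right)
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # temporal difference update toward reward + discounted next value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()   # "right" ends up with the higher value in every state
```

The agent never sees the transition probabilities or the full reward structure in advance; it learns the values of actions purely from sampled experience, which is exactly why massive, standardized records of actions and outcomes would be needed before this approach could be applied to colonoscopy.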
In spite of these limitations, this study demonstrates that artificial intelligence provides an effective, non-invasive decision support system for colonoscopy from detection to diagnosis.
Notes
CRediT authorship contributions
Eun Sun Kim: conceptualization, methodology, resources, investigation, data curation, formal analysis, validation, software, writing - original draft, writing - review & editing, visualization, supervision, project administration, funding acquisition; Kwang-Sig Lee: conceptualization, methodology, resources, investigation, data curation, formal analysis, validation, software, writing - original draft, writing - review & editing, visualization, supervision, project administration, funding acquisition
Conflicts of interest
The authors disclose no conflicts.
Funding
This work was supported by (1) a Technology Innovation Program grant (Development of AI Base Multimodal Endomicroscope for In Situ Diagnosis Cancer) funded by the Ministry of Trade, Industry, and Energy of South Korea (No. 20001533) and (2) a Korea Health Industry Development Institute grant (Korea Health Technology R&D Project) funded by the Ministry of Health and Welfare of South Korea (No. HI22C1302). The funders had no role in the design of the study; in the collection, analysis, and interpretation of the data; or in the writing and review of the manuscript.