Artificial intelligence in colonoscopy: from detection to diagnosis
Abstract
This study reviews the recent progress of artificial intelligence for colonoscopy from detection to diagnosis. The source of data was 27 original studies in PubMed. The search terms were “colonoscopy” (title) and “deep learning” (abstract). The eligibility criteria were: (1) the dependent variable of gastrointestinal disease; (2) the interventions of deep learning for classification, detection and/or segmentation for colonoscopy; (3) the outcomes of accuracy, sensitivity, specificity, area under the curve (AUC), precision, F1, intersection over union (IOU), Dice and/or inference frames per second (FPS); (4) the publication year of 2021 or later; (5) the publication language of English. Based on the results of this study, different deep learning methods would be appropriate for different tasks for colonoscopy, e.g., Efficientnet with neural architecture search (AUC 99.8%) in the case of classification, You Only Look Once with the instance tracking head (F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. In conclusion, artificial intelligence provides an effective, non-invasive decision support system for colonoscopy from detection to diagnosis.
INTRODUCTION
Gastrointestinal disease (GID) is a major contributor to the global burden of disease [1-6]. One popular definition of GID is “the disease of the gastrointestinal tract including the esophagus, liver, stomach, small and large intestines, gallbladder and pancreas” [1]. GID causes 8 million deaths worldwide each year [2] and cost 120 billion dollars in the United States in 2018 [3]. GID arises from various factors, including poor health behavior, unhealthy bowel habits, excessive anti-diarrheal/antacid medication and pregnancy [6]. Colonoscopy is usually considered the most effective approach for the diagnosis of GID [7-12]. Based on micro-simulation, the incremental cost-effectiveness ratio of computed tomography colonography every 5 years for those aged 50–75 years was minimal ($1,092) with respect to a yearly fecal immunochemical test [7]. Likewise, according to cohort simulation, the incremental cost-effectiveness ratio of organized colonoscopy once in a lifetime for those aged 55–64 years was $6,500 (below the accepted willingness-to-pay threshold) with respect to no screening [8]. Moreover, artificial intelligence is expected to aid colonoscopy effectively [12]. The performance of colonoscopy varies depending on tumor size and screening conditions such as screen shaking and fluid injection, and artificial intelligence would be an invaluable decision support system for this problem [12].
Based on the Merriam-Webster dictionary, artificial intelligence can be defined as “the capability of a machine to imitate intelligent human behavior”. An artificial neural network, a popular artificial intelligence approach, consists of information units (so-called “neurons”) connected by weighted links. It usually includes one input layer, one, two or three intermediate layers, and one output layer. An artificial neural network with many intermediate layers is called a deep neural network or deep learning [13-15]. Various deep learning models have been developed for various forms of data. For example, the convolutional neural network is designed for extracting the global information of image data. A kernel slides across the input data, calculating the maximum/average of the input elements it covers (“max/average pooling”) or the dot product of its own elements and their input counterparts (“convolution”). These operations extract characteristic features of the input data, e.g., the form of a tumor vs. that of a normal cell [16]. On the other hand, the recurrent neural network is designed for extracting the local information of sequence data. The current output is derived in a repetitive (or “recurrent”) pattern from the current input and the previous hidden state (the memory of the network on what happened in all previous periods) [17].
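To make the kernel operations above concrete, the following is a minimal NumPy sketch of a single convolution pass and a 2 × 2 max-pooling pass. It is an illustration only, not code from any of the reviewed studies; the averaging kernel is a hypothetical choice for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the dot product at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(image, size=2):
    """2 x 2 max pooling: keep the maximum of each non-overlapping window."""
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i * size:(i + 1) * size,
                              j * size:(j + 1) * size].max()
    return out

image = np.array([[1., 2., 3., 0.],
                  [4., 5., 6., 1.],
                  [7., 8., 9., 2.],
                  [1., 0., 1., 3.]])
kernel = np.ones((3, 3)) / 9.0    # a simple averaging kernel for illustration
features = conv2d(image, kernel)  # 2 x 2 feature map
pooled = max_pool2d(image)        # 2 x 2 pooled map
```

In a trained network the kernel weights are learned rather than fixed, and many such kernels run in parallel to extract many features.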
Unet is currently a common convolutional neural network for colonoscopy. Its “U-shaped” encoder-decoder structure is designed to combine the strengths of a contracting path for down-sampling input image tiles (i.e., extracting global information) and an expanding path for up-sampling output segmentation maps (i.e., recovering local information). The contracting path consists of the repeated application of two 3 × 3 convolutional layers (each followed by a rectified linear unit), with a 2 × 2 max-pooling layer after each pair for down-sampling. Here, 3 × 3 (or 2 × 2) denotes the size of the convolutional (or max-pooling) kernel. The expanding path consists of (1) up-sampling/de-convolution by a 2 × 2 convolutional layer, (2) the concatenation (copy and crop) of the corresponding feature maps from the contracting path and (3) the repeated application of two 3 × 3 convolutional layers (each followed by a rectified linear unit). Its overlap-tile strategy also allows seamless segmentation of arbitrarily large images [18]. Efficientnet is another popular convolutional neural network for colonoscopy at this point, finding the optimal balance of network depth, width and resolution with neural architecture search [19,20]. There has been a rapid expansion of literature on the application of artificial intelligence for colonoscopy, and this study reviews its recent progress from detection to diagnosis.
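The U-shaped bookkeeping above can be illustrated by tracing the spatial size of the feature maps through the network. The sketch below assumes the unpadded 3 × 3 convolutions of the original Unet design (each shrinks the map by 2 pixels per side pair); many later colonoscopy variants use padded convolutions that preserve size, so this is an illustration of the original architecture only.

```python
def unet_spatial_trace(size=572, depth=4):
    """Trace feature-map spatial size through a Unet with unpadded 3x3
    convolutions (each pair shrinks the map by 4 pixels), 2x2 max
    pooling on the way down and 2x2 up-convolution on the way up."""
    sizes = [size]
    for _ in range(depth):       # contracting path
        size -= 4                # two 3x3 valid convolutions
        sizes.append(size)
        size //= 2               # 2x2 max pooling halves the map
        sizes.append(size)
    size -= 4                    # bottleneck double convolution
    sizes.append(size)
    for _ in range(depth):       # expanding path
        size *= 2                # 2x2 up-convolution doubles the map
        size -= 4                # two 3x3 valid convolutions
        sizes.append(size)
    return sizes

trace = unet_spatial_trace()     # 572 -> ... -> 388, as in the original design
```

The mismatch between the 572-pixel input and the 388-pixel output is exactly why the copy-and-crop concatenation and the overlap-tile strategy are needed.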
METHODS
Figure 1 shows the flow diagram of this study. The source of data was 27 original studies in PubMed [21-47]. The search terms were “colonoscopy” (title) and “deep learning” (abstract). The eligibility criteria were: (1) the dependent variable of GID; (2) the interventions of deep learning for classification, detection and/or segmentation for colonoscopy; (3) the outcomes of accuracy, sensitivity, specificity, area under the curve (AUC), precision, F1, intersection over union (IOU), Dice and/or inference frames per second (FPS); (4) the publication year of 2021 or later; (5) the publication language of English.
RESULTS
Review summary
A summary of the review is shown in Tables 1–3 for classification, detection and segmentation. The tables have four summary measures, i.e., sample size, deep learning methods, performance measures compared to baseline models and tasks for colonoscopy. Based on the results of this review, different deep learning methods would be appropriate for different tasks for colonoscopy, e.g., Efficientnet (AUC 99.8%) in the case of classification, You Only Look Once with the instance tracking head (ITH; F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. However, artificial intelligence is a data-driven method, and more study with more external data is needed for greater external validity.
Classification
The review of major studies regarding deep learning classification for colonoscopy is given in this section. The task of deep learning classification for colonoscopy centered on the states of the polyp, the colon and Crohn’s disease. Here, the sample size varied from 99 to 56,872, while Bidirectional Encoder Representations from Transformers (BERT), Efficientnet, fuzzy inference, region-based convolutional neural network (R-CNN), Resnet and its Inception/Xception ensemble were common approaches. The ranges of their performance indicators were 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity and 71.0–99.8% for the AUC. Among these approaches, Efficientnet registered the best performance with an AUC of 99.8% [37]. The aim of that study was to develop and validate deep learning classification models for colonoscopy on six states of the colon, i.e., advanced tubular adenocarcinoma, tubular adenoma, traditional serrated adenoma, sessile serrated adenoma, hyperplastic polyp and non-specific change. Data came from 1,865 images from 703 patients who had colonoscopy at a general hospital in a metropolitan area around Seoul during 2017–2019. The 1,865 images were split into training, validation and test sets with an 80:10:10 ratio (1,484:173:208 images). A major criterion for the test of the trained and validated models was the AUC. Efficientnet-B7 and Densenet-161 (baseline) were trained, validated, tested and compared. Based on the results of that study, the AUCs of Efficientnet were generally higher than those of Densenet: 99.7% vs. 100.0% for advanced tubular adenocarcinoma, 99.7% vs. 99.5% for tubular adenoma, 100.0% vs. 99.9% for traditional serrated adenoma, 99.5% vs. 99.3% for sessile serrated adenoma, 99.5% vs. 99.1% for hyperplastic polyp, 99.7% vs. 99.5% for non-specific change, and 99.8% vs. 99.5% on average. The sensitivity of Efficientnet was superior to that of Densenet as well, i.e., 98.5% vs. 97.1%.
According to the findings of Gradient-Weighted Class Activation Mapping, both networks focused more on epithelial lesions than on their stromal counterparts.
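The AUC used as the main test criterion throughout this section can be computed directly from predicted scores and true labels via its rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half. A minimal sketch, independent of any particular model:

```python
def auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation: fraction of
    positive-negative pairs in which the positive scores higher,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranker scores 1.0, a random one about 0.5, and a perfectly inverted one 0.0; this is why AUCs near 99% in the reviewed studies indicate near-perfect separation of classes.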
Detection
The review of major studies regarding detection for colonoscopy is presented in this section. The emphasis of deep learning detection for colonoscopy was on the object of the polyp. In this task, the sample size varied from 700 to 37,899, whereas Single Shot Detector, Unet, You Only Look Once, and their variations such as Generative Adversarial Network data augmentation were popular choices. The ranges of their performance measures were 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, and 66–180 for FPS. Among these choices, You Only Look Once with the ITH gave the best performance with an F1 of 96.3% [39]. The purpose of that study was to develop and validate deep learning detection models for colonoscopy on the object of the polyp. The data source was 14,202 images from one private and three public sources including CVC-ClinicDB, CVC-VideoClinicDB and ETIS-LARIB. The major criteria for the test of the trained and validated models were F1 and FPS. You Only Look Once with the ITH and its Single Shot Detector counterpart (baseline) were trained, validated, tested and compared. Here, the ITH was introduced to improve performance by tracking the embeddings extracted for the regions of interest across three consecutive images while conducting detection tasks. Based on the findings of that study, the former model outperformed its baseline counterpart in both accuracy and speed, i.e., F1 96.3% vs. 93.8% and FPS 66 vs. 43.
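Precision, recall and F1 for detection depend on how predicted boxes are matched to ground truth. A common convention, sketched below as an illustration (not the matching protocol of the reviewed study), is greedy one-to-one matching at an IOU threshold of 0.5:

```python
def box_iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_f1(preds, truths, thresh=0.5):
    """Greedy one-to-one matching: a prediction is a true positive
    if it overlaps a not-yet-matched truth box with IOU >= thresh."""
    matched = set()
    tp = 0
    for p in preds:
        for i, t in enumerate(truths):
            if i not in matched and box_iou(p, t) >= thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Because F1 is the harmonic mean of precision and recall, it penalizes a detector that finds every polyp but raises many false alarms as much as one that is precise but misses lesions.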
Segmentation
The review of important studies regarding segmentation for colonoscopy is reported in this section. The focus of deep learning segmentation for colonoscopy was on the lesion of the polyp as well. In this area, the smallest (or biggest) sample size was 1,000 (or 777,627), and Unet and its extensions including dense-dilation-residual blocks were the usual models. The ranges of their performance scores were 75.1–97.3% for Dice and 112–182 for FPS. Among these models, Unet with dense-dilation-residual blocks presented the best performance with a Dice of 97.3% [42]. That study sought to develop and validate deep learning segmentation models for colonoscopy on the lesion of the polyp. The data origin was 1,612 images from two public sources, Kvasir-SEG and CVC-ClinicDB. The 1,612 images were split into training, validation and test sets with a 70:10:20 ratio (1,144:164:312 images). The major criteria for the test of the trained and validated models were Dice and FPS. Unet (baseline) and its various extensions (e.g., dense-dilation-residual blocks in this case) were trained, validated, tested and compared. The study made a unique contribution, given that previous studies employed only one or two of the dense, dilation and residual blocks. Unet with dense-dilation-residual blocks (so-called Nnet) surpassed Unet (baseline) and its previous extensions, e.g., Dice 97.3% vs. 91.6% (Unet).
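The Dice and IOU scores used throughout this section are overlap measures between the predicted and ground-truth masks: IOU is intersection over union, and Dice is twice the intersection over the sum of the two mask sizes. A minimal NumPy sketch for binary masks:

```python
import numpy as np

def iou_and_dice(pred, target):
    """IOU = |A n B| / |A u B|; Dice = 2|A n B| / (|A| + |B|)
    for binary segmentation masks A (prediction) and B (truth)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0      # both masks empty: perfect
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```

Note that Dice is always at least as large as IOU for the same masks (Dice = 2·IOU / (1 + IOU)), which is one reason the Dice ranges reported above sit higher than the IOU ranges.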
DISCUSSION
This study reviewed the recent progress of artificial intelligence for colonoscopy from detection to diagnosis. Different deep learning methods were found to be appropriate for different tasks for colonoscopy, e.g., Efficientnet with neural architecture search (AUC 99.8%) in the case of classification, You Only Look Once with the ITH (F1 96.3%) in the case of detection, and Unet with dense-dilation-residual blocks (Dice 97.3%) in the case of segmentation. Their reported performance measures varied within 74.0–95.0% for accuracy, 60.0–93.0% for sensitivity, 60.0–100.0% for specificity, 71.0–99.8% for the AUC, 70.1–93.3% for precision, 81.0–96.3% for F1, 57.2–89.5% for the IOU, 75.1–97.3% for Dice and 66–182 for FPS. It should be noted, however, that this study focused on performance outcomes and ignored data characteristics such as their categories and structures. The selection of major studies based on performance results can be biased for this reason. It will be important for future research to give full consideration to this issue.
Indeed, little examination has been done and more investigation is needed on reinforcement learning for colonoscopy. Reinforcement learning is an artificial intelligence approach with the following components: the environment presents a series of rewards; an agent takes a series of actions to maximize the cumulative reward in response; and the environment moves to the next period with given transition probabilities [48]. Reinforcement learning is known for its revolutionary idea of temporal difference learning: artificial intelligence (e.g., Alpha-Go) begins like a human player, taking a series of actions to maximize the cumulative reward (e.g., the chance of victory) from the limited information available in limited periods; then it goes far beyond the best human player with the absolute power of big data absorbing all human play to date [49]. It is reinforcement learning (or temporal difference learning) that encapsulates the crucial qualities of artificial intelligence as “being similar to but superior to human intelligence” [49]. However, little literature is available and more research is to be done on reinforcement learning for colonoscopy. In particular, more effort is essential for data collection and standardization in this direction. Reinforcement learning requires the collection and standardization of massive high-quality data with respect to its major components, i.e., rewards, actions and transition probabilities. But such endeavors have been very limited for colonoscopy because of ethical concerns and little interest in this issue. Overcoming this challenge is expected to be a major breakthrough for the application of artificial intelligence for colonoscopy.
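The temporal difference idea above can be made concrete with tabular Q-learning on a toy problem (a generic illustration, not a colonoscopy application): after each action, the agent nudges its action-value estimate toward the observed reward plus the discounted value of the next state.

```python
import random

# Tiny deterministic chain: states 0, 1, 2; actions 0 = left, 1 = right.
# Reaching state 2 yields reward 1 and ends the episode.
def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(2, state + 1)
    reward = 1.0 if nxt == 2 else 0.0
    return nxt, reward, nxt == 2

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(3)]          # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection (ties go right)
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # temporal difference update toward reward + discounted next value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()   # "right" ends up with the higher value in every state
```

The agent never sees the transition probabilities or the full reward structure in advance; it learns the values of actions purely from sampled experience, which is exactly why massive, standardized records of actions and outcomes would be needed before this approach could be applied to colonoscopy.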
In spite of these limitations, this study demonstrates that artificial intelligence provides an effective, non-invasive decision support system for colonoscopy from detection to diagnosis.
Notes
CRediT authorship contributions
Eun Sun Kim: conceptualization, methodology, resources, investigation, data curation, formal analysis, validation, software, writing - original draft, writing - review & editing, visualization, supervision, project administration, funding acquisition; Kwang-Sig Lee: conceptualization, methodology, resources, investigation, data curation, formal analysis, validation, software, writing - original draft, writing - review & editing, visualization, supervision, project administration, funding acquisition
Conflicts of interest
The authors disclose no conflicts.
Funding
This work was supported by (1) a Technology Innovation Program grant (Development of AI Base Multimodal Endomicroscope for In Situ Diagnosis Cancer) funded by the Ministry of Trade, Industry, and Energy of South Korea (No. 20001533) and (2) a Korea Health Industry Development Institute grant (Korea Health Technology R&D Project) funded by the Ministry of Health and Welfare of South Korea (No. HI22C1302). The funders had no role in the design of the study; in the collection, analysis, and interpretation of the data; or in the writing and review of the manuscript.