Original Research
1 Department of Radiology, Mayo Clinic, 200 First St. SW, Rochester, MN 55905.
2 Department of Radiology, National Institutes of Health, Bethesda, MD.
3 Siemens Medical Solutions, Forchheim, Germany.
Received July 25, 2006; accepted after revision March 28, 2007.
The polyp detection systems used in this study were provided by Siemens Medical Solutions and R. M. Summers of the National Institutes of Health. One of the coinvestigators, R. M. Summers, is a coinventor of one of the polyp detection systems tested, and two other coinvestigators, L. Guendel and B. Schmidt, are employees of Siemens Medical Solutions. Two authors, J. G. Fletcher and C. H. McCollough, receive salary support through an unrestricted grant from Siemens Medical Solutions. The other authors have no potential conflict of interest relative to this study. One of the radiologists without a conflict of interest, J. L. Fidler, matched the automated polyp detections to the endoscopic findings to determine polyp detection system performance. The authors at the Mayo Clinic retained all data relating to endoscopic findings and performed the data analysis.
Abstract
MATERIALS AND METHODS. We evaluated two polyp detection systems, Polyp Enhanced Viewing (PEV) and the Summers computer-aided detection (CAD) system (National Institutes of Health [NIH]), using a unique cohort of CT colonography examinations: 31 examinations with true-positive lesions identified by radiologists and 34 examinations with false-positive lesions incorrectly identified by radiologists. All patients had reference-standard colonoscopy within 7 days of CT. Candidate lesions were compared with the endoscopic reference standard and with prospective radiologist interpretation. The sensitivity and false-positive rate were calculated for each system.
RESULTS. The NIH system had a higher sensitivity than the PEV tool for polyps ≥ 1 cm (22/23, 96%; 95% CI, 78-99% vs 14/23, 61%; 95% CI, 38-81%; p = 0.008). There was no significant difference in the detection of medium-sized polyps 6-9 mm in size (8/13 vs 6/13, respectively; p = 0.68). The PEV tool had an average of 1.18 false-positive detections per patient, whereas the NIH tool had an average of 5.20, with the PEV tool having significantly fewer false-positive detections in both patient groups (p < 0.001).
CONCLUSION. One polyp detection system tended to operate with a higher sensitivity, whereas the other tended to operate with a lower false-positive rate. Prospective trials using polyp detection systems as a primary or secondary means of CT colonography interpretation appear warranted.
Keywords: computer-aided detection, CT colonography, polyp detection
Computer-aided polyp detection in CT colonography identifies colorectal lesions using the 3D shape of a polyp as it projects into the colonic lumen, along with other factors such as internal attenuation and heterogeneity. Integrating computer-aided polyp detection into radiologic interpretation would likely reduce interobserver variability. One recent trial using multiple reviewers found that computer-aided polyp detection can decrease interobserver variability [5], and another showed that an automatic polyp detection system could be used as a training tool to improve radiologist performance [6]. In practice, the contribution of a computer-aided polyp detection system would depend on both the performance of the tool and how it is integrated into clinical interpretation. Given the recent development and proprietary nature of polyp detection systems, few head-to-head studies have been performed. The purpose of this study was to evaluate two current polyp detection systems to determine their sensitivity and false-positive rate in the same cohort of patients who had undergone CT colonography and subsequent endoscopy.
Because these CT examinations were ordered for clinical reasons, interpreting endoscopists were aware of the CT findings and carefully searched the relevant colonic segments for lesions identified on CT. Polyp detection systems are generally evaluated in terms of their sensitivity for colorectal polyps and their false-positive rate. Because we wanted to compare the performance of the polyp detection systems over a broad range of polyp morphologies, we designed our study to be polyp-enriched: we included all 31 cases in our CT colonography teaching file that were performed for clinical reasons (February 2003 to August 2004), had true-positive findings identified prospectively by the clinical radiologists with subsequent endoscopic confirmation, and met the inclusion but not the exclusion criteria.
We also wanted to compare the ability of the polyp detection systems to distinguish true polyps and cancers from stool and colonic structures, so the remaining 34 cases had endoluminal findings that the interpreting radiologist identified at CT colonography and incorrectly considered to represent colorectal lesions, but that were not found at subsequent endoscopy. These were consecutive false-positive cases from our clinical CT colonography database (August 2002 to August 2004).

CT colonography data were acquired using an 8-MDCT system (LightSpeed Ultra 8, GE Healthcare) with a tube rotation time of 0.5 second, a table speed of 13.5 mm per rotation, 120 kVp, and 140 mA, using a standard reconstruction algorithm. For patients whose size exceeded a 40-cm field of view, the tube current was doubled, to 150 mAs. The average CTDIvol for an average-sized patient was approximately 5.5 mGy.
Reference Standard
An expert gastrointestinal radiologist with 7 years of experience in CT colonography, who did not participate in the polyp detection system analysis or in matching polyp detection system findings to endoscopy, recorded the colonic segment, slice number, and 3D location of every endoscopically verified polyp. True-positive CT lesions were matched to endoscopically verified lesions when they were within one colonic segment and within 50% of the endoscopic size [7]. The lesion locations identified by this radiologist served as the reference standard against which automated detections were compared. Lesion sizes reported at endoscopy were used as the reference standard and to classify lesions as ≥ 1 cm or 6-9 mm in size. Lesion morphology (sessile, pedunculated, or flat) was recorded from the endoscopic report.
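The CT-to-endoscopy matching rule above can be expressed as a simple predicate. The Python sketch below is illustrative only: the ordered segment list, the reading of "within one colonic segment" as the same or an adjacent segment, and the function names are assumptions, not the study's software.

```python
# Hypothetical ordered list of colonic segments (proximal to distal);
# the study does not specify its segment scheme, so this is an assumption.
SEGMENTS = ["cecum", "ascending", "transverse", "descending", "sigmoid", "rectum"]


def segments_apart(seg_a: str, seg_b: str) -> int:
    """Number of segment steps between two named colonic segments."""
    return abs(SEGMENTS.index(seg_a) - SEGMENTS.index(seg_b))


def is_match(ct_segment: str, ct_size_mm: float,
             endo_segment: str, endo_size_mm: float) -> bool:
    """True if a CT lesion meets both the location and size criteria.

    Location: same or adjacent segment (assumed reading of "within one
    colonic segment"). Size: CT size within 50% of the endoscopic size.
    """
    location_ok = segments_apart(ct_segment, endo_segment) <= 1
    size_ok = abs(ct_size_mm - endo_size_mm) <= 0.5 * endo_size_mm
    return location_ok and size_ok
```

For example, a 12-mm CT lesion in the ascending colon matches a 10-mm endoscopic polyp in the cecum, but a 16-mm CT lesion does not match a 10-mm cecal polyp because it differs by more than 50% of the endoscopic size.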
Polyp Detection System Analysis and Lesion Matching with Endoscopy
Two polyp detection systems were used in our study. The Siemens Medical Solutions Polyp Enhanced Viewing (PEV) system is designed to assist radiologists after visual interpretation of the CT colonography data set has been performed, and it has received 510(k) clearance from the U.S. Food and Drug Administration (FDA) for this purpose. The National Institutes of Health (NIH) Summers computer-aided detection (CAD) system is an investigational system that is not FDA approved but has been used in numerous studies [8-11].
The CT data sets meeting all inclusion, but not exclusion, criteria were anonymized and distributed to both the PEV and the NIH groups. The NIH group returned the slice number and 3D coordinates of all system detections. The PEV group provided a Leonardo workstation running syngo Colonography software (Siemens Medical Solutions) with the PEV software and returned volume files with arrows pointing to PEV detections in the supine and prone data sets. Both groups returned their polyp detections to the primary institution.
A second gastrointestinal radiologist with extensive experience in CT colonography compared the locations of potential lesions identified by the polyp detection systems with the reference standard, categorizing each polyp detection as true-positive or false-positive. For a match, the 3D coordinates of, or the arrow pointing to, a polyp detection had to fall on the same lesion identified by the reference-standard 3D coordinates. Polyp detection system results were examined per patient, not per supine or prone position: if a lesion was detected in one position, it was counted as a true-positive detection regardless of whether the polyp detection system identified it in the complementary position.
False-positive polyp detections were also recorded, per data set (supine or prone), to minimize subjective decisions regarding the movement of stool between data sets (i.e., if a single piece of stool resulted in a false-positive detection in both the supine and prone data sets, it counted as two false-positive detections in that patient). False-positive detections were characterized according to the structure that, in the estimation of a gastrointestinal radiologist, may have caused them (rectal tube, ileocecal valve, haustral folds, stool, small bowel, and other causes). A subset of the false-positive detections attributed to other causes appeared to the gastrointestinal radiologist to represent colorectal polyps that were not verified endoscopically. In the 34 patients referred to endoscopy for false-positive CT colonography findings, a gastrointestinal radiologist reviewed clinical reports to identify the CT abnormality that prompted the referral to endoscopy, noting the 3D location of the human false-positive CT finding. False-positive detections identified by each system were subsequently compared with these human false-positive findings.
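The counting conventions described above (true positives per patient, regardless of position; false positives per supine or prone data set) can be sketched as follows. The record layout `(patient_id, position, label)` and the function name `tally` are hypothetical illustrations, not the study's tooling.

```python
from collections import defaultdict


def tally(detections, lesions_by_patient):
    """Count true positives per patient and false positives per data set.

    detections: iterable of (patient_id, position, label) tuples, where
        label is a reference-standard lesion id or the string "FP"
        (hypothetical layout).
    lesions_by_patient: dict mapping patient_id -> set of lesion ids.
    Returns (true_positives, total_lesions, fp_counts_by_patient).
    """
    found = defaultdict(set)      # lesion ids detected in either position
    fp_counts = defaultdict(int)  # FP detections summed over both data sets
    for patient, _position, label in detections:
        if label == "FP":
            # Each data set counts separately: stool flagged in both the
            # supine and prone scans contributes two false positives.
            fp_counts[patient] += 1
        else:
            # A lesion seen in one or both positions counts once.
            found[patient].add(label)
    tp = sum(len(found[p] & lesions) for p, lesions in lesions_by_patient.items())
    total = sum(len(lesions) for lesions in lesions_by_patient.values())
    return tp, total, dict(fp_counts)
```

In this sketch, lesion "L1" detected in both supine and prone data sets adds one true positive, while a false-positive detection repeated in both data sets adds two.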
Statistical Analysis
The sensitivity and 95% CI of the polyp detection systems for polyps 1 cm or greater in size and for polyps 6-9 mm in size were calculated by comparing polyp detections with the endoscopic reference standard. Sensitivities of the two systems were compared using the sign test [12]. An α level of 0.05 was considered to be statistically significant.
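The sketch below shows how an exact two-sided sign test behaves with paired detection data. It assumes the test was applied per lesion, using only discordant lesions (detected by one system but not the other); the paper does not spell out this detail, and the function name is our own.

```python
from math import comb


def sign_test_two_sided(n_plus: int, n_minus: int) -> float:
    """Exact two-sided sign test on discordant pairs (ties already dropped).

    n_plus: pairs favoring system A; n_minus: pairs favoring system B.
    Under the null hypothesis each discordant pair favors either system
    with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For lesions ≥ 1 cm, the single lesion missed by the Summers CAD system was also missed by the PEV tool, so every PEV detection was also a Summers CAD detection. That leaves 22 - 14 = 8 discordant lesions, all favoring the Summers CAD system, and `sign_test_two_sided(8, 0)` gives 2 × (1/2)^8 ≈ 0.0078, consistent with the reported p = 0.008.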
The total number of false-positive detections per patient was calculated for each polyp detection system, excluding rectal tube detections. This exclusion was performed because the Summers CAD system has an algorithm minimizing rectal tube detections [13, 14]. The results of prospective human interpretation are reported but not compared with the polyp detection systems, given the polyp-enriched and false-positive-enriched study design.
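The per-patient false-positive summary with rectal tube detections excluded can be sketched as below. The input layout (one list of attributed causes per patient) and the cause labels are hypothetical.

```python
from statistics import mean, stdev


def fp_per_patient(fp_causes_by_patient, exclude=("rectal tube",)):
    """Mean and sample SD of false-positive detections per patient.

    fp_causes_by_patient: one list per patient, each entry naming the
        structure blamed for a false-positive detection (hypothetical labels).
    exclude: causes left out of the count (rectal tube, as in the study).
    """
    counts = [sum(1 for cause in causes if cause not in exclude)
              for causes in fp_causes_by_patient]
    return mean(counts), stdev(counts)
```

For example, four patients whose detections were attributed to `["stool", "rectal tube"]`, `["fold", "fold", "stool"]`, `[]`, and `["ileocecal valve"]` yield counts of 1, 3, 0, and 1, for a mean of 1.25 false-positive detections per patient. The same per-case arithmetic gives the rectal tube rate reported for the PEV tool: 21 detections in 65 patients is about 0.32 per case.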
Among the 34 patients with no endoscopic lesion, the number of false-positive detections for the two systems was compared within each patient. The mean difference in false-positive detections across the 34 patients is reported, and the null hypothesis of a zero mean difference was tested using a paired Student's t test. From the practical perspective of a practicing radiologist, false-positive detections are not bothersome if they are relatively few and can be easily dismissed, but they can be bothersome and lead to mistrust of polyp detection system results if they are numerous. We therefore dichotomized the number of false-positive detections in these patients as two or fewer versus more than two. Using the dichotomized results, the performance of the two systems was compared using the sign test, with two or fewer false-positive detections considered acceptable (because this would require minimal radiologist effort to examine). Similar comparisons were performed for CT examinations with endoscopically verified lesions. Finally, we report the total number of false-positive detections by each system that matched human false-positive findings in the false-positive-enriched cohort.
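The dichotomized comparison described above can be sketched as follows: each patient's false-positive count is classified as acceptable (two or fewer) or not for each system, and only patients in whom the systems disagree enter an exact sign test. The function name and the paired-list input format are assumptions for illustration.

```python
from math import comb

ACCEPTABLE_MAX_FP = 2  # two or fewer false positives counted as acceptable


def dichotomized_sign_test(fp_a, fp_b):
    """Compare two systems on the same patients after dichotomizing FP counts.

    fp_a, fp_b: per-patient false-positive counts for systems A and B,
        listed in the same patient order.
    Returns (a_only, b_only, p), where a_only counts patients acceptable
    with A but not with B, and p is the exact two-sided sign-test p value.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("both systems must be scored on the same patients")
    a_only = b_only = 0
    for a, b in zip(fp_a, fp_b):
        ok_a, ok_b = a <= ACCEPTABLE_MAX_FP, b <= ACCEPTABLE_MAX_FP
        if ok_a and not ok_b:
            a_only += 1
        elif ok_b and not ok_a:
            b_only += 1
        # Concordant patients (both acceptable or both not) are ignored.
    n, k = a_only + b_only, min(a_only, b_only)
    if n == 0:
        return a_only, b_only, 1.0
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return a_only, b_only, p
```

With hypothetical counts `[0, 1, 2, 3, 0, 1]` for one system and `[5, 1, 4, 3, 6, 0]` for the other, three patients are acceptable only with the first system and none only with the second. The resulting exact p value is 0.25, which illustrates how few discordant patients the test has to work with in small samples.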
≥ 1 cm and 13 polyps 6-9 mm in size. The endoscopic morphology of the polyps or cancers ≥ 1 cm was sessile (8/23, 35%), flat (7/23, 30%), pedunculated (6/23, 26%), and annular (2/23, 9%), and the morphology of the 6-9 mm lesions was uniformly sessile. Both of the annular lesions were adenocarcinomas, as were two of the flat lesions.
For lesions ≥ 1 cm in size, the PEV tool had a sensitivity of 61% (14/23; 95% CI, 38-81%) compared with the Summers CAD system, which had a sensitivity of 96% (22/23; 95% CI, 78-99%) (Fig. 1A, 1B, 1C). The 95% CIs for the two polyp detection systems overlapped slightly, but the Summers CAD system sensitivity was significantly higher using the sign test (p = 0.008).
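The paper does not state how its 95% CIs were computed. Assuming exact (Clopper-Pearson) binomial intervals, the sketch below reproduces the reported 78% lower bound for 22/23 and the 76-98% interval reported for 31 of 34 patients. The bisection approach and function names are our own illustration, not the authors' software.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05, iters: int = 60):
    """Exact (Clopper-Pearson) confidence interval for x successes in n trials."""

    def solve(f):
        # f is increasing in p on [0, 1]; bisect for its zero crossing.
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | p) = alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else solve(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2 (decreasing in p, so negate).
    upper = 1.0 if x == n else solve(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper
```

Under this assumption, `clopper_pearson(22, 23)` returns a lower bound of about 0.78 and `clopper_pearson(31, 34)` returns roughly (0.76, 0.98). Library routines such as exact binomial confidence intervals in SciPy would normally be preferred; the hand-rolled version is shown only to make the calculation explicit.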
The prospective interpretation by the clinical radiologists identified 22 of 23 lesions ≥ 1 cm (96%), but high radiologist performance was anticipated because we used radiologist detection of any CT lesion as an inclusion criterion (Fig. 1A, 1B, 1C). Both polyp detection systems detected one 1-cm polyp missed by the human observers (Fig. 2A, 2B). This lesion had an irregular surface, which may explain why the interpreting radiologist did not consider it a true lesion [7].
Flat lesions accounted for the majority of false-negative examinations for polyps and lesions ≥ 1 cm in size. The Summers CAD system had only one false-negative CT examination with a polyp or lesion ≥ 1 cm in size. This single case occurred in a patient with a cigar-shaped flat lesion located on a fold in the ascending colon (Fig. 3A, 3B, 3C). The PEV tool failed to identify this polyp as well. Six of nine (67%) false-negative lesions ≥ 1 cm missed by the PEV tool were flat lesions.
In our 65 patients, the PEV tool had an average of 1.18 false-positive detections per patient (range, 0-6; SD, 1.65), whereas the Summers CAD tool had an average of 5.20 false-positive detections per patient (range, 1-26; SD, 4.78), excluding the rectal tube detections for both systems. Prospective radiologist interpretations in our skewed cohort had 0.9 false-positive detections per case. The PEV tool had 21 rectal tube detections (0.32 per case; SD, 0.95) compared with the Summers CAD system, which had only one rectal tube false-positive detection (0.02 per case; SD, 0.13), owing to an algorithm used by the NIH group that aims to minimize false-positive rectal tube detections.
Table 1 summarizes the average number of false-positive polyp detections per case according to probable etiologic factor. The Summers CAD tool had an approximately fivefold increase in false-positive detections relating to haustral folds and stool. Both polyp detection systems had about the same number of detections on structures that visually resembled a polyp. The PEV tool had eight false-positive detections on the ileocecal valve (0.12 per case) compared with the Summers CAD tool, which had 48 false-positive ileocecal valve detections (0.45 per case). The PEV tool had 12 detections that resembled colorectal polyps visually but which were not found endoscopically, whereas the Summers CAD system identified 13 similar detections (Fig. 4A, 4B, 4C).
In the 34 patients with no colorectal lesion at endoscopy, the PEV tool had a mean of 1.9 (SD, 1.8) false-positive detections per patient, whereas the Summers CAD tool had a significantly greater number (5.8 false-positive detections per patient; SD, 4.5; sign test p < 0.001 using more than two detections as the cutoff). The mean difference in the number of false-positive detections was 3.9 (SD, 4.1), with the NIH system having more. In this group, in which the radiologist had made at least one false-positive finding in every case, the PEV tool correctly made no detections in nine (26%) of 34 patients, whereas the Summers CAD tool made at least one detection in every case. In paired comparisons, the difference ranged from one case in which the Summers CAD tool had 21 more false-positive detections than the PEV tool to another in which the PEV tool had two more false-positive detections than the Summers CAD tool (p < 0.001, paired Student's t test). In 31 (91%; 95% CI, 76-98%) of 34 patients, the Summers CAD system had more false-positive detections than the PEV system. With respect to the false-positive CT findings that radiologists erroneously interpreted as polyps, the PEV tool had 17 false-positive detections in 12 patients that matched the radiologists' false-positive findings, whereas the Summers CAD system had 38 such detections in 17 patients.
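The paired Student's t test used above works on the per-patient differences in false-positive counts. A minimal sketch follows; the function names are our own, and because the reported summary values are rounded, recomputing t from them gives only an approximate check.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences):
    """Paired t statistic and degrees of freedom from per-patient differences."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n)), n - 1


def paired_t_from_summary(mean_diff: float, sd_diff: float, n: int):
    """Same statistic computed from a reported mean and SD of the differences."""
    return mean_diff / (sd_diff / sqrt(n)), n - 1
```

Plugging in the reported mean difference of 3.9 (SD, 4.1) across 34 patients, `paired_t_from_summary(3.9, 4.1, 34)` gives t ≈ 5.55 with 33 degrees of freedom. This is well above the two-sided critical value for p = 0.001 at that sample size (roughly 3.6 from standard t tables), which is consistent with the reported p < 0.001.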
In the 31 patients with colorectal polyps or cancers, similar results were observed, with the Summers CAD system having a mean of 4.2 more false-positive detections than the PEV tool (paired Student's t test, p < 0.001). A significantly higher percentage of patients had two or fewer false-positive detections with the PEV system than with the Summers CAD system (27/31, 87% vs 12/31, 39%; p < 0.001, sign test).
≥ 1 cm (96%), detecting 21 of 22 lesions seen by human observers and a synchronous polyp missed by human observers. The other polyp detection system operated with a lower false-positive rate (1.18 false-positive detections per patient), which approached that of radiologists in our false-positive-enriched population. Importantly, with both polyp detection systems the radiologist must still assess each detection visually, because several false-positive detections can be generated during each examination, even though fewer detections per case were reported here than in some prior studies [11]. In general, a few false-positive polyp detections take little time to evaluate, because standard 2D multiplanar reconstructions and 3D endoluminal views can quickly confirm or dismiss them. An excessive number of false-positive detections, however, can lead to distraction and failure to recognize true-positive lesions.
It is difficult to understand the relative strengths and weaknesses of different polyp detection systems given the differences in populations and designs across such studies. Both of the polyp detection systems discussed in this study have been previously described, along with their algorithms for polyp identification and classification, and prior studies have reported excellent results [8, 11, 15, 16]. Comparisons of CAD systems on identical patient populations are necessary to understand the differences between systems. Radiologists need this information when selecting a polyp detection system and when deciding whether and how to use automated polyp detection in their colonography practice. Highly sensitive polyp detection systems could be used in a "first reviewer paradigm," with radiologists evaluating only automated polyp detections, a practice that could dramatically decrease interpretation time [5]. Baker et al. [6] recently used the PEV tool after visual evaluation of CT colonography data sets and showed that, over time, using the PEV tool as a "second reviewer" reduced interobserver variability and improved reviewer performance.
Double reading of CT colonography examinations by radiologists has been shown to improve sensitivity for the detection of colorectal polyps [17], but it is expensive and time-consuming. The recent report by Taylor et al. [18] suggests that a second review by a polyp detection system may be a cost-effective alternative, reducing interobserver variability and detecting polyps missed by even experienced radiologists. Moreover, several studies of the CT colonography learning curve have shown that the overwhelming majority of errors committed by new reviewers are errors of detection (i.e., the reviewers did not stop their 2D or 3D review to consider CT abnormalities corresponding to endoscopically verified polyps) [19, 20]. For such novice radiologists, polyp detection systems may help by prompting them to reconsider CT abnormalities they did not analyze earlier.
The PEV tool failed to identify six of seven flat lesions in our study. Polyp detection systems are optimized for the lesion morphologies and CT techniques to which they have been exposed, and flat lesions were not a morphology for which the PEV tool had been optimized at the time of this study [15, 16]. Furthermore, the morphologic distribution of large polyps in our study was skewed toward a high proportion of flat lesions (30%) compared with that reported previously in a screening population (10-15% flat polyps) [7]. Earlier assessments of the PEV tool also used a much thinner slice thickness of 1 mm [15]. Hence, technological assessments such as ours are time-limited. As the PEV system is exposed to more flat lesions, its performance in this category may improve. Since the time of this study, the Summers CAD group has developed an algorithm to reduce false-positive detections on the ileocecal valve, and an algorithm to identify rectal tube detections has been developed for the PEV tool [21, 22].
Our study has several limitations. First, we did not assess these polyp detection systems as part of the clinical workflow or assess their incremental benefit in identifying colorectal neoplasia after human review. Rather, we limited our review to patients with CT colonography findings detected at prospective clinical interpretation and followed up by endoscopy. Second, our patient population was purposely skewed, being both polyp-enriched and false-positive-enriched. Although this study provides a means of comparing polyp detection systems, it does not indicate the sensitivity of these tools in a screening or diagnostic setting. Importantly, given our study design, we did not include patients who had false-negative CT colonography examinations because these patients would not have been referred for subsequent endoscopy. We selected our patient population so that the strengths and weaknesses of each polyp detection system could be highlighted. We also performed our comparison in CT colonography data sets without stool tagging. The Pickhardt trial [1, 23] used both stool and fluid tagging, and these techniques are now the clinical practice at our institution. At the time of study design, we selected scanning characteristics based on the suggestions of the principals involved with each polyp detection system, a decision that precluded the use of the public-domain CT colonography examinations from the Pickhardt trial [1, 23]. Since that time, both systems have been tested in large numbers of patients with stool and fluid tagging [9, 22]. Finally, we did not perform receiver operating characteristic analysis because we were unable to adjust the number of detections either system reported; instead, we asked the polyp detection systems to report detections in binary terms (present or not present) rather than as probabilities.
In conclusion, our comparison of two available polyp detection systems showed that one system operated with higher sensitivity, whereas the other operated with a lower false-positive rate, in a collection of CT colonography cases selected to provide a large number of polyps and potential false-positive detections. The sensitivity of one polyp detection system was identical to that of the human observers, whereas the false-positive rate of the other was similar to that of the human observers. Comparisons of systems using the same data sets, such as this study, can evaluate the relative strengths and weaknesses of each polyp detection system. A national database of CT colonography examinations that could be used to assess polyp detection system performance in a similar way would help inventors and radiologists determine the strengths and weaknesses of individual systems. In addition, prospective trials using automatic polyp detection systems as a primary means of CT colonography interpretation appear warranted.
Acknowledgments
We thank C. Daniel Johnson and W. Scott Harmsen for their contributions to this article.