Original Research
1 Intestinal Imaging Centre, St. Mark's Hospital, Harrow, Middlesex, England HA1
3UJ.
3 Medical Statistics Group, Centre for Statistics in Medicine, Institute of
Health Sciences, Oxford, England.
4 Consultant, Ruislip, Middlesex, England.
Received February 1, 2005;
accepted after revision May 9, 2005.
Address correspondence to S. Halligan.
Abstract
MATERIALS AND METHODS. Four observers, three of whom had prior experience with CT colonography, estimated the maximum diameter of 48 polyps using three different visualization displays: 2D colonography window, 2D abdominal window, and 3D surface rendering. Each re-measured a subset of 10 polyps. Polyps measured 2 to 12 mm according to a colonoscopic reference. Inter- and intraobserver agreement and agreement with the reference measurement were determined using the Bland-Altman method, paired Student's t testing, analysis of variance, and analysis of covariance (ANCOVA), and by calculating the components of variance.
RESULTS. CT measurements overestimated polyp diameter, a phenomenon found least using the 2D abdominal display. Generally, 95% limits of agreement encompassed different size categories for individual polyps: the widest spanned 14.6 mm (-4.6 mm to 10.0 mm) for an experienced observer using the 3D display. When using the 2D abdominal display, no significant difference was found between estimates and the reference value for the other two experienced observers (p = 0.83 and 0.23). All the observers' measurements were significantly different from the reference when using the 3D display (p < 0.001). The novice was significantly different from the experienced observers in some analyses. Inter- and intraobserver agreement were poorest for the 3D display.
CONCLUSION. Measurement of polyp diameter from CT colonography is subject to variation contingent on the observer's experience and the viewing display used. Although 3D visualization display is commonly used for polyp detection, it should not be used for measurement.
Keywords: colon, colonoscopy, colorectal cancer, CT colonography
Introduction
The biologic significance of an adenoma is thus heavily influenced by its maximum diameter. Diameter can be determined during colonoscopy, most often by comparing the polyp to adjacent biopsy forceps. This procedure can be followed immediately by polypectomy. CT colonography is increasingly advocated as an alternative screening method because it is less invasive and safer than colonoscopy [10]. However, immediate polypectomy is not possible with this process, so when a polyp is detected, recommendations for future management must be made based on measurements obtained from the CT data set. Large polyps require subsequent endoscopy for polypectomy; small polyps may be safely left in situ because the risk of malignant transformation is outweighed by the small but significant risk of adverse events related to colonoscopy and the cost and inconvenience of a second procedure (Zalis M, presented at the Fifth International Symposium of Virtual Colonoscopy). It is therefore important for measurements of polyp diameter obtained at CT colonography to be both accurate and reproducible. However, although several published studies have focused on the variability of polyp measurements obtained during colonoscopy [11-13], at the time of writing, no study has determined this for CT colonography. A degree of inter- and intraobserver disagreement is inevitable, and this is potentially confounded by the viewing conditions (e.g., window width and level), the method of image rendering used, and the experience of the observer.
Materials and Methods
In brief, CT colonography was performed after full bowel preparation and distention using carbon dioxide for insufflation. Prone and supine scanning was performed using a 4-MDCT unit (LightSpeed Plus, GE Healthcare), collimation of 1.25 mm to 2.5 mm, pitch of 1.5, and 50 mA to 100 mA. This technique has been described in detail previously [14]. Same-day optical colonoscopy was performed after CT colonography.
The data sets of the 24 patients were anonymized; after their order was randomized, each was given a unique study number. The study coordinator, who had prior experience interpreting more than 200 CT colonography studies, determined and recorded on a study sheet the segmental location and CT coordinates for each of the 48 known polyps. This was achieved by reference to the previous CT colonography and optical colonoscopy reports, and by viewing the prone and supine CT data sets for each study on a commercially available workstation using proprietary software (Advantage Windows 4.1 and Navigator colon package, GE Healthcare). The axial slice number for the epicenter of each polyp was noted separately for the prone and supine studies, if the polyp was visible on both. This procedure facilitated polyp identification and location for the study interpreters (because polyp detection was not an aim of this study) and allowed them to select one of the paired studies from which to make their measurements.
Polyp Measurement
Four observers interrogated each of the 48 polyps. The observers were
unaware of the reference measurement for each polyp and also unaware of the
distribution of polyp sizes in the data set. All four observers were familiar
with the proprietary software for reporting routine abdominopelvic CT, and the
study coordinator also ensured they were familiar with the display and
measurement functions required for 2D and 3D blinded assessment (see below).
Observers 1, 2, and 3 had previous experience with CT colonography
interpretation, with a minimum of 150 examinations each at the time of the
study. Observer 4 had no prior experience and had only received instruction on
how to use the software, not on how best to measure polyps. Each observer was
asked to measure the maximal diameter of each polyp indicated on the study
sheet using one of three image display methods: colonography window setting,
abdominal window setting, and 3D image. Measurements were made using the same
proprietary workstation and software used by the study coordinator to select
the polyps. A four-quadrant display was used (4:1), and the first measurement
was made from the 2D axial images or 2D multiplanar or oblique reformatted
images obtained with the patient in either the prone or supine position,
whichever was felt by the observer to better depict the maximal diameter of
the polyp. The observers were also able to magnify the images according to
their individual preference. The workstation was set to measure from 2D
images, and a standard colonography window setting was used for viewing
(width, 1,500 H; level, 150 H) (Fig.
1A). The measurement was made by placing software calipers across
what was judged to be the maximal diameter of the polyp. To reduce bias, the
screen annotation function was disabled during this procedure so the observer
was unaware of the value of the measurement made. The observer then changed
the display to standard abdominal viewing windows (width, 400 H; level, 40 H)
(Fig. 1B) and made a second
measurement of maximal polyp diameter in an identical fashion to the first,
again unaware of the value of the measurement made. Last, the observer
switched to a default endoluminal surface-rendered perspective display
(Fig. 1C), repositioned the
viewing angle to best depict the polyp, and made a final measurement of
maximal diameter after changing the software settings to account for diameter
measurement using 3D rendering. Once this measurement was done, the screen
annotation function was enabled, and the observer recorded all three
individual measurements on the study sheet. The interpreters were unaware of
each other's results.
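The effect of the two 2D window settings on what an observer sees can be illustrated with a minimal sketch. It assumes a simple linear window mapping (values below the window shown as black, values above as white); the workstation's actual display transfer function is not described here and may differ. The function name `window_to_gray` and the example attenuation values are hypothetical and chosen only for illustration; the width and level values are those quoted above.

```python
def window_to_gray(hu: float, width: float, level: float) -> float:
    """Map a CT value (H) to a display fraction in [0, 1].

    Assumes a plain linear window: 0.0 is black, 1.0 is white.
    """
    lower = level - width / 2.0
    upper = level + width / 2.0
    if hu <= lower:
        return 0.0
    if hu >= upper:
        return 1.0
    return (hu - lower) / (upper - lower)


# Window settings as quoted in the text (width, level).
COLONOGRAPHY = {"width": 1500.0, "level": 150.0}
ABDOMINAL = {"width": 400.0, "level": 40.0}

# Hypothetical attenuation values: gas, a partial-volume voxel at the
# edge of a small polyp, and soft tissue.
for hu in (-1000.0, -300.0, 40.0):
    colo = window_to_gray(hu, **COLONOGRAPHY)
    abdo = window_to_gray(hu, **ABDOMINAL)
    print(f"{hu:7.0f} H  colonography={colo:.2f}  abdominal={abdo:.2f}")
```

Under these assumptions, a voxel with intermediate attenuation is rendered as black on the narrow abdominal window but remains distinguishable from gas on the wide colonography window, so the apparent edge of a polyp, and therefore the caliper position, can shift with the window setting.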
To assess intraobserver agreement, each observer repeated the three measurements on a subset of 10 polyps chosen by the study coordinator from the study data set, which broadly represented all polyp sizes. After the order of the patients was again randomized, each observer remeasured the same subset of polyps.
Once all measurements were completed, the study sheets were collated by the study coordinator and individual observer measurements for each polyp transferred to an Excel worksheet (version 2000, Microsoft). Analysis was performed using Stata (version 7, StataCorp LP) and MLwiN (version 1.10, Institute of Education, University of London).
Statistical Analysis
The Bland-Altman method
[15] was used to determine the
level of agreement between observers' estimates of maximum polyp diameter and
the reference endoscopic diameter. The 95% limits of agreement were used to
define the range of discrepancies between the test (CT) and the reference
(colonoscopy) value on 95% of occasions. Also, because each observer measured
the same polyps, a paired Student's t test was used to determine the
presence of a significant difference between each set of measurements and the
endoscopic reference size (for each observer). Paired Student's t
tests were also used to determine any significant difference between estimates
from the novice observer and the experts when using the three different
visualization displays. A repeated measures analysis of variance was used to
examine the difference in measurements between the four observers. We allowed
for the repeated measurements on the same polyp, adjusting for this using the
Huynh and Feldt method. Repeated measures analysis of covariance (ANCOVA) was
used to determine if the differences between observers varied for polyps of
different sizes. This was done by including the endoscopic reference size in
the analysis and examining if there was an interaction between this and the
observer.
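The agreement calculations described above can be sketched as follows. This is a minimal illustration and not the Stata analysis used in the study: it assumes the conventional 95% limits of agreement (mean difference plus or minus 1.96 SD of the differences) and reports only the paired t statistic, not its p value. The function name `bland_altman` and the example measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(ct_mm, reference_mm):
    """Agreement of CT estimates with the colonoscopic reference.

    Differences are CT minus reference, so a positive mean difference
    indicates overestimation at CT.
    """
    if len(ct_mm) != len(reference_mm):
        raise ValueError("each polyp needs one CT and one reference value")
    diffs = [c - r for c, r in zip(ct_mm, reference_mm)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return {
        "mean_difference": bias,
        "sd": sd,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
        # Paired Student's t statistic with len(diffs) - 1 degrees of freedom.
        "t": bias / (sd / sqrt(len(diffs))),
    }


# Hypothetical measurements in mm, for illustration only.
ct = [7, 9, 4, 12, 6, 8]
reference = [5, 8, 4, 10, 6, 5]
result = bland_altman(ct, reference)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The span between `lower_limit` and `upper_limit` corresponds to the width of the limits of agreement reported for each observer and display.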
To assess the repeatability of the CT polyp measurements, the variability between interpreters was measured regardless of the reference value by calculating the components of variance. It was possible to break down the data variability into that which resulted from between and from within polyp measurements. A large proportion of variability between polyp measurements implies that the observer had relatively little impact on the measurements. Conversely, if a large proportion of the variability was within polyps, this implies that the observer had more influence on the measurements. Polyp variation was further broken down into those attributable to different observers and those attributable to repeat measurements. The components of variance were calculated using a cross-classified multilevel model, with the multilevel component allowing for the structure of the data and the cross-classification allowing for repeat observations on the same polyp. Any missing observations were excluded from the analysis.
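The idea of splitting variability into between-polyp and within-polyp parts can be illustrated with a much simpler estimator than the one used in the study. The sketch below is not the cross-classified multilevel model fitted in MLwiN: it assumes a balanced one-way random-effects layout (every polyp measured the same number of times), uses the standard analysis-of-variance (method of moments) estimators, and does not separate observer effects from repeat-measurement effects. The function name `variance_components` and the example data are hypothetical.

```python
from statistics import mean


def variance_components(measurements):
    """Estimate between- and within-polyp variance.

    `measurements` holds one list per polyp; each inner list contains all
    measurements of that polyp and must have the same length k >= 2.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need a balanced design with n >= 2 and k >= 2")

    polyp_means = [mean(row) for row in measurements]
    grand_mean = mean(polyp_means)  # equals the overall mean when balanced

    ms_between = k * sum((m - grand_mean) ** 2 for m in polyp_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, polyp_means) for x in row
    ) / (n * (k - 1))

    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / k)  # truncate at zero
    total = var_between + var_within
    share_between = var_between / total if total > 0 else 0.0
    return var_between, var_within, share_between


# Hypothetical data: 3 polyps, 4 measurements each (mm).
data = [[4, 5, 5, 6], [8, 9, 10, 9], [12, 13, 11, 12]]
between, within, share = variance_components(data)
print(f"between={between:.2f}  within={within:.2f}  share between={share:.0%}")
```

A high between-polyp share means most of the spread in the data reflects real differences in polyp size rather than disagreement among measurements of the same polyp.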
Results
Of the 48 polyps measured using a 2D display, observer 1 used an axial display for 27, an oblique display for 15, a coronal display for 5, and a sagittal display for 1. Only observer 3 used axial views for all polyps, and observers 2 and 4 used a combination of axial, coronal, and sagittal views.
Observer Agreement with Colonoscopic Reference Size
The mean differences between observers' estimates of polyp diameter and the
colonoscopic reference size, the SD of these differences, and limits of
agreement for each observer are summarized in
Table 1 for each of the three
visualization displays. In addition, the difference between individual
observer measurements and the reference size are displayed graphically as
Bland-Altman dot plots (Fig. 2A
and 2B). These data show that
the CT measurements, irrespective of the display used, on average
overestimated the diameter of the polyp studied when compared with the
reference value. However, the least amount of measurement error occurred when
using the 2D abdominal window setting. The 95% limits of agreement were
relatively wide for all observers and certainly had sufficient span to
encompass different size categories for individual polyps. The narrowest
limits of agreement spanned 7.6 mm for observer 1 using the 2D colonography
window setting, whereas the widest spanned 14.6 mm for observer 3 (an
experienced observer) using the 3D display. When significant differences
between the observers' estimates and the reference values were considered,
observers 1 and 2 were not significantly different, but only when using the 2D
abdominal window setting; these two observers were significantly different
from the reference standard when using the other two displays, and observers 3
and 4 were significantly different when using all three displays (see
Table 1). The novice (observer
4) had errors that were significantly larger than expert observer 1 for all
three displays (p = 0.001, 0.01, and < 0.001 for 2D colonography,
2D abdominal, and 3D displays, respectively) and significantly larger than
observer 2 for two of the displays (p = 0.04 and 0.001 for 2D
colonography and 3D displays, respectively). No significant difference was
found between the novice and observer 3 for any display. For all displays,
measurement error was smallest, on average, for observer 1, followed by
observer 2, with observers 3 and 4 having greater errors. Supporting these
data, when the absolute error was calculated for each observation and
observer and compared with the visualization platform used, no
significant difference was seen between the 2D colonography window setting and
the 3D display. A significant difference occurred, however, when the 3D
display was compared with the 2D abdominal window setting in all but one
comparison (Table 2).
Observer Agreement Irrespective of Reference Size
Between-polyp variance (variability when measuring different polyps) and
within-polyp variance (variability when measuring the same polyp, further
broken down into those attributable to different observers and those
attributable to random variation [between-measurement variance]) are shown in
Table 3. These data show that
measurements obtained using the three visualization displays are relatively
similar with very little within-polyp variation resulting from differences
between observers; approximately 70% of the total variability in the data was
attributable to between-polyp differences. Of the within-polyp variation, most
was random variability of the measurements rather than differences between
observers. When the differences between observers' measurements were examined
using analysis of variance, the most significant difference occurred when
using the 3D display (Table 4).
No evidence that this difference was significantly influenced by polyp size
was seen (2D colon window setting [p = 0.96], 2D abdominal window
setting [p = 0.84], 3D display [p = 0.86], analysis of
covariance).
Predictably, assessment of intraobserver agreement revealed that it was superior to interobserver agreement, with the narrowest limits of agreement for the 2D abdominal display and the widest for the 3D display (Table 5). Differences between the two sets of observer measurements are displayed graphically as Bland-Altman dot plots (Fig. 3A and 3B).
Discussion
Researchers group polyps into three size categories contingent on their maximum diameter (small, 5 mm or less; medium, 6 to 9 mm; large, 10 mm and larger). Expert consensus is that polyps in the small category are clinically insignificant; surveillance scanning should be considered for medium and large polyps, but polypectomy is recommended for those that are large (Zalis M, presented at the Fifth International Symposium of Virtual Colonoscopy). Such grouping is convenient, but measurement error is inevitable and could potentially result in large polyps being inadvertently classified as small and vice versa, especially if the error occurs around a category cut point. Supporting this, we found limits of agreement as wide as 14 mm, raising the possibility that a polyp whose true diameter is 10 mm or more could be assigned to the small category because of measurement error. As a result, the polyp might be assigned to interval surveillance or even be disregarded when the correct course of action would be endoscopic polypectomy. Conversely, by including clinically insignificant polyps (2- to 5-mm polyps) in our data set, we showed that diminutive polyps can be incorrectly categorized as large and therefore inappropriately referred for polypectomy.
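The consequence of wide limits of agreement for size categorization can be made concrete with a short sketch. It uses the three categories defined above and the widest limits reported in the abstract (-4.6 mm to 10.0 mm, CT minus reference). Rounding each measurement to the nearest millimeter before categorizing is an assumption made here for illustration, and the function names are hypothetical.

```python
def size_category(diameter_mm: float) -> str:
    """Assign a polyp to a size category from its diameter in mm.

    Assumes the diameter is rounded to the nearest millimeter first:
    small, 5 mm or less; medium, 6 to 9 mm; large, 10 mm and larger.
    """
    rounded = round(diameter_mm)
    if rounded <= 5:
        return "small"
    if rounded <= 9:
        return "medium"
    return "large"


def categories_within_limits(reference_mm, lower_limit_mm, upper_limit_mm):
    """Categories of the smallest and largest CT estimates that fall
    within the limits of agreement for a given reference diameter."""
    smallest = reference_mm + lower_limit_mm
    largest = reference_mm + upper_limit_mm
    return size_category(smallest), size_category(largest)


# Widest limits of agreement reported in the abstract (3D display).
for reference in (5.0, 10.0):
    low, high = categories_within_limits(reference, -4.6, 10.0)
    print(f"reference {reference:.0f} mm: CT category from {low} to {high}")
```

Under these assumptions, both a 5-mm and a 10-mm reference polyp could be placed in either the small or the large category by a CT measurement lying within the limits of agreement.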
Our analysis of variance found evidence of a significant difference between mean polyp measurements for the four observers, especially when the 3D image was used to make the measurement. This might be expected because there are more subjective opportunities for cursor placement using this type of image. The angle and direction from which the polyp is viewed may be continuously varied by the observer, as may be the distance from the polyp, even though the software is calibrated to account for this. Intuitively, these factors suggest that inaccurate measurements most likely occur when using a 3D display, and our results support this hypothesis. Alternatively, it could be argued that 3D visualization offers the best chance of detecting the long axis of the polyp, especially when the polyp is large and its morphology irregular. However, we found no evidence that accuracy was influenced by the size of the polyp being measured.
Predictably, intraobserver agreement was better than interobserver agreement, but, again, agreement was poorest for the 3D display. This is important because polyps left in situ require reexamination and remeasurement, perhaps after an interval of several years, which makes it less likely that the subsequent measurement will be performed by the same observer who made the first. The clear message is that whereas 3D visualization may be the best method with which to detect polyps [10], it is not the most reliable method for measuring them.
Whether to use colonography or abdominal window settings for measurement is open to debate from examination of our data. Both generally overestimated the diameter of polyps when compared with the reference colonoscopy measurement, but, overall, this discrepancy was least for the abdominal window setting. However, the narrowest limits of agreement were achieved by observer 1 (an expert) when using the colonography setting, and the least interobserver error overall (irrespective of agreement with the reference values) was achieved using the colonography setting. Also, two observers reported that very small polyps "disappeared" when they enabled the abdominal window setting. Intuitively, this latter finding suggests abdominal windows should not be used when there is an appreciable reduction in polyp conspicuity by using these settings. Moreover, in vitro studies of lung nodule measurement indicate that far more accurate measurements are obtained using lung window settings (closer to the colonography setting) compared with using an abdominal window setting [17]. Data also showed that observers differed in their ability to make accurate measurements. Notably, observers 1 and 2 (two experts) were generally more accurate than observers 3 and 4 (an expert and a novice); the novice observer had significantly larger errors than two of the experts but not the third. Although this suggests that prior experience in CT colonography interpretation may facilitate more accurate measurements, this study is not absolutely conclusive.
There are several limitations to our study, and perhaps it raises more questions than it answers, especially regarding the true diameter of a polyp. We used the colonoscopic estimate of polyp diameter as a reference standard because this is conventional, well-established practice. We also used a single expert endoscopist whose experience exceeded 5,000 colonoscopies. Despite this, it is well recognized that colonoscopic measurements are subject to considerable error, even when made by experienced practitioners [11-13]. Endoscopists tend to overestimate polyp diameter during in vivo assessment and underestimate diameter during in vitro assessment [11, 13]. For pragmatic reasons we used open biopsy forceps to determine the reference size because this is the method preferred by our endoscopists. Using a calibrated linear measuring tool may have been preferable, but even this is imperfect because it may not be possible to align the tool adjacent to the maximal diameter of the polyp, leading to a semisubjective assessment [13]. Alternatively, it could be argued that we should have used pathologic estimates as our reference standard as they have been shown to be more reliable than endoscopic measurements [12]. However, even this approach is imperfect because excised polyps tend to significantly decrease in diameter as a result of cauterization and vascular collapse and may either enlarge or shrink with formalin fixation [12, 18]. We did not attempt to analyze our data according to polyp morphology; for example, our data set did not contain flat polyps, which are notoriously difficult to detect and to measure [19, 20]. Also, it is conceivable that elliptic polyps may be more accurately measured on 3D because of better depiction of their shape. Indeed, our study provides some indirect evidence to suggest this is the case. Although observers were free to use oblique 2D reformatted images, only observer 1 actually did so, and observer 1 was closest to the reference standard. 
In contrast, observer 3 solely used the axial plane and was the least accurate of the experts.
All these factors conspire to thwart a truly reliable estimate of polyp diameter with which comparisons can be made. The Bland-Altman approach is especially relevant to this analysis because it can be argued that the true measurement is not known with certainty from either colonography or colonoscopy. In this case, the best estimate of the true diameter is the mean of the measurements obtained by colonoscopy and colonography, which is the bedrock of the Bland-Altman analysis. We found that absolute agreement between the CT colonography estimate and the colonoscopic reference measurement was variable, evidenced by limits of agreement that were frequently enough to span not only one but two polyp size categories.
We used a single viewing platform (with a variety of visualization displays). The 3D image we used, which was surface-rendered, may be more inherently inaccurate than a volume-rendered alternative [21]. Also for pragmatic purposes, we did not formally assess the effect of display magnification on polyp measurement accuracy, although observers were free to alter magnification according to individual preference. Further research on these topics is required. Our use of a single novice may be criticized because this individual was effectively acting as a proxy for all novices. We were able to show significant differences between our three experts and no doubt there would also be differences between novices. One potential solution to minimize inter- and intraobserver variation of assessment of polyp diameter is to fully automate the measurement process using computer-aided boundary detection, which removes the subjective element of cursor placement. Further development in this area is awaited.
In conclusion, measurement of maximum polyp diameter during CT colonography is subject to inter- and intraobserver variation, the degree of which is contingent on the observers, their level of expertise, and the viewing conditions used. This variation may result in polyps detected by CT being inadvertently assigned to an incorrect size category. In day-to-day clinical practice, the window setting and anatomic plane used for measurement should be documented to facilitate subsequent interval comparisons. Finally, our study suggests that 3D visualization displays, commonly used for polyp detection, should not be used for polyp measurement.