Original Research
1 Department of Imaging Science and Technology, Delft University of Technology,
Lorentzweg 1, Delft, Zuid-Holland, 2628 CJ, The Netherlands.
2 Department of Radiology, Academic Medical Center, Amsterdam, The
Netherlands.
3 Department of Medical Physics, Academic Medical Center, Amsterdam, The
Netherlands.
4 Department of Gastroenterology and Hepatology, Academic Medical Center,
Amsterdam, The Netherlands.
Received July 13, 2007;
accepted after revision November 20, 2007.
Address correspondence to C. van Wijk.
Abstract
MATERIALS AND METHODS. The study included phantoms (23 phantom objects) and patients (16 polyps). Measurement with sliding calipers served as the reference for the phantom data. The mean of two independent colonoscopic measurements was the reference for the polyps. The automated measurement was developed for a computer-aided detection scheme, and the size of any detected object was obtained from measurement of its largest diameter. The automated measurement was compared with manual 2D and 3D measurements by two experienced observers.
RESULTS. For phantom data, the measurement variability of the automated method was significantly less than that of the two observers (p < 0.05), except for the 3D measurement by observer 1, as follows: automated, 0.86 mm; observer 1, 1.76 mm (2D), 0.96 mm (3D); observer 2, 1.34 mm (2D), 1.45 mm (3D). The variability of the automated method did not differ significantly from that of manual methods in measurement with patient data. The automated method had a systematic error for phantom data (1.9 mm).
CONCLUSION. For phantoms, the automated method has less measurement variability than manual 2D and 3D techniques. For true polyps, the measurement variability of the automated method is comparable with that of manual methods. The automated method does not suffer from intraobserver variability. Because systematic error can be calibrated, automated size measurement may contribute to a practical evaluation strategy.
Keywords: automated measurement, colon cancer, CT colonography, polyps
Our focus is on accuracy and measurement variability. Accuracy is defined as the mean difference between the measurement obtained with the method under investigation and that obtained with the reference standard. Systematic error is a significant mean difference, which can be due to overestimation or underestimation with the method under investigation. Measurement variability is defined as the SD of the mean difference. A method can be highly accurate but have great measurement variability, and vice versa.
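The two definitions above can be illustrated with a minimal sketch. It assumes that "SD of the mean difference" means the sample SD of the paired differences; the function name and the sizes are hypothetical and are not data from this study.

```python
from statistics import mean, stdev

def accuracy_and_variability(measured_mm, reference_mm):
    """Return (mean difference, SD of the differences) for paired sizes in mm."""
    if len(measured_mm) != len(reference_mm):
        raise ValueError("paired measurements are required")
    # Positive differences mean the method oversizes relative to the reference.
    diffs = [m - r for m, r in zip(measured_mm, reference_mm)]
    return mean(diffs), stdev(diffs)

# Hypothetical values, for illustration only (not data from this study).
measured = [7.0, 9.5, 12.0, 6.5]
reference = [6.0, 8.0, 10.0, 6.0]
bias, variability = accuracy_and_variability(measured, reference)
print(f"accuracy (mean difference): {bias:.2f} mm")
print(f"measurement variability (SD): {variability:.2f} mm")
```

In this example the method oversizes by 1.25 mm on average, which is its accuracy in the sense used here, while the SD of about 0.65 mm expresses its variability independently of that offset.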
Lesion size is best defined as the single largest diameter of the polyp head, excluding the stalk. It is usually measured on 2D reformatted images or in endoluminal 3D display [5–7]. In either case, significant measurement variability contingent on the observer's experience and the viewing display used has been reported [6].
Automated techniques have been introduced to enhance the measurement reliability [8, 9]. It has been reported [10] that automated and manual 3D measurements are more accurate than manual 2D measurement of polyps in a human colectomy specimen. The measurements were done on a resected specimen that was insufflated and submerged in a container with 0.9% saline solution. It was later found, however, that 3D measurement had the largest systematic error in a study [11] that included polyps from a CT colonography study in which colonoscopy was the reference standard.
Burling et al. [8] found semiautomated measurement to have better interobserver and intraobserver agreement than manual 2D measurement of spherical polyplike phantom objects. The semiautomated method evaluated in that study requires the user to indicate a starting point for polyp segmentation and size measurement. It is presumed that the user interaction causes measurement variability. Taylor et al. [10], however, observed that automated and manual approaches have comparable interobserver agreement. The latter observation was confirmed by Burling et al. [11]. Several factors may explain the conflicting results: the types of objects used (phantom objects vs patient data), noise characteristics and scanner resolution, reader variation (interobserver and intraobserver), and reliability of the reference standard (colonoscopy) in patient studies.
We studied the accuracy and measurement variability of an automated measurement technique [12] under varying scanning conditions and using both phantom data and patient data. The performance of the algorithm was compared with that of both 2D and 3D manual measurements by human observers. For the phantom data, the evaluation was performed for two slice thicknesses and two orientations of the phantom in the scanner. All data were acquired with a 64-MDCT scanner. We hypothesized that the variability of automated measurement would be less than the intraobserver and interobserver variability of human readers.
Patient Data
The polyps were from scans of 10 patients (six men, four women; mean age,
59 years; range, 30–74 years) selected from an ongoing CT colonography
study, which included patients at increased risk of colorectal cancer. All
patients in the ongoing study underwent CT colonography followed by
colonoscopy, which was videotaped. The selection of patients for this study
was based on polyp size measured during colonoscopy; the requirement was a
diameter larger than 5 mm, irrespective of the shape or location of the polyp.
All such polyps in patients examined in the period March 31–August 30,
2006, were included. The CT colonography study was approved by the medical
ethics committee of the hospital. The patients were informed a priori of the
study purpose by letter and orally and gave written consent.
For this study, the polyps were remeasured on the colonoscopic videotape by two experienced gastroenterologists who were aware of the aim of the study. The gastroenterologists were blinded to the CT colonographic size measurements and the initial colonoscopic measurements. The retrospective measurements were made by comparison with an open biopsy forceps (size, 8 mm) and, if available (four of 16 cases), a caliper (size, 10 mm). The mean retrospective measurement served as the reference standard for polyp size. The 10 patients had 16 polyps: 11 polyps 6–10 mm in diameter in eight patients and five 10 mm in diameter or larger in five patients (Table 1). The colonoscopic findings were matched with the colonographic data by a research fellow who was not involved in the study.
CT
CT of the phantom and the patients was performed on a 64-MDCT scanner
(Brilliance, Philips Medical Systems). The scan parameters were as follows:
100 mAs; 120 kV; collimation, 64 × 0.625 mm; pitch, 0.98; standard
reconstruction filter. The phantom (Fig. 1) was scanned in two positions:
parallel and at an angle of
45° with respect to the axis of the scanner. This step was taken to obtain
an oblique orientation of the main polyp axes with respect to the scanning
direction. The field of view was fixed at 300 mm2. The phantom in
parallel orientation was scanned once with a slice thickness of 3.0 mm; the
slice thickness was 0.9 mm for all other scans.
Starting the day before the examination, all patients drank 4 L of polyethylene glycol solution (KleanPrep, Helsinn Birex Pharmaceuticals), which is a hyperosmolar cathartic agent, combined with four 50-mL doses of tagging material (meglumine ioxithalamate, 300 mg I/mL, Telebrix, Guerbet) for bowel preparation. The colon was distended by automated insufflation of carbon dioxide to a maximum of 20 mm Hg or maximum patient tolerance. The patients underwent scanning in both prone and supine positions. The field of view varied between 286 and 440 mm2. The slice thickness was 0.9 mm.
Automated Polyp Measurement
Automated polyp measurement was implemented on a proprietary experimental
version of the workstation (ViewForum 6.1, Philips Medical Systems). The
method is part of a computer-aided detection scheme for automated polyp
detection [12]. The scheme
entails estimation of the deformation of the colonic wall needed to digitally
remove a presumed lesion (Fig.
2). For example, iteratively moving the points on the convex parts
of the polyp (i.e., the protruding part) inward effectively flattens the
object. After a certain amount of deformation, the surface flattening is such
that the protrusion is removed. Thus the surface looks as if the object were
never there. The amount of displacement is a measure of protrusion. The polyp
is delimited by a threshold of the deformation field. The size is obtained by
fitting an ellipse [9]. The
polyp measurement is the largest diameter of the ellipse.
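The final step (taking the largest diameter of a fitted ellipse) can be sketched as follows. The fitting procedure is not specified in this section, so the sketch substitutes a simple moment-based fit to a 2D outline. The function name, the fitting approach, and the test outline are assumptions for illustration and are not the method of reference 9 or 12.

```python
import math

def largest_ellipse_diameter(points):
    """Estimate the major-axis length of an ellipse fitted to 2D outline points.

    Assumption: a moment-based fit. For outline points sampled evenly in the
    ellipse parameter, the variance along a semi-axis of length a is a**2 / 2.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Largest eigenvalue of the 2x2 covariance matrix (closed form).
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max = half_trace + root
    return 2 * math.sqrt(2 * lam_max)

# Hypothetical outline: semi-axes of 5 mm and 3 mm, rotated by 30 degrees.
theta = math.radians(30)
outline = []
for k in range(360):
    t = math.radians(k)
    x, y = 5 * math.cos(t), 3 * math.sin(t)
    outline.append((x * math.cos(theta) - y * math.sin(theta),
                    x * math.sin(theta) + y * math.cos(theta)))
diameter_mm = largest_ellipse_diameter(outline)
print(f"largest diameter: {diameter_mm:.2f} mm")
```

Because the estimate depends only on the eigenvalues of the covariance matrix, it does not change when the outline is rotated, which is the property one would want from a measurement that should be insensitive to object orientation.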
The 3D display was obtained by isosurface volume rendering into an enhanced 3D viewing method (unfolded cube images) [13]. The transfer function had a threshold of –650 H, making the voxels below this threshold completely transparent. The 3D measurements were obtained with electronic calipers. The viewing software used for this study allowed manual navigation for optimizing the endoluminal vantage point and placement of caliper points in the 3D space. The observers were instructed to maneuver orthogonally over the object and to measure the maximum diameter.
The 2D measurement also required orthogonal navigation over an object. A reformatted cross section through the object was shown that could be rotated to identify the longest linear dimension. Window settings of 1,300 H for the phantom and 1,250 H for the patient and levels of 0 H for the phantom and –50 H for the patient data were applied. The difference accounted for the slightly higher attenuation of plasticine, which was measured to be 100 H. The observers were instructed to freely zoom in and out. The size of the object was determined with electronic calipers.
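As a small worked example, the reported window and level settings can be converted into the attenuation range that is actually displayed. The sketch assumes the usual convention that the window is centered on the level; the function name is hypothetical.

```python
def display_range(window_h, level_h):
    """Return the (lowest, highest) attenuation shown for a window and level, in H."""
    half = window_h / 2
    return level_h - half, level_h + half

phantom_range = display_range(1300, 0)    # settings reported for the phantom
patient_range = display_range(1250, -50)  # settings reported for patient data
print("phantom:", phantom_range)
print("patient:", patient_range)
```

Under this convention the phantom images span –650 to 650 H and the patient images span –675 to 575 H.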
The interval between 2D and 3D measurements of the same object was a few hours, during which approximately 100 other measurements were made. This setup was chosen to avoid observer bias. The images were presented randomly. Thus the order in which 2D measurements were made differed from the order in which the 3D measurements were made. The objects in the phantom and the true polyps were measured twice by both observers using both methods. The interval between repeated measurements of the same object on the same scan with the same method was at least 4 weeks. Recall bias was further reduced because both observers evaluated images from more than 50 other CT colonographic examinations during the intervals.
Outcome Parameters and Statistical Analysis
The accuracy and measurement variability of the observers and the algorithm
were determined by comparing the first measurements of the phantom objects in
parallel orientation with the reference standard. In addition, the accuracy of
the measurements of true polyps was determined by comparison with the
retrospective colonoscopic measurements. Moreover, we counted how often the
critical category of a polyp was changed by the measurement (for instance, a
polyp measured 6–9 mm by an observer measured 10 mm or larger with the
reference standard).
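Counting category changes can be sketched as below. The cut-offs (below 6 mm, 6–9 mm, 10 mm or larger) and the handling of values between 9 and 10 mm are assumptions made for illustration; the function names and sizes are hypothetical.

```python
def size_category(diameter_mm):
    """Map a diameter to a size class; the 6 mm and 10 mm cut-offs are assumed."""
    if diameter_mm < 6:
        return "<6 mm"
    if diameter_mm < 10:
        return "6-9 mm"
    return ">=10 mm"

def count_category_changes(measured_mm, reference_mm):
    """Count polyps whose size class differs between a method and the reference."""
    return sum(
        size_category(m) != size_category(r)
        for m, r in zip(measured_mm, reference_mm)
    )

# Hypothetical sizes, for illustration only (not data from this study).
measured = [5.5, 8.0, 9.0, 12.0]
reference = [6.5, 8.5, 10.5, 11.0]
changes = count_category_changes(measured, reference)
print(f"{changes} of {len(measured)} polyps changed category")
```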
For determination of intraobserver variability, the initial measurements on the scan with the phantom in parallel orientation were compared with measurements made on the same scan 4 weeks later. The intraobserver variability of the measurements of the true polyps was determined in the same way. Interobserver variability was explored by comparing the two observers' first measurements of the phantom in parallel orientation. The interobserver variability of the first measurements of the true polyps also was determined in this manner. Variability due to differences in orientation of the phantom in the scanner was assessed by comparing the first measurements from the observers and the algorithm on the scans with the phantom in parallel and oblique orientations. The influence of slice thickness was studied by comparing the measurements from the observers and the algorithm on the phantom scans with measurements at slice thicknesses of 0.9 and 3.0 mm.
Student's t test was applied to assess systematic mean difference between paired measurements. The SD of the mean difference was calculated to express measurement variability. A Bartlett test [14] was first applied to test the assumption that SDs across measurement series were equal (e.g., the SDs in Fig. 3A). If the null hypothesis of equal variance was rejected, the squares of the SDs were compared by means of the F test.
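The test statistics behind this analysis can be sketched with the standard library alone. The sketch computes only the t statistic for a mean paired difference and the variance ratio used in an F test. It does not compute p values, which would require the t and F distributions, and it omits the Bartlett test. All names and values are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def t_statistic(diffs):
    """t statistic for a mean paired difference (n - 1 degrees of freedom)."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def variance_ratio(diffs_1, diffs_2):
    """F statistic (larger variance over smaller) with its degrees of freedom."""
    v1, v2 = variance(diffs_1), variance(diffs_2)
    if v1 >= v2:
        return v1 / v2, len(diffs_1) - 1, len(diffs_2) - 1
    return v2 / v1, len(diffs_2) - 1, len(diffs_1) - 1

# Hypothetical differences from the reference (mm), for illustration only.
auto_diffs = [1.8, 2.1, 1.7, 2.0, 1.9]
manual_diffs = [0.5, -1.5, 2.0, -0.5, 1.0]
t_auto = t_statistic(auto_diffs)
f_stat, df_num, df_den = variance_ratio(auto_diffs, manual_diffs)
print(f"t = {t_auto:.2f}; F = {f_stat:.1f} on ({df_num}, {df_den}) df")
```

The hypothetical automated series shows a large t statistic with a small variance, that is, a consistent offset with little scatter, whereas the manual series has a small mean difference but much greater scatter.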
Patient data—The Bland-Altman plot of the patient data is shown in Figure 3B. With patient data, observer 2 made systematic errors in both 2D and 3D measurements. There were no statistically significant differences between the automated method and either 2D or 3D manual measurement regarding measurement variability. Linear regression revealed a significant trend in the automated measurements, that is, significantly greater measurement error with a larger polyp size. All of the approaches changed the critical category of four of 16 polyps, except for 3D measurements by observer 2, for whom one of 16 polyps changed category.
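A trend of this kind can be examined by regressing the measurement error on the reference size. The sketch below computes a least-squares slope and, as a common companion to a Bland-Altman plot, limits of agreement taken as the mean difference ± 1.96 SD. Whether the figures in this article use those limits is an assumption, the significance of the slope is not tested here, and all names and values are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(diffs):
    """Mean difference -/+ 1.96 SD (an assumed, commonly used convention)."""
    d, s = mean(diffs), stdev(diffs)
    return d - 1.96 * s, d + 1.96 * s

def error_trend_slope(reference_mm, measured_mm):
    """Least-squares slope of measurement error against reference size."""
    errors = [m - r for m, r in zip(measured_mm, reference_mm)]
    r_bar, e_bar = mean(reference_mm), mean(errors)
    sxy = sum((r - r_bar) * (e - e_bar) for r, e in zip(reference_mm, errors))
    sxx = sum((r - r_bar) ** 2 for r in reference_mm)
    return sxy / sxx

# Hypothetical sizes, for illustration only (not data from this study).
reference = [6.0, 8.0, 10.0, 12.0]
measured = [6.5, 9.0, 11.5, 14.0]
slope = error_trend_slope(reference, measured)
lower, upper = limits_of_agreement([m - r for m, r in zip(measured, reference)])
print(f"error grows by {slope:.2f} mm per mm of polyp size")
print(f"limits of agreement: {lower:.2f} to {upper:.2f} mm")
```

A positive slope means that the error grows with polyp size, so a single offset correction would not remove it completely.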
Intraobserver Variability
Phantom data—Figure
4A shows the Bland-Altman plot of intraobserver variability of the
phantom data. Measurement of the phantom objects resulted in intraobserver
variability that was not significantly different between the observers for
either 2D or 3D methods.
Patient data—Figure 4B shows the outcome for the polyps. Again, intraobserver variability was not significantly different between the observers for 2D measurement, but the variability was significantly different (p = 0.035) for 3D measurement.
Interobserver Variability
Phantom data—Figure
4C is a Bland-Altman plot illustrating interobserver variability
for 2D and 3D measurement with the phantom data. For the phantom objects, a
statistically significant mean size difference of 1.43 mm was found between
the 3D measurements of the two observers (p < 0.01). The
interobserver variability of 2D measurements was significantly greater than
the variability of 3D measurement (p << 0.01).
Patient data—Figure 4D is a Bland-Altman plot of the interobserver variability for 2D and 3D measurements of true polyps. Compared with observer 1, observer 2 underestimated polyp size by 1.48 mm for 2D and 1.54 mm for 3D measurements. The interobserver variabilities of 2D and 3D measurements were not significantly different for the true polyps.
Orientation of Phantom in Scanner
The effects of orientation were investigated only with phantom data. There
were no significant differences in mean size between the measurements of the
phantom objects in different orientations for either the automated method or
the observer measurements. Figure
5 is a Bland-Altman plot illustrating variability due to variation
in object orientation. The variability of the automated measurements was
significantly less than the variability of the observers for 2D measurements
(both, p < 0.001). The variability of the automated measurements
also was less than the variability of the 3D measurements by the observers,
but the difference was not significant.
Slice Thickness
The effects of slice thickness were investigated only for phantom data.
Figure 6 is a Bland-Altman
plot illustrating variability due to differences in slice thickness.
Statistically significant differences in mean size were found only between 3D
measurements by both observers (p < 0.01). The variability of
automated measurement was not significantly different from that of the
observer measurements.
We found that one observer made a systematic error (consistent undersizing in both 2D and 3D measurement) with the patient data. We attribute this error to different perception of the polyp boundaries by this observer. The automated method had the most measurement variability, although it did not differ significantly from that of any manual approach. The great variability might have been due to systematic error, which increases with polyp size. The measurement variability (Fig. 3B) for polyps was considerably greater than the corresponding variability for phantom objects (Fig. 3A). The increase may be explained by imprecision and inaccuracy in the reference standard (colonoscopy).
Polyps conventionally are measured according to the single largest polyp diameter in either 2D or 3D, although other methods of quantification of size (e.g., by volume) have been explored [8, 15]. Conflicting results have been reported [5, 10, 11] regarding whether 2D or 3D measurement is preferred and how manual measurement relates to automated methods. Pickhardt et al. [5] found that manual 3D measurement is significantly more accurate than manual 2D measurement. They attributed underestimation of 2D measurement to suboptimal alignment of the standard orthogonal multiplanar reformations in relation to the polyp axis. That study was based on phantom data and patient data (colonoscopy served as the reference for the polyps). We reduced the pitfall of selecting the proper cross section for 2D measurement by letting the observers navigate orthogonally over a polyp in the 3D display. A reformatted cross section through the object was shown that could be manipulated to find the longest dimension of the object. The orientation of the reformatted cross section was visualized simultaneously in the 3D display, but the measurement had to be done on the reformatted image.
Burling et al. [11] found that the greatest measurement error was made with the manual 3D approach. Those authors indicated that 3D measurement is prone to subjective cursor placement, for example, due to varying angle and direction from which a lesion is viewed [6]. Both studies by Burling et al. included true polyps, and colonoscopy served as the reference standard. In our experiments, the observers were aware of the difficulty of positioning electronic calipers on 3D views. Accordingly, they checked their placement carefully. Taylor et al. [10], however, found that automated and manual 3D polyp measurements were more accurate than manual 2D measurement of a human colectomy specimen (irrespective of observer experience).
We opted to include both phantom data and patient data because observable differences regarding 2D versus 3D measurement may relate to the types of objects used (phantom vs true polyps) and the accuracy of the reference standard (sliding calipers vs colonoscopic size). Our 2D measurements with the phantom data had greater variability than our 3D measurements as orientation of the objects in the scanner varied. Moreover, we found greater variability for 2D measurement than for 3D measurement with the phantom data in comparison with the reference standard. The latter finding confirms results reported by Pickhardt et al. [5]. We hypothesize that more complex manipulations are needed for manual 2D measurement. Still, results for the true polyps did not reveal differences, which might have been the result of inaccuracy of colonoscopic measurement (see later). The lack of difference in patient data agrees with the findings of Taylor et al. [10]. One of the observers made a systematic error (underestimation) in measuring the polyps. Such a difference in observer measurement confirms the findings of Burling et al. [6].
The automated measurements of the phantom had less variability than both manual methods in comparison with the reference standard. However, the automated method had systematic error with the phantom data. The reported bias of the automated measurement may be partially explained by the low threshold (–750 H) applied to segmentation of the phantom (compared with a value of –650 H for manual measurement [13]). Certainly, objects appear larger as the threshold is lowered, but the value of –750 H yields optimal sensitivity and specificity for automated polyp detection in a computer-aided detection system [12].
Systematic error can be corrected with proper calibration. This principle holds for measurements by observers and with an automated approach. The systematic difference between the observers regarding 2D and 3D measurement signifies that separate correction values may be needed. In other words, it may indicate that observers need to be calibrated individually to avoid systematic error. Calibration would require a procedure in which an established collection of objects of a precisely known size is measured to determine the accuracy attained by an observer. Eventual systematic error would be subtracted from all subsequent measurements.
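The calibration procedure described above can be sketched in two steps: estimate the systematic error on objects of known size, then subtract it from later measurements. The function names and all values are hypothetical, and a constant offset is assumed, which would not capture an error that grows with object size.

```python
from statistics import mean

def estimate_offset(measured_mm, known_mm):
    """Mean difference between measurements and precisely known sizes."""
    return mean([m - k for m, k in zip(measured_mm, known_mm)])

def calibrate(measurement_mm, offset_mm):
    """Subtract a previously estimated systematic error from a measurement."""
    return measurement_mm - offset_mm

# Hypothetical calibration objects, for illustration only.
known = [6.0, 8.0, 10.0]
measured = [7.9, 9.9, 11.9]
offset = estimate_offset(measured, known)
corrected = calibrate(10.4, offset)
print(f"offset: {offset:.1f} mm; corrected size: {corrected:.1f} mm")
```

One offset would be estimated per observer and per method for manual measurement, whereas a single offset would serve for an automated method.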
Burling et al. [8] described a semiautomated technique initiated by two software seeds and proceeding in a region-growing scheme. Another technical report [9] introduced an automated technique that starts with placement of a seed point on the polyp, from which a patch is grown over the polyp surface. Our method merely requires user interaction to indicate a specific protrusion. By definition, repeated measurements with our approach, initiated by either the same or another observer, yield identical results irrespective of seed placement. Because other programs operate with different methods of size measurement, the current results for automated measurement are limited to the software we used.
A limitation of our work was the restricted number of polypoid objects used. An unlimited number of shapes are encountered in clinical practice. For practical reasons, we selected a limited number of phantom shapes that we considered relevant to the hypothesis tested. No criteria regarding lesion shape were applied to the selection of true polyps. Another limitation was the slightly denser material (approximately 100 H difference) of phantom objects compared with true polyps. We used a window of 1,300 H and level of 0 H for the phantom and a window of 1,250 H and level of –50 H for the polyps to have a similar appearance on 2D measurement and to minimize the effect. We hypothesize that the influence on the automated method may be negligible because the algorithm does not entail use of underlying attenuation values.
A third limitation was the precision of the reference standard for the colonoscopic measurements, which increases the total variation for all readers (automated and manual) for patient data. It is well known that colonoscopic measurements have inherent error [16, 17]. Consequently, the reported SDs of true polyp size compared with the reference standard may be overly pessimistic.
Our findings indicate that there is reduced variability in polyp size with automated measurement of phantoms. The variability of automated measurement is in the same range as that of manual measurement of polyps from patients. A clear advantage of the automated method is that it does not suffer from intraobserver variation. Moreover, the automated method can be calibrated once, whereas each observer may need individual calibration. Therefore, automated size measurement may well contribute to a practical evaluation strategy.