Original Research
1 Diagnostic Radiology Department, Clinical Center, National Institutes of
Health, Bldg. 10, Room 1C368X, Bethesda, MD 20892-1182.
2 Department of Radiology, Mayo Clinic, Rochester, MN.
3 Department of Radiology, Walter Reed Army Medical Center and Uniformed
Services University of the Health Sciences, Washington, DC.
4 Department of Radiology, University of Wisconsin Medical School, Madison,
WI.
Received March 27, 2007;
accepted after revision June 10, 2007.
Presented at the 2006 International Symposium on Virtual Colonoscopy,
Boston, MA.
Abstract
MATERIALS AND METHODS. CTC scans of 30 patients were selected retrospectively to span ranges of luminal distention (well distended to poorly distended) and surface area covered by residual fluid (high amount of coverage to low amount of coverage). We used QA software developed in our laboratory to automatically measure the mean distention of each of five colonic segments (ascending, transverse, descending, sigmoid, and rectum). Three experienced radiologists visually graded each scan for distention and fluid coverage. Distention and fluid scores for specific segments were assessed with Bland-Altman analysis (mean difference with 95% limits of agreement) and the weighted kappa test. Interobserver and intraobserver variability was determined with the weighted kappa test.
RESULTS. For distention scoring, the mean difference between radiologists and the QA software was 0.1% (95% limits of agreement, –25.6% and 25.9%). For fluid scoring, the mean difference was –0.6% (95% limits of agreement, –8.2% and 7.1%). There was moderate to good agreement (weighted kappa value, 0.50–0.78) between the radiologists' mean scores and the scores obtained with the QA software and for interreader and intrareader assessments of distention and fluid coverage.
CONCLUSION. Results with the QA software agreed with radiologists' assessment of colonic distention and residual fluid coverage but were a more objective assessment. Use of this QA software can help standardize two important factors, distention and residual fluid coverage, that affect the quality of CTC, reducing two known causes of poor CTC performance.
Keywords: colon, colonography, CT, quality, virtual colonoscopy
In an effort to facilitate assessment of the quality of CTC examinations, we developed automated quality assessment (QA) software to measure colonic distention and luminal surface area obscured by residual fluid. In contrast to most reported clinical assessments of CTC quality, which have been conducted with visual grading scales, use of an automated QA system can standardize, simplify, and expedite the process of assuring a standard of quality in CTC. The purpose of this study was to validate the QA software by comparing its assessments with those made with the accepted reference standard of visual grading by experienced CTC radiologists.
Patient Population
Thirty subjects were chosen retrospectively from the patient cohorts at
three medical centers to span a range of CTC quality
[1]. The patients underwent CTC
between May 2002 and June 2003. A sample size of 30 was chosen according to
Altman's [12] nomogram for
calculating sample size with the following parameters: power, 0.80; p
= 0.05; target difference, 5%; SD, 7%. The numbers of patients from the three
institutions were 7, 13, and 10. The age range of the seven women and 23 men
was 45–77 years. Patients were included only if the entire colon could
be processed with the QA software. Proper processing required correct
centerline generation with correct connectivity. However, areas of complete
collapse were allowed. In this situation, the software automatically crossed
the collapsed segment.
The patients were chosen on the basis of a particular segment of interest. The colonic segment of interest displayed a desirable quality trait, that is, a particular degree of distention or coverage with residual colonic fluid. Patients were chosen by a research trainee after visual inspection of anteroposterior snapshot JPEG images of a 3D surface reconstruction of their colons. Use of the JPEG images allowed rapid assessment of the colons. Radiologist 1 confirmed the existence of the desirable quality trait in the segment of interest. Radiologist 1 was among the three radiologists who subsequently graded the 30 patient CTC scans. To avoid recall bias, there was a 2-month delay between case selection and grading by radiologist 1. On the basis of the assessments, 15 patients were chosen for a range of colonic distention. Another 15 patients were chosen for a range of residual fluid coverage. Although patients underwent scanning in both the supine and the prone positions, only one position per patient (nine prone, 21 supine) was used to avoid intrasubject correlations. The position that most clearly depicted the desired quality trait of the segment was chosen.
Bowel Preparation
Patients at all three institutions underwent a standard 24-hour colonic
preparation that consisted of oral administration of 90 mL sodium phosphate,
10 mg bisacodyl, 500 mL barium (2.1% by weight), and 120 mL diatrizoate
meglumine and diatrizoate sodium in divided doses
[1].
Image Analysis with Automated QA
The results of analysis were reported in part on the basis of colonic
segments. A multistage procedure was performed that included computation of
the centerline of the colon, manual correction of centerline connectivity
errors, and definition of colonic segments. The centerline was computed with a
previously described automated procedure
[13]. If colonic segments were
collapsed, the connectivity of the centerline might have been incorrect
because the automated software could not follow collapsed bowel and therefore
could not link the discrete segments. Connectivity errors due to
collapse were corrected manually with a graphical user interface.
To standardize the colonic segments for both the automated and manual assessments and to eliminate interobserver differences in the locations of the colonic segments, the segments were predefined manually. With a graphical user interface, separators were placed manually along the colonic centerline to subdivide the colon into five segments (ascending colon, transverse colon, descending colon, sigmoid colon, and rectum) as shown in Figure 1. Only five segments were used to ensure that the segments were of sufficient size for assessment without being too large for a radiologist to interpret and visually integrate the overall distention and fluid coverage of a segment. For ease of analysis, and because it tends to be short relative to the other segments, the cecum was considered part of the ascending colon.
Separators were placed by a research trainee under the supervision of a board-certified diagnostic radiologist. The separators were placed at the proximal and distal ends of the colon and at the junctions between segments. The segments were defined with a modified version of a procedure developed by Taylor et al. [9]. The rectum was defined as the portion of the colon extending from the anorectal junction proximally to the level of the acetabular roof. The sigmoid was defined as the portion of the colon extending from the rectum proximally to the level of the iliac crest at which the colon does not reenter the pelvis. The descending colon was defined as the portion of the colon extending from that level of the iliac crest proximally to the midpoint of the splenic flexure. The transverse colon was defined as the portion of the colon between the midpoints of the splenic and hepatic flexures. The ascending colon was defined as the portion of the colon extending from the midpoint of the hepatic flexure to the portion of the cecum distal to the ileocecal valve. With the QA software (written in C++, Visual Studio version 6, Microsoft) on a standard desktop PC with the Windows XP operating system (Microsoft), the user was able to rotate the colon about the x- and y-axes, facilitating accurate placement of the separators. For the purposes of this project, the precise definitions of the beginning and end of each colonic segment were less important than standardization of their positions.
After the separators were placed, the QA software was used to compute the distention in each colonic segment. First, the colon was subdivided into narrow slices approximately 1 cm wide and perpendicular to the centerline. Second, the mean diameter of the colon in each slice was computed. Third, on the basis of previous findings that a 2-cm colonic diameter indicates adequate distention [14], the percentage of centerline slices in each colonic segment that had a corresponding distention of 2 cm or more was calculated. This method was called distention method 1. The QA software was used to compute the number of colon surface vertices abutting air or fluid and to convert these numbers to surface area covered by air and fluid. The percentage of each segment obscured by fluid was the fluid surface area divided by the total colonic surface area.
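The two per-segment measures described above can be illustrated with a minimal Python sketch. It is not the study's C++ implementation; the function names and inputs are hypothetical, and it assumes that the total surface area of a segment is the sum of the air-covered and fluid-covered areas and that a collapsed slice is recorded with a diameter of 0.

```python
from typing import Sequence

# Adequate-distention threshold taken from the text (2-cm colonic diameter [14]).
DISTENTION_THRESHOLD_CM = 2.0


def distention_score(slice_diameters_cm: Sequence[float],
                     threshold_cm: float = DISTENTION_THRESHOLD_CM) -> float:
    """Distention method 1: percentage of centerline slices in a segment
    whose mean diameter is at least the threshold."""
    if not slice_diameters_cm:
        raise ValueError("segment has no centerline slices")
    adequate = sum(1 for d in slice_diameters_cm if d >= threshold_cm)
    return 100.0 * adequate / len(slice_diameters_cm)


def fluid_coverage(fluid_area: float, air_area: float) -> float:
    """Percentage of a segment's surface area obscured by fluid.

    Assumption: total surface area = fluid-covered area + air-covered area.
    """
    total = fluid_area + air_area
    if total <= 0:
        raise ValueError("segment has no surface area")
    return 100.0 * fluid_area / total


# Hypothetical segment: four 1-cm slices, one of them collapsed.
print(distention_score([2.5, 1.9, 2.0, 0.0]))
print(fluid_coverage(fluid_area=10.0, air_area=90.0))
```

Because the distention score counts slices against a hard threshold, slices measuring 1.9 cm and 2.0 cm fall on opposite sides of the cutoff even though they are nearly identical.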
Image Analysis by Radiologist QA
Three radiologists individually reviewed the CTC images of the same 30
patients whose images were used for automated analysis. Using a
custom-integrated 2D JPEG image viewer and scoring software (written in Visual
Basic with Visual Studio version 6) each radiologist reviewed the images on a
PC with Microsoft Windows XP as the operating system. The three radiologists
had 7, 6, and 3 years of experience with CTC. The original CTC transverse
images were converted to JPEG images with two window level and width settings
(0/2,000 and 40/350 H). The radiologists were allowed to review the images
with either or both settings. The data file containing the coordinates of the
separators was used to place markers in the JPEG image viewer to indicate
segment boundaries. The purpose of this procedure was to ensure that the
colonic segments were defined identically for both the automated and
radiologist assessments.
The radiologists scored the distention and residual fluid coverage of the 30 CTC scans. Distention was scored on a scale ranging from 0 to 100 in increments of 5, where 0 indicated that 0% of a particular segment had distention of 2 cm or more and 100 indicated that 100% of a particular segment had distention of 2 cm or more. This scoring system corresponded to that of distention method 1 computed with the automated QA software. Likewise, residual fluid coverage was scored on a scale ranging from 0 to 100 in increments of 5, where 0 indicated that none of the surface area of a particular segment was obscured by fluid and 100 indicated that 100% of the surface area of a particular segment was obscured by fluid.
We compared distention method 1 with a previously described distention scoring method based on a series of complex clinical assessments. The second method (distention method 2) entailed use of a four-point scale developed by Taylor et al. [9] on the basis of work by Chen et al. [15]. To reduce recall bias, the radiologists reviewed CTC scans of the same 30 patients using distention method 2 at least 1 week after analyzing the images with distention method 1.
Statistical Analysis
To avoid bias from intrasubject correlations in analyses based on colonic
segment, statistical analysis was performed on one particular segment (the
segment of interest) from each patient. Although a particular segment was
chosen for only one of the two traits (distention or fluid), both scores
(distention and fluid) for that segment were used in the analysis. The colonic
segments of interest were distributed evenly over the length of the colon
(Table 1). Bland-Altman
analysis was used to compare the means of the radiologists' scores for
distention and fluid with the scores derived with the QA software.
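As a rough illustration of the Bland-Altman comparison, the following Python sketch computes the mean difference and 95% limits of agreement for paired scores. The function name and the example values are hypothetical, and the limits are taken as the mean difference plus or minus 1.96 sample standard deviations, the usual convention, which is assumed rather than confirmed for this study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(reader_scores: Sequence[float],
                 software_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) for paired scores.

    Limits of agreement are mean difference +/- 1.96 * SD of the differences
    (sample SD, n - 1 in the denominator).
    """
    if len(reader_scores) != len(software_scores):
        raise ValueError("scores must be paired")
    if len(reader_scores) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [r - s for r, s in zip(reader_scores, software_scores)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical mean radiologist scores and software scores (percent).
readers = [80.0, 90.0, 70.0, 100.0]
software = [75.0, 95.0, 70.0, 95.0]
mean_diff, lower, upper = bland_altman(readers, software)
print(f"mean difference {mean_diff:.2f}%, limits {lower:.2f}% to {upper:.2f}%")
```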
The weighted kappa statistic was used as a second measure of agreement. For
this analysis, distention scores were grouped as follows: poor, 0–60;
fair, 61–80; good, 81–90; very good, 91–95; and excellent,
96–100. Fluid scores were grouped as follows: low amount of fluid
coverage, 0–5; medium to low, 6–10; medium, 11–20; medium to
high, 21–39; and high, 40–100. Groupings were unequal to allow an
even spread of data. The weighting (w) was dependent on the total
number of rows and columns (g) in the contingency table.
Interreader and intrareader agreement for distention and fluid scores was assessed with the weighted kappa statistic. Intrareader analysis was performed by radiologist 3 on 10 patient CTC scans randomly selected from the 30 original cases. The reader used distention method 1 to rescore the 10 scans 1 month after the first scoring session and distention method 2 the following week.
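The grouping of scores and the weighted kappa calculation can be sketched in Python as follows. The category boundaries come from the text; everything else is an assumption made for illustration. In particular, the linear weight w = 1 - |i - j| / (g - 1) is one common choice and is not necessarily the weighting used in this study, and the function and constant names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence

# Upper bounds of all but the last category, taken from the text.
DISTENTION_EDGES = [60, 80, 90, 95]  # poor, fair, good, very good, excellent
FLUID_EDGES = [5, 10, 20, 39]        # low, medium to low, medium, medium to high, high


def to_category(score: float, upper_edges: Sequence[float]) -> int:
    """Map a 0-100 score to a category index (0 = first group)."""
    return bisect_left(upper_edges, score)


def weighted_kappa(cats_a: Sequence[int], cats_b: Sequence[int], g: int) -> float:
    """Weighted kappa for two raters over g ordered categories.

    Assumption: linear weights, w_ij = 1 - |i - j| / (g - 1).
    """
    if len(cats_a) != len(cats_b) or not cats_a:
        raise ValueError("ratings must be paired and non-empty")
    if g < 2:
        raise ValueError("need at least two categories")
    n = len(cats_a)
    # g x g contingency table of observed proportions.
    observed: List[List[float]] = [[0.0] * g for _ in range(g)]
    for i, j in zip(cats_a, cats_b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(g)]
    col = [sum(observed[i][j] for i in range(g)) for j in range(g)]
    p_obs = 0.0
    p_exp = 0.0
    for i in range(g):
        for j in range(g):
            w = 1.0 - abs(i - j) / (g - 1)
            p_obs += w * observed[i][j]
            p_exp += w * row[i] * col[j]
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical distention scores from a reader and from the software.
reader = [to_category(s, DISTENTION_EDGES) for s in [55, 85, 95, 100, 70]]
software = [to_category(s, DISTENTION_EDGES) for s in [62, 88, 97, 99, 75]]
print(weighted_kappa(reader, software, g=5))
```

Linear weights give partial credit to ratings that fall in adjacent categories, which suits ordered groupings like these better than unweighted kappa.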
We found narrower limits of agreement for fluid scoring than for distention scoring. There are a number of possible explanations for this finding. The tasks given to the three radiologists were difficult because they primarily involved visual integration of the amount of distention or fluid over a colonic segment. Of the three scoring tasks, distention scoring was the most difficult, particularly for distention method 1, which had a 2-cm colonic diameter threshold. For example, it is difficult to distinguish distentions of 1.9 and 2.0 cm, and the lack of differentiation can lead to large differences in the overall scores of a segment at this diameter threshold. Therefore, our expectation was that agreement of distention scores would be less than that of fluid scores, and our findings met the expectation. Nevertheless, both interobserver and intraobserver agreement among the radiologists was good.
The fluid score agreement between automated assessment and mean score of the radiologists was greater than that of the interobserver and intraobserver radiologist agreement. One possible explanation is that the true fluid score of a segment was between the scores assigned by all three radiologists and could be best characterized by the mean. In addition, these results suggest that automated QA is more reliable than manual assessment because it is less variable and likely more reproducible than a single radiologist's interpretation.
Because of the complexity of distention method 2, a previously described [9, 15] scoring method less amenable to automation, we developed distention method 1 and incorporated it into the QA software. Results of the weighted kappa and correlation analyses indicate that distention methods 1 and 2 give comparable results. Of note is a difference in scoring between the two methods in poorly distended segments. The trend line in Figure 9 has an offset because a score of 0 with distention method 2 indicates that a portion within the segment is collapsed, whereas a score of 0 with distention method 1 indicates that the entire colonic segment is collapsed. In other words, distention method 2 scores the worst part of the overall segment and distention method 1 scores the average of the overall segment. This nonlinear skewing indicates that distention method 2 exaggerates the influence of focal collapse. Conversely, with distention method 1, a colonic segment with a short area of total collapse can be assessed as having quality greater than that of a segment with a continuous lumen but poor distention. The correct quality measure ultimately is proportional to the ability to detect polyps per unit length of colon. By this definition, the method of averaging (distention method 1) may be more suitable, although this hypothesis remains to be proven.
Another important factor in QA of CTC is the amount of residual feces. The presence of residual feces is a known cause of lack of detection of polyps [16]. This factor is more difficult to assess with automated than with manual technique, particularly prospectively and if fecal tagging is not used, because feces can look like polyps and masses. QA software will have to be adapted for assessment of the amount of residual feces.
In clinical application, to warn the technologist of the need for additional scans, such as a decubitus view, or for IV contrast administration, the QA software could be run automatically at completion of CTC before the patient leaves the examination table [17, 18]. Alternatively or in addition, the software could be used as part of a quality control program to warn of a slow downward drift in examination quality over time or to indicate the need for additional training of a technologist or nurse. The quality data could also be presented as an informative display, such as a graph of quality along the length of the colon. A visual display of this type would eliminate the need for manual location of the colonic segments and for interpretation of numeric data. This software also could be used to compare data sets from different institutions and from different validation trials to ensure, for example, a level of uniformity or to point out deficiencies of a given program. The software used in this study was written with a standard computer language and can be ported to a scanner console.
This study had a few limitations. First, because of a limitation in the centerline algorithm, the QA software made measurements only at the proximal end of the rectum, potentially leading to underreporting of rectal fluid and distention. Second, cases that could not be processed by the centerline software were excluded. This limitation mainly affected poorly distended colons, but substitute cases with poor distention were found and used. Extremely poor distention is underrepresented in this data set, although presumably in clinical application such poor quality would be obvious. It is possible that automated insufflators may prevent extremely poor distention [6, 8]. Third, the radiologists scored quality on 2D images. The use of 3D images might have led to better results, particularly in distention scoring. Fourth, we used only one scan per patient (supine or prone). This choice was advantageous for the statistical analysis because it avoided intrasubject correlation. However, in clinical application, it may be advantageous to register the supine and prone scans to detect colonic segments with poor quality on both scans. Fifth, we did not evaluate the ability of radiologists to detect polyps in colonic segments of differing quality. Although it probably is the reference standard for the quality of CTC, ability to detect polyps includes many other components, such as reader expertise and polyp size distribution. Consequently, such a study would require many more cases and be difficult to perform. Sixth, rather than use a randomized or consecutive series, we specifically chose cases on the basis of degree of quality. The purpose of this study was not to provide an unbiased assessment of quality in a cohort of patients but to validate the software. Hence a diverse spectrum of quality was required. Seventh, our QA software requires the use of oral contrast material for fluid tagging because it is much more difficult to assess residual fluid without tagging. 
The best reported CTC results, to our knowledge, were obtained in a study in which fluid tagging was used [1].
In conclusion, the findings with the QA software agreed with radiologists' assessments of quality. The software may help improve the quality of CTC examinations and lead to better polyp detection.
Acknowledgments
We thank William O. Schindler for supplying CT colonography data.