Original Research
1 Department of Radiology, Academic Medical Center, Meibergdreef 9, 1105 AZ
Amsterdam, Noord-Holland, The Netherlands.
2 Department of Radiology, Onze Lieve Vrouwe Gasthuis, Oosterpark 9, 1090 HM
Amsterdam, The Netherlands.
3 Department of Clinical Epidemiology and Biostatistics, Academic Medical
Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The
Netherlands.
Received March 31, 2006;
accepted after revision August 18, 2006.
Address correspondence to S. Jensch
(s.jensch{at}amc.uva.nl).
Abstract
MATERIALS AND METHODS. Four observers (a radiologist, a radiologist in training, and two radiographers) evaluated 145 data sets using a primary 3D approach. The radiographers were part of our CT colonography work group and underwent training that consisted of 20 cases. The reference standard was optical colonoscopy with second-look colonoscopy for discrepant lesions ≥ 10 mm in diameter. Mean sensitivities per patient and per polyp stratified for size (any size, ≥ 6 mm, and ≥ 10 mm) were determined for the radiologists and radiographers. Specificity was determined on a per-patient basis.
RESULTS. At colonoscopy in 86 of 145 patients, a total of 317 polyps were found (60 polyps ≥ 6 mm in 26 patients and 31 polyps ≥ 10 mm in 18 patients). No statistically significant differences were found in detection rates between radiologists and radiographers. Sensitivities for patients with a lesion of any size (66% for radiologists vs 65% for radiographers), ≥ 6 mm (81% vs 87%), and ≥ 10 mm (both 78%) were similar for all observers. On a per-polyp basis, detection rates were equivalent regardless of polyp size (47% vs 40%), for lesions ≥ 6 mm (71% vs 65%), and for lesions ≥ 10 mm (69% vs 66%). Mean specificities were similar among patients without lesions (31% vs 30%), patients without lesions ≥ 6 mm (71% vs 67%), and patients without lesions ≥ 10 mm (93% vs 93%).
CONCLUSION. Radiographers with training in CT colonographic evaluation achieved sensitivity and specificity in polyp detection comparable with that of radiologists. Radiographers can be considered reviewers in the evaluation of CT colonographic images.
Keywords: abdominal imaging cancer colon CT colonography double reading gastrointestinal radiology radiographer screening second reader
≥ 10 mm) polyps, the detection rates cannot be consistently reproduced [6-8].
A possible explanation is the wide range of reviewer performance. The origins
of interobserver variability are not completely understood, but high volumes
of data and low disease prevalence, which lead to reviewer fatigue, may play a
role. These problems are of particular concern in screening, in which large
numbers of patients without symptoms are examined. A double-interpretation strategy similar to that used for mammographic screening may be feasible for limiting wide interobserver variability. Johnson et al. [7] found a 19-29% increase in overall sensitivity of CTC when a second reviewer was used. Double interpretation is time-consuming, increases costs, and may therefore not be feasible in every radiology department. Computer-aided diagnosis is a promising tool in development that may be used as a possible second reviewer [9-11]. Another alternative may be to deploy trained paramedical personnel as second reviewers [12, 13]. The aim of the study was to investigate the reviewer performance of two trained radiographers in comparison with that of two radiologists in the evaluation of CTC examinations of 150 patients by comparing the sensitivity and specificity of CTC in polyp detection with those of the reference standard, optical colonoscopy.
The medical ethics committee of the Academic Medical Center approved the aforementioned accuracy study and indicated that no additional approval and no additional informed consent from patients were required for this study.
Diagnostic Procedures
Bowel preparation. Patients underwent preparation with a full
bowel cleansing consisting of 4-6 L of polyethylene glycol electrolyte
solution (Klean-Prep, Helsinn Birex Pharmaceuticals) on the day before and/or
the day of the examination.
CTC. CTC was performed 1 hour before colonoscopy. A bowel relaxant (20 mg of butylscopolamine bromide, Buscopan, Boehringer-Ingelheim, or 1 mg of glucagon hydrochloride, Glucagen, Novo Nordisk) was administered IV. Distention of the colon was achieved by manual insufflation of approximately 2 L of air containing 13.4% carbon dioxide. CT scans were obtained in the supine and prone positions. Each examination was performed with a 22-second breath-hold. The scans were obtained with a 4-MDCT scanner (Mx8000, Philips Medical Systems) with the following parameters: 120 kV; collimation, 4 x 2.5 mm; rotation time, 0.75 second; pitch, 1.25 (table feed per rotation / total collimation); slice thickness, 3.2 mm; reconstruction interval, 1.6 mm; standard medium-sharp reconstruction filter.
At the beginning of the study, scanning was performed with an effective
tube current of 100 mAs. In the course of the study, however, it became clear
that substantial radiation dose reduction did not impair sensitivity or
specificity [14, 15], and we reduced the effective tube current according to
the abdominal circumference of the patient. Slender (≤ 87.5 cm) patients
were scanned at 25 mAs, medium-sized
(87.5-102.5 cm) patients at 40 mAs, and larger (> 102.5 cm) patients at 70
mAs. The estimated effective dose for a complete examination (supine and
prone) was 5 mSv for an average-sized patient.
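The circumference-based dose protocol described above can be written as a small lookup. The following is only an illustrative sketch: the function name is hypothetical, and the handling of a circumference of exactly 87.5 cm (assigned here to the slender group) is an assumption, because the text does not state which group that boundary value belongs to.

```python
def effective_tube_current_mas(abdominal_circumference_cm: float) -> int:
    """Return the effective tube current (mAs) for an abdominal circumference.

    Thresholds follow the protocol in the text; treating exactly 87.5 cm as
    "slender" is an assumption made for this sketch.
    """
    if abdominal_circumference_cm <= 87.5:
        return 25  # slender patients
    if abdominal_circumference_cm <= 102.5:
        return 40  # medium-sized patients (87.5-102.5 cm)
    return 70      # larger patients (> 102.5 cm)
```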
Colonoscopy. Colonoscopy was performed by an experienced staff member (gastroenterologist or gastrointestinal surgeon) or by a gastroenterology fellow under direct supervision of the attending experienced staff member of the endoscopy departments of the Academic Medical Center and the Slotervaart Hospital. While performing colonoscopy, the endoscopist did not know the CTC findings. Patients received 2.5-7.5 mg of midazolam (Dormicum, Roche) and 0.05-0.1 mg of fentanyl (Fentanyl-Janssen, Janssen Pharmaceuticals) on request. The examination was attended by the research fellow and recorded on videotape. The size, morphologic features, and segmental location of polyps were documented on a case record form by the endoscopist who performed the examination and by the attending research fellow. Polyp size was estimated before removal on the basis of comparison with an open biopsy forceps of known size (8 mm). A polyp was considered flat if its height was less than one half the diameter of the lesion.
Determination of Lesion Status
Two research fellows not involved in interpreting the findings matched CT
colonographic and colonoscopic (reference standard) findings. Face-to-face
comparison was made of the CTC and colonoscopic images. A polyp detected on
CTC was labeled a true-positive finding on the basis of three criteria. First,
the segmental location of the CTC finding had to correspond with the segmental
location indicated on the case record form or with the adjacent segment.
Second, the polyp size estimated by the endoscopist had to correspond with the
CTC measurement. Third, the appearance of the lesions had to closely resemble
that of the corresponding polyp at the videotaped colonoscopic examination.
Because polyp size estimation at colonoscopy is prone to error, we accepted a
margin of error of 3 mm for polyps < 6 mm and of 5 mm for polyps ≥ 6 mm.
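The first two matching criteria (segment and size) lend themselves to a short sketch. It rests on several assumptions that are not in the text: the segment list and its order are hypothetical, the size margin is chosen from the colonoscopic size estimate, and all function names are invented for illustration. The third criterion, visual resemblance on the videotape, requires human judgment and is not modeled.

```python
# Hypothetical proximal-to-distal segment order; the text does not list
# the segments that were actually used.
SEGMENTS = ["cecum", "ascending", "transverse", "descending", "sigmoid", "rectum"]


def size_margin_mm(colonoscopy_size_mm: float) -> float:
    """Accepted size error: 3 mm for polyps < 6 mm, 5 mm for polyps >= 6 mm."""
    return 3.0 if colonoscopy_size_mm < 6 else 5.0


def segments_compatible(ctc_segment: str, colonoscopy_segment: str) -> bool:
    """True if the two findings lie in the same or an adjacent segment."""
    distance = SEGMENTS.index(ctc_segment) - SEGMENTS.index(colonoscopy_segment)
    return abs(distance) <= 1


def is_candidate_match(ctc_segment: str, ctc_size_mm: float,
                       colonoscopy_segment: str, colonoscopy_size_mm: float) -> bool:
    """Apply the location and size criteria only (assumption: margin is
    selected from the colonoscopic size, not the CTC size)."""
    size_ok = abs(ctc_size_mm - colonoscopy_size_mm) <= size_margin_mm(colonoscopy_size_mm)
    return segments_compatible(ctc_segment, colonoscopy_segment) and size_ok
```

A finding that passes this check would still need confirmation against the videotaped colonoscopy before it is labeled true-positive.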
If unexplained false-positive findings ≥ 10 mm were found, second-look
colonoscopy was performed to verify whether these lesions were indeed
false-positive findings. In the case of second-look colonoscopy, the
endoscopist was informed of the location, morphologic features, and measured
size of the lesion on CTC. This step was taken only in the initial study for
lesions detected by the radiologists.
CTC Data Evaluation
Reviewers. Four observers with different levels of experience
reviewed all data. Reviewer 1 was a radiologist with 9 years of experience in
abdominal radiology. This abdominal radiologist had interpreted more than
9,000 abdominal CT examinations. Reviewer 2 was a radiologist in training who
had been involved in research on CT and MR colonography and had attended
approximately 50 colonoscopic examinations. As part of a research project,
reviewer 2 had compared 50 CTC cases with videotaped colonoscopic examinations
in a face-to-face manner. Reviewers 1 and 2 had evaluated the 150 data sets
presented as part of a larger accuracy study
[5] on CTC and had both
evaluated more than 50 CTC cases before this study.
Reviewers 3 and 4 were radiographers with more than 5 years of experience in CT examinations. They were part of the CTC work group at our institution, and each had performed approximately 50 CTC examinations. Reviewers 3 and 4 had no experience in the evaluation of CT or CTC examinations.
Training in review of CTC images. The radiographers trained by evaluating 20 complete (supine and prone) CTC data sets. The results of the evaluations were checked, and feedback was provided by the research fellow with use of the videotaped colonoscopic examinations. All reviewers received the same instructions for data review.
Review method. The observers were blinded to all clinical findings and the colonoscopic results. The examinations were evaluated on a workstation with a primary 3D unfolded cube review technique with axial 2D and multiplanar reconstruction images for problem solving (EasyVision, Philips Medical Systems) [16]. All reviewers were scheduled free from clinical work to interpret the examination findings. No more than 10 cases were interpreted per session. The reviewers scored the presence, morphologic features, size, and location of polyps or colorectal cancer. Observers were asked to provide a degree of confidence regarding polyp presence (0%, no polyp; 25%, probably no polyp; 50%, possibly a polyp; 75%, probably a polyp; 100%, certainly a polyp). Only lesions on CTC scans that were scored with a certainty of 50% or more were considered for analysis. Review time was recorded with a stopwatch.
Outcome Parameters
Because of the potential future role of CTC in screening for colorectal
adenoma and carcinoma, we used per-patient sensitivity and specificity as the
main outcome parameters. We also calculated per-polyp sensitivity and the
false-positive rate. All results were stratified according to cutoff values of
6, 7, 8, 9, and 10 mm.
Per patient. Per-patient sensitivity was defined as the number of patients with at least one lesion detected with CTC relative to the number of patients with polyps identified during colonoscopy. In the size-stratified analysis, a patient in whom polyps were detected with CTC was considered to be a true-positive patient if at least one polyp in the respective size range was seen with colonoscopy. A patient was considered a false-positive patient when no polyps or only those in a smaller size category were detected during colonoscopy.
Per-patient specificity was defined as the number of patients with no polyps detected with CTC and none at colonoscopy relative to the number of patients without polyps at colonoscopy. In the size-stratified analysis, a patient in whom no polyps had been detected with CTC was considered a true-negative patient if no polyps in that respective size range or larger were seen with colonoscopy. A patient was considered a false-negative patient if polyps of that size or larger were detected during colonoscopy.
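The per-patient definitions can be illustrated with a short sketch. It simplifies the study's procedure: a patient counts as CTC-positive at a cutoff if any reported CTC lesion is at least that size, and as reference-positive if any colonoscopic polyp is at least that size. The lesion-by-lesion matching step is not modeled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Patient:
    colonoscopy_sizes_mm: List[float] = field(default_factory=list)  # reference standard
    ctc_sizes_mm: List[float] = field(default_factory=list)          # lesions reported on CTC


def per_patient_accuracy(patients: List[Patient],
                         cutoff_mm: float = 0.0) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity, specificity) at a size cutoff; None if undefined."""
    tp = fp = tn = fn = 0
    for p in patients:
        reference_positive = any(s >= cutoff_mm for s in p.colonoscopy_sizes_mm)
        ctc_positive = any(s >= cutoff_mm for s in p.ctc_sizes_mm)
        if reference_positive and ctc_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif ctc_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Calling the function once per cutoff (0, 6, 7, 8, 9, and 10 mm) would give the size-stratified values.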
Per polyp. Per-polyp sensitivity was defined as the number of polyps detected with CTC relative to the number of polyps identified during colonoscopy. False-positive findings were CTC findings that did not match endoscopic findings as documented on the case record form and the colonoscopic videotape.
Interobserver agreement. Interobserver agreement was determined by calculating the agreement in percentages on a per-polyp basis for colonoscopically confirmed lesions. Agreements were calculated for all polyps and according to cutoff values of 6, 7, 8, 9, and 10 mm. Reviewers were considered in agreement if both recorded the same lesion or if both recorded no findings.
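As a minimal sketch of this agreement measure, the function below counts a confirmed polyp as an agreement when both reviewers detected it or both missed it. The function name and the use of polyp identifiers are assumptions made for illustration.

```python
from typing import Iterable, Set


def percent_agreement(confirmed_polyp_ids: Iterable[int],
                      detected_by_a: Set[int],
                      detected_by_b: Set[int]) -> float:
    """Percentage of colonoscopically confirmed polyps on which two reviewers agree."""
    confirmed = list(confirmed_polyp_ids)
    if not confirmed:
        raise ValueError("at least one confirmed polyp is required")
    agreements = sum(
        (polyp_id in detected_by_a) == (polyp_id in detected_by_b)
        for polyp_id in confirmed
    )
    return 100.0 * agreements / len(confirmed)
```

For a size-stratified value, the list of confirmed polyps would first be restricted to those at or above the cutoff.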
Predictive values. Predictive values were determined for the aforementioned size categories. Positive predictive value was defined as the proportion of patients with a true-positive finding among all patients with positive findings on CTC. Negative predictive value was defined as the proportion of patients with a true-negative finding among all patients without findings on CTC.
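These two definitions reduce to simple ratios of per-patient counts. The sketch below assumes the counts have already been tallied for one reviewer and one size category; the function name is hypothetical.

```python
from typing import Optional, Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV) from per-patient counts; None when a denominator is zero."""
    ppv = tp / (tp + fp) if (tp + fp) else None  # true positives among CTC-positive patients
    npv = tn / (tn + fn) if (tn + fn) else None  # true negatives among CTC-negative patients
    return ppv, npv
```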
Combined sensitivity per patient. For screening purposes, a double-interpretation strategy can be applied. We calculated sensitivity after combining the results of different reviewers. For this purpose, true-positive lesions identified at CTC by two observers were pooled for calculation of combined sensitivity. This calculation was performed for each set of two observers: observer 1 plus observer 2, observer 1 plus observer 3, and so on. Subtracting the sensitivity of the observer with the highest value from the combined sensitivity made it possible to determine an increase in sensitivity.
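The pairwise calculation can be sketched as a set union. In this illustration, which uses hypothetical names, a patient counts as detected by the pair if either observer detected that patient, and the gain is the combined sensitivity minus the better of the two individual sensitivities.

```python
from typing import Iterable, Set, Tuple


def combined_sensitivity(positive_patients: Iterable[int],
                         found_by_a: Set[int],
                         found_by_b: Set[int]) -> Tuple[float, float]:
    """Return (combined sensitivity, gain over the better single observer)."""
    positives = set(positive_patients)
    if not positives:
        raise ValueError("at least one reference-positive patient is required")
    sens_a = len(positives & found_by_a) / len(positives)
    sens_b = len(positives & found_by_b) / len(positives)
    combined = len(positives & (found_by_a | found_by_b)) / len(positives)
    return combined, combined - max(sens_a, sens_b)
```

When both observers detect and miss the same patients, the union adds nothing and the gain is zero.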
Statistical Analysis
Differences in sensitivity and specificity between observers were tested
for significance using the McNemar statistic. In addition, sensitivity and
specificity for the radiologists (reviewers 1 and 2) and radiographers
(reviewers 3 and 4) as groups were compared by use of the McNemar test.
Statistical significance was considered p < 0.05.
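For readers unfamiliar with the McNemar test, the sketch below shows one common form, the exact (binomial) version, which uses only the discordant pairs: cases one reviewer classified correctly and the other did not. The text does not say whether an exact or a chi-square version was used, so the choice here is an assumption, as is the function name.

```python
from math import comb


def mcnemar_exact_p(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a_correct, only_b_correct)
    # Probability of k or fewer discordances in one direction under H0 (p = 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A result below 0.05 would be read as a statistically significant difference under the threshold stated above.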
≥ 10 mm), 29 medium (6-9 mm), and 257 small (< 6 mm). These polyps were found in 86 (59%) of 145 patients. In 18 (12%) of the patients, the largest polyp was ≥ 10 mm; in 26 (18%) of the patients, a polyp measuring at least 6 mm was found. No histologic findings were retrieved on 87 polyps. Eighty-eight (28%) of the 317 polyps were adenomas. Two colorectal carcinomas were found and were categorized among lesions ≥ 10 mm. Among the 317 polyps, 232 had a sessile morphology, 25 pedunculated, and 60 flat.
Colonoscopy initially revealed that 16 (89%) of 18 patients had large
polyps. Two second-look endoscopic examinations were performed because two
lesions ≥ 10 mm detected with CTC but not with colonoscopy could not be
explained on the colonoscopic videotape. These lesions proved to be polyps
initially missed at conventional colonoscopy.
CTC
The median review time for a complete supine and prone CTC examination was
16 minutes (range, 7-37 minutes) for reviewer 1, 13 minutes (range, 7-48
minutes) for reviewer 2, 16 minutes (range, 8-80 minutes) for reviewer 3, and
20 minutes (range, 8-74 minutes) for reviewer 4.
Per Patient
Sensitivity and specificity. Table 2 shows the performance characteristics of all observers for CTC according to polyp size per patient and per polyp. All reviewers correctly identified 14 (78%) of the 18 patients with at least one large polyp (≥ 10 mm). The same four patients with large polyps were missed by all observers. Three of the four missed cases were flat adenomas. One patient had a pedunculated adenoma. After unblinding of the colonoscopic results, we could only positively identify one patient in retrospect on the CTC examination (Fig. 1A, 1B).
Per patient, no significant differences in sensitivity stratified for polyp
size were observed between reviewers or between groups (radiologists vs
radiographers). Specificity among patients without large polyps ranged between
91% and 94% for the observers. Specificity values were comparable except for
lesions ≥ 6 mm; reviewers 2 and 4 had significantly more false-positive
findings (p < 0.05). For all other thresholds, no statistically
significant differences were found between observers or between groups.
Predictive values. Table 3 shows the predictive values for the reviewers. Because the
number of false-negative findings among patients with large lesions was
relatively small, mean negative predictive values for patients without polyps
≥ 10 mm were high (97%). The positive predictive values for identification
of patients with large polyps were 64% for reviewer 1, 56% for reviewer 2, 61%
for reviewer 3, and 58% for reviewer 4.
Per Polyp
Sensitivity. Table 2 displays the performance characteristics on a per-polyp
basis stratified for size. Reviewers 2 and 3 correctly identified 22 (71%) of
31 large polyps; reviewers 1 and 4 detected 21 (68%) and 19 (61%) large
polyps, respectively. Reviewer 4 had higher false-negative rates for all categories;
however, only the false-negative rate for polyps of all sizes was
significantly higher than that of the other reviewers (p < 0.05).
Consequently, a significant difference in sensitivity between radiologists and
radiographers as groups was also found for polyps of all sizes but not in the
other categories.
False-positive findings. Table 4 shows the false-positive findings stratified according to lesion size. For every size category, the radiographers as a group had more false-positive findings than did the radiologists. This difference was not statistically significant. False-positive lesions ≥ 10 mm found by the radiographers were checked by the research fellow. Follow-up colonoscopic examinations performed as part of the patient surveillance program also were evaluated for new lesions. None of the large false-positive lesions had to be reassigned true-positive status. The false-positive lesions ≥ 10 mm found by the radiologists are described earlier (Determination of Lesion Status).
Interobserver Analysis
Most reviewers detected and missed the same lesions. This phenomenon was
most apparent for reviewers 1 and 2, with interobserver agreement ranging from
good to excellent (81% for all lesions, 88% for lesions ≥ 6 mm, and 97% for
lesions ≥ 10 mm). The other reviewers had good interobserver agreement,
which ranged from 71% to 91% (Table 5).
Combined Sensitivity per Patient
Because every observer detected and missed the presence of lesions ≥ 9 mm and ≥ 10 mm in the same patients, no increase in sensitivity (79% and 78%, respectively) was found when results were combined. The combined sensitivity for lesions ≥ 8 mm for any set of two observers was 82% (18/22); the sensitivity increased 5% (one patient), from 77% to 82%, when the results for reviewers 1 and 3 were combined. For all other combinations, no increase was observed. For lesions ≥ 7 mm, the combined sensitivity was 83% (20/24), an increase of 5% (one patient) if reviewer 1 was combined with reviewer 3. For all other combinations, no increase was observed. For lesions ≥ 6 mm, no increase in sensitivity was found when the results of any of the observers were combined.
In the literature, similar rates of polyp detection on CTC have been reported for radiologists and radiographers. Bodily et al. [12] found that in a selected data set of 50 cases, nonradiologists correctly identified polyps in 78% of patients with large lesions; the rate was 81% for radiologists. In 2006, in a European multicenter study [17] in which the performance of CTC-experienced radiologists was compared with that of recently trained radiologists and radiographers, the results were similar for the two groups. Although experienced reviewers found more lesions, detection rates were comparable among the newly trained radiologists and radiographers. Newly trained radiologists detected 71% of all cancers and 46% of large polyps versus 73% and 39% for the radiographers.
Our findings differ from those in the earlier studies because a different review method was used to evaluate the CTC examinations. We used a primary 3D evaluation approach [3, 5]. This more intuitive evaluation technique offers greater polyp conspicuity and longer exposure of polyps to the reviewer than a primary 2D method. A primary 3D method therefore may be preferable for radiographers who have no experience in evaluating abdominal CT scans. Although our findings cannot show whether radiographers would perform in a similar manner with a primary 2D approach, findings in the earlier studies [12, 17] suggest as much.
Median review times in this study were higher for radiographers than for radiologists as a group (18 vs 14 minutes). This finding is in line with those in a study by Burling et al. [18] in which (experienced) radiologists using 2D technique interpreted images faster than did radiographic technicians, especially in cases in which there were pathologic findings. In that study, radiographers performed significantly better with longer review times, although large variation existed among newly trained reviewers. In our study, more time was needed for radiographers as a group to perform similarly, but individual review times differed considerably between the two radiographers. Reviewer 3 (a radiographer) needed the same median time to review an examination as the experienced abdominal radiologist. The enhanced 3D viewing method used in this study allowed single (one-way) navigation through the colon, considerably reducing viewing time [16]. The review times in this study were therefore comparable with those reported in studies with a primary 2D approach [7, 8, 19]. It is important to understand, however, that all reviewers were scheduled free from clinical work, and no time limit was imposed. These data therefore cannot be extrapolated to a production environment, and we do not know whether radiographers can perform as well in daily routine.
The radiographers who participated in this study were highly motivated and had great interest in CTC. They not only had performed many CTC examinations themselves but also had taken part in postprocessing of CTC images, such as segmentation and creating a centerline. The dedication of the radiographers was probably an essential element in their good performance in this study. We believe that because the radiographers were familiar with the software of the unfolded cube technique, an approach not widely used, it was also easier for them to interpret the images than if they had been unfamiliar with this technique.
Our study has several limitations. The radiographers were trained with only 20 cases, which is a short learning curve. Burling et al. (Burling D et al., presented at the 2005 annual meeting of the European Society of Gastrointestinal and Abdominal Radiology) reported that training of reviewers with 50 cases probably is not enough to achieve competence and that a learning curve was observed even after 350 cases. In addition, the reviewer experience was not the same for every observer because the radiologists had evaluated the patients as part of a larger accuracy study [5]. Review experience was especially distorted at the end of the study, the radiologists having almost twice as much reviewer experience as the radiographers. Although no feedback was provided during the course of the study, this disadvantage might have been present among the radiographers. Detection rates, however, were comparable, and this finding affirms the capabilities of radiographers in the evaluation of CTC images.
For calculation of interobserver variability, kappa value is the accepted statistic. Because this measure strongly depends on disease prevalence, we did not calculate kappa values on a per-patient basis but calculated indexes of positive and negative categories for colonoscopically proven lesions. With this method, false-positive findings are ignored in the calculation of agreement measures.
Some of the CT examinations evaluated by the radiographers were also performed by those radiographers. The radiographers, however, had no knowledge of previous findings, and because there was considerable time between the examination and the evaluation, we do not believe this factor influenced their results.
In this study, a balloon-tipped tube was inserted to insufflate the colon. The balloon was inflated with water and obscured the distal rectum. As a consequence, one large polyp in the rectum was missed, and most likely the use of air for balloon insufflation would have prevented this problem. Because adequate distention has been reported with only a thin rectal tube without a balloon [20], we no longer use balloon-tipped catheters. We use only a thin rectal tube with a small balloon that is deflated with the patient in the prone position.
Our patients received full bowel preparation, but such preparation may not be desirable for screening [2, 21-24]. Bowel preparation did not include tagging of the residual fluid with a contrast agent, which is currently considered standard practice. The lack of an oral contrast agent might have influenced our results negatively, because differentiation of polyps from feces is more difficult without contrast material, and polyps can be obscured in residual fluid. The challenge of discriminating tagged feces and polyps in a (reduced) bowel preparation has not been put to the test in this study.
Although this study showed no or only a slight increase in sensitivity for combined interpretation, a double-interpretation strategy, as in mammographic screening, may be a good review method for optimizing accuracy [7, 12]. In screening, however, the sheer number of patients markedly increases the workload of radiologists, and double interpretation by radiologists is probably not cost-effective in that situation. This problem may hamper implementation of CTC as a screening technique for colorectal adenoma and carcinoma. Computer-aided detection is an instrument under development that has had good initial results [9-11] and has recently gained regulatory approval. This promising tool may have an important role as a second review in CTC. Another good alternative may be to deploy a radiographer instead of a radiologist as a second reviewer. In that way the radiographer would alleviate some of the workload for radiologists, and costs would not be as high as with employment of two radiologists.
In conclusion, the results of this study suggest that dedicated radiographers trained in interpretation of CTC examinations can achieve accuracy comparable with that of radiologists in the evaluation of CTC. The results imply that deployment of a radiographer as a reviewer in CTC is acceptable. This finding is of particular interest in double-interpretation screening.
Acknowledgments
We thank Henk W. Venema for critical review of the manuscript.