Original Research
1 Department of Radiology, Centre Hospitalier Universitaire de Charleroi,
Charleroi, Belgium.
2 Department of Radiology, Hôpital Erasme, Université libre de
Bruxelles, Route de Lennik 808, B-1070 Brussels, Belgium.
3 Service of Biostatistics and Medical Informatics and Institut de Recherche
Interdisciplinaire en Biologie Humaine et Moléculaire,
Université libre de Bruxelles, Brussels, Belgium.
Received August 11, 2007;
accepted after revision February 5, 2008.
Address correspondence to P. A. Gevenois.
Abstract
SUBJECTS AND METHODS. One hundred two patients without a history of abdominal surgery underwent unenhanced and contrast-enhanced CT for the evaluation of cancer. Three radiologists with varying degrees of experience read scans twice in separate sessions. They were asked to identify the appendix, to score their confidence in identification, and to mark the appendix on the images. Intraabdominal fat volume was measured with a computer-assisted method. Independent experts compared the readers' markings and indicated whether the findings were reproducible.
RESULTS. Reproducibility differed significantly between reading
sessions (p < 0.001) and readers (p = 0.003). On the
images of 71% of the patients, there was perfect intrareader and interreader
agreement with statistically significant and positive influences of patient
body mass index (p = 0.005) and intraabdominal fat volume (p
= 0.001). Contrast enhancement influenced intrareader reproducibility only for
the reader who made less-reproducible interpretations (p = 0.033).
Intrareader and interreader agreement in categorizing confidence in
identification of the appendix ranged from fair to good (κ =
0.221–0.620). Confidence was not influenced by contrast enhancement
(p = 0.433–0.953), body mass index, or intraabdominal fat
volume (p = 0.058–0.798).
CONCLUSION. Reproducibility in identifying a normal appendix is reader dependent. Perfect intrareader and interreader agreement in marking the appendix occurs approximately 70% of the time and increases with patient body mass index and intraabdominal fat volume. Contrast enhancement does not influence the rate of identification of the appendix or reader confidence but may influence the reproducibility of findings.
Keywords: appendix, CT, gastrointestinal radiology
The purpose of our study was to prospectively investigate the reproducibility of readers' findings, the confidence of readers in identification of a normal appendix, and the influence of a number of factors on a reading. These factors included body mass index (BMI), intraabdominal fat volume, appendiceal diameter, appendiceal position, presence of gas in the lumen of the appendix, and use of IV contrast enhancement.
Between April and December 2005, 106 consecutively enrolled patients were included in this prospective study protocol on the basis of the following criteria: evaluation of a cancer diagnosis, no intraabdominal digestive tumor, no history of abdominal surgery (according to information on patient questionnaire, medical chart review, and inspection of the abdominal wall), and no symptoms or signs suggesting the presence of acute abdominal inflammatory disease. Patients unsure whether they had undergone abdominal surgery were excluded. Four patients in whom CT revealed ascites also were excluded. Among the other patients, none had evidence of peritoneal metastasis. The final study group consisted of 102 patients (27 women, 75 men; mean age, 63 years; age range, 36–94 years) with the following primary malignant diseases: pulmonary cancer (n = 57), breast cancer (n = 14), prostate carcinoma (n = 9), esophageal carcinoma (n = 7), extraabdominal lymphoma (n = 5), testicular carcinoma (n = 3), head and neck carcinoma (n = 3), melanoma (n = 2), renal carcinoma (n = 1), and penile carcinoma (n = 1). Follow-up was performed by review of the medical charts. We also confirmed that no patient had acute appendicitis within 1 month after inclusion in the study group. Patient body habitus was evaluated with BMI, calculated as weight in kilograms divided by the square of height in meters.
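As an illustration of the BMI definition given above, the calculation can be sketched as follows. The function names are hypothetical, and the category cutoffs are the standard WHO adult bands, which are assumed here because the article does not list the cutoffs it applied.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(value: float) -> str:
    """WHO adult BMI bands (assumed; the article does not state its cutoffs)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical patient: 80 kg, 1.75 m
example = bmi(80.0, 1.75)
print(round(example, 1), who_category(example))
```

Under these assumed bands, a BMI of 26.1 falls in the overweight category.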
Our standard protocol was slightly modified for this study, and scanning of the entire abdomen was performed twice, first without IV contrast material and then during the portal venous phase after IV administration of 120 mL of a contrast agent containing 350 mg I/mL (iobitridol, Xenetix 350, Guerbet) at 3 mL/s with a dual power injector (Empower, Bracco Diagnostics). No patient received oral or rectal contrast material. The absorbed radiation dose per acquisition, expressed as volume CT dose index, was 10.7 mGy. This dose did not exceed that recommended by the National Radiological Protection Board (14 mGy) and the Commission of the European Union (weighted CT dose index, 35 mGy) for abdominal CT examinations for this indication [13, 14].
Image Analysis
Reconstructed images were stored and read on a clinical workstation with
real-time multiplanar reformation (MPR) display (Wizard, Siemens Medical
Solutions). Patient demographic identifiers were removed for image analysis.
Unenhanced and contrast-enhanced scans were read independently by three
readers. Because imaging the appendix in community hospitals such as ours is
managed by both general radiologists and residents, two readers were
board-certified radiologists skilled in reading CT scans in routine clinical
workflow (together with other CT scans) and emergency settings (readers A and
B). One reader was a first-year radiology resident with 3 months of experience
in reading CT scans in a community hospital (reader C).
Unenhanced scans were read before contrast-enhanced scans in separate and independent reading sessions and with a 2-week minimum interval. One month after this reading, the process was repeated with another 2-week minimum interval between unenhanced and contrast-enhanced scans. To minimize reader-order bias, a time lag between reading sessions was introduced, and no feedback was available to the readers [15]. Readers were aware of the number of patients included in the study, but they were blinded to strategy and other information.
Readers read the scans in any plane they chose with the MPR display and were free to adapt the window settings at their convenience. At each reading, the readers were asked to simultaneously categorize whether the appendix was identifiable or not and their confidence in visualization (unidentifiable, visualized without certainty, identified with certainty) and to mark the appendix by electronically tracing its contours on all transverse images that contained the appendix. Although readers used the MPR display in any plane, the contours were traced only on transverse scans to make feasible the tasks of the experts who also read the scans. All marked images were photographed on hard copies (Figs. 1A, 1B, 1C, and 1D).
Independent Experts
To investigate the reproducibility of the readers' findings, we recruited
independent experts: two board-certified academic radiologists specialized in
abdominal CT, one with 6 years and one with 17 years of routine experience in
reading abdominal CT scans. The experts worked in consensus in all the
following steps. First, using hard copies of the images, they compared the
structure marked by the readers and coded yes or no whether the readers marked
the same structure at the first reading session (separately on unenhanced and
contrast-enhanced CT scans). Second, the same procedure was done between the
reading sessions of each reader (separately on unenhanced and
contrast-enhanced CT scans). Third, the same procedure was undertaken between
unenhanced and contrast-enhanced CT scans at the first reading sessions of
each reader. Fourth, on the workstation used by the readers, the experts coded
the visibility of the appendix as visible or not visible; if the appendix was
coded visible, position, diameter, and gas content were recorded. Position was
coded as anterior, lateral, medial, or posterior to the cecum; content was
coded as presence of gas or absence of gas. Finally, the experts coded as yes
or no concordance between the structure they considered the appendix and the
structure marked by the readers at each reading session. The experts were not
asked to record their confidence in identifying the appendix.
Intraabdominal Fat Measurement
The volume of intraabdominal fat was measured on unenhanced scans with an
interactive volume definition method available on the workstation. The expert
with 6 years of routine experience performed this task. With the PC mouse,
freehand regions of interest were traced within the muscle wall surrounding
the abdominal cavity as used in metabolic studies [16, 17]. This procedure was done
on lower-abdominal images on 3-mm-thick transverse sections spaced with 21-mm
intervals from the tip of the iliac crest to the superior aspect of the
symphysis pubis (determined on the coronal and sagittal reformations
automatically acquired with the program). Thereafter, the system automatically
interpolated regions of interest on images included in these 21-mm intervals
and calculated the volume occupied by voxels with attenuation values ranging
from –150 to –50 H
[18].
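The thresholding step described above can be sketched as follows. This is a minimal illustration, not the workstation's actual method: it assumes the attenuation values of the voxels inside the traced and interpolated regions of interest are already available as a flat list, and the function name and voxel volume parameter are hypothetical.

```python
def fat_volume_ml(voxels_h, voxel_volume_mm3, lower=-150, upper=-50):
    """Volume (mL) of voxels whose attenuation lies within [lower, upper] H.

    voxels_h: attenuation values (H) of the voxels inside the regions of
    interest (assumed to be extracted beforehand).
    voxel_volume_mm3: volume of a single voxel in cubic millimeters.
    """
    n_fat = sum(1 for h in voxels_h if lower <= h <= upper)
    return n_fat * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Hypothetical values: three of the six voxels fall in the fat range
sample = [-160, -150, -100, -50, -49, 30]
print(fat_volume_ml(sample, voxel_volume_mm3=500.0))
```

Whether the limits of the range are inclusive is not stated in the text; the sketch treats both bounds as inclusive.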
Statistical Analysis
Quantitative variables are expressed as mean ± standard error of the
mean. Reproducibility between readers and reading sessions in marking the same
structure was quantified by the proportion of concordance. Intrareader and
interreader agreement in categorizing confidence in detection of the appendix
was investigated by calculation of Cohen's kappa statistics with their
asymptotic standard error
[19]. The null hypothesis of
no agreement between observers was tested, and associated p values
were calculated [20]. All
kappa values were interpreted as proposed in the literature
[21]. A kappa value less than
0.20 indicated poor agreement; 0.21–0.40, fair agreement;
0.41–0.60, moderate agreement; 0.61–0.80, good agreement; and
0.81–1.00, excellent agreement. To be consistent with clinical practice
in which CT scans are read only once, only readings from the first session
were considered for assessment of interreader agreement.
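For readers unfamiliar with the statistic, Cohen's kappa and the interpretation scale quoted above can be sketched as follows. The analysis in this study was performed in SPSS; this plain-Python version is only an illustration, the function names are hypothetical, and assigning a value of exactly 0.20 to the "poor" band is an assumption because the quoted scale leaves that boundary open.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal frequencies
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands quoted in the text."""
    if kappa <= 0.20:  # assumption: 0.20 itself is counted as poor
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical confidence ratings from two readings of four scans
first = ["certain", "certain", "uncertain", "unidentifiable"]
second = ["certain", "uncertain", "uncertain", "unidentifiable"]
k = cohen_kappa(first, second)
print(round(k, 3), interpret_kappa(k))
```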
Student's t tests were used to compare means of quantitative continuous variables between two groups. Proportions were compared by use of Pearson's chi-square tests or Fisher's exact tests when appropriate. Relations between continuous variables were analyzed with Pearson's correlation coefficients. Logistic regressions (Enter procedure in SPSS for Microsoft Windows) [22] were used to evaluate the influence of different variables on the probability of a reader's confidence level, reproducibility of findings, and concordance with the expert findings. Statistical significance for all tests was set at p < 0.05. The statistical software used was SPSS for Windows (release 14.0, SPSS).
Visualization and Description of Appendix by Experts
The experts identified the appendix in 98 (96.1%) of 102 patients. The
diameter of the appendix averaged 5.4 ± 1.1 mm (range, 2.4–8.9
mm). The appendix was located against the anterior, lateral, medial, or
posterior aspect of the cecum in three (3.1%), 10 (10.2%), 53 (54.1%), and 32
(32.6%) of the 98 patients, respectively. The appendix contained gas in 62 (60.8%) of the
102 patients.
Identification of Appendix and Confidence in Interpretation by Readers
Frequency of identification on unenhanced and contrast-enhanced CT scans
according to confidence level of each reader during each reading session is
listed in Tables 1 and
2. Except for reader C at the
first reading (p = 0.016), there was no statistical difference in
level of confidence in identification between unenhanced and contrast-enhanced
CT regardless of reading session (p = 0.433–0.953). Intrareader
and interreader agreement in categorization of confidence on unenhanced and
contrast-enhanced CT scans is presented in
Figure 2.
Patient age, BMI, intraabdominal fat volume, appendiceal diameter, appendiceal position, and presence of gas in the appendiceal lumen were the six variables introduced in logistic regression to determine whether any of the variables were predictive of the level of confidence in identifying the appendix. Except for patient age for reader A on unenhanced CT scans (p = 0.018), none of the variables influenced reader confidence in interpreting unenhanced and contrast-enhanced CT scans (p = 0.058–0.996).
Reproducibility of Findings
Globally, reproducibility—expressed as the proportion of concordance
between readers or reading sessions in marking the same
structure—differed significantly between readers (p = 0.003),
between reading sessions of each reader for both unenhanced and
contrast-enhanced CT (p < 0.001), and between readings of
unenhanced and contrast-enhanced CT by each reader (p = 0.006). The
reproducibility of a given reader's interpretations between two reading
sessions was compared for unenhanced CT and contrast-enhanced CT conditions.
For reader A, this reproducibility was significantly lower for unenhanced CT
than for contrast-enhanced CT (80/102 vs 92/102, p = 0.033). This was
not the case for reader B (97/102 vs 99/102, p = 0.721) or reader C
(95/102 vs 94/102, p = 1.000), and there was no significant
difference between readers B and C (p = 0.487), between their
readings (p = 0.768), or between contrast-enhanced and unenhanced CT
scans (p = 0.373).
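A comparison of two proportions such as 80/102 versus 92/102 can be illustrated with a two-sided Fisher's exact test. The sketch below is a plain-Python illustration, not the SPSS procedure used in the study; the text does not state which test variant produced each reported p value, so the result is not expected to match the published figures exactly. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Reader A: reproducible findings in 80 of 102 unenhanced scans
# versus 92 of 102 contrast-enhanced scans (counts from the text)
print(fisher_exact_two_sided(80, 102 - 80, 92, 102 - 92))
```

This test treats the two sets of readings as independent samples, which is a simplification because both sets come from the same 102 patients.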
For 72 (70.6%) of the 102 patients (23 women, 49 men), there was perfect agreement among all readers at their two reading sessions in marking the same structure as the appendix. For 30 (29.4%) of the patients, there was disagreement between readers or readings. For seven patients, readers considered a different structure the appendix, and for 23 patients, one or two of the three readers did not visualize the appendix at all. In a comparison of the group of patients with perfect agreement between readers and the group with disagreement, BMI (p = 0.005), intraabdominal fat volume (p = 0.001), presence of gas in the appendiceal lumen (p = 0.003), and appendiceal position (p = 0.017) had statistically significant and positive effects on perfect agreement, with a position medial to the cecum yielding a higher rate of agreement than other positions (p = 0.046); age (p = 0.114) and appendiceal diameter (p = 0.505) did not. A case of intrareader disagreement in marking the appendix is shown in Figures 3A, 3B, 3C, 3D, 3E, 3F, 3G, and 3H.
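The notion of perfect agreement used here can be sketched as follows. In the study, the experts judged by consensus whether the marked structures were the same; the sketch replaces that judgment with hypothetical structure labels, one per reader and reading session, with None standing for an appendix that was not visualized. Counting a non-visualized appendix as disagreement is an assumption of this illustration.

```python
def perfect_agreement(marks):
    """True if every reading marked the same structure.

    marks: one label per reader and reading session for a single patient;
    None means the appendix was not visualized (treated as disagreement,
    which is an assumption of this sketch).
    """
    first = marks[0]
    return first is not None and all(m == first for m in marks)


def proportion_perfect(patients):
    """Proportion of patients with perfect agreement across all readings."""
    return sum(perfect_agreement(m) for m in patients) / len(patients)


# Synthetic data shaped like the reported counts: 72 patients with perfect
# agreement, 7 with a different structure marked, 23 with a missed appendix
patients = (
    [["s1"] * 6] * 72
    + [["s1", "s1", "s2", "s1", "s1", "s1"]] * 7
    + [["s1", "s1", None, "s1", "s1", "s1"]] * 23
)
print(round(100 * proportion_perfect(patients), 1))
```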
Concordance Between Readers and Experts
The frequencies of concordance between the experts and each reader at both
reading sessions are shown in Table
3. Globally, the proportions of concordance were significantly
different among readers on both unenhanced (p = 0.001) and
contrast-enhanced (p = 0.008) CT scans. Two by two, these proportions
were not significantly different between readers B and C on unenhanced and
contrast-enhanced CT scans (p = 1.000) but were different between
reader A and the two other readers on both unenhanced and contrast-enhanced CT
scans (p = 0.004–0.024).
Logistic regressions also were performed to look for possible effects of age, BMI, intraabdominal fat volume, appendiceal diameter, appendiceal position, and presence of gas in the appendiceal lumen on the probability of concordance between each reader and the experts. For reader A, a statistically significant effect was found between this concordance and patient age for contrast-enhanced CT at both reading sessions (p = 0.038, p = 0.027), for intraabdominal fat volume on unenhanced and contrast-enhanced CT scans at the first reading sessions (p = 0.021, p = 0.045) but not at the second reading sessions (p = 0.304, p = 0.059), and presence of gas in the appendiceal lumen on contrast-enhanced CT scans at both reading sessions (p = 0.017, p = 0.005). For the other readers, there was no significant effect of any of the six variables on the probability of concordance with the experts (p = 0.106–1.000).
Depending on the reader, the appendix was detected in 85–98% of patients, with certainty in 70–92% of the cases. With one exception, IV injection of iodinated contrast material did not influence the rate of identification of the appendix according to the level of confidence of our readers. The exception was the inexperienced reader at the first reading session and not thereafter, suggesting a likely learning effect. Our results are in accordance with those reported by Jacobs et al. [12] and confirm that IV contrast injection does not improve the rate of identification of a normal appendix. Our identification rates are very close to the 87% reported by Jan et al. [7], who used enteric contrast material in patients undergoing 16-MDCT, and are higher than the 76–81% reported by Benjaminov et al. [6] in a study performed with single-detector CT and MDCT scanners. Our higher rates can be explained at least in part by the use of MPR, which also was used by Jan et al. However, that there was disagreement between readings in approximately 30% of cases and that reproducibility depends on the reader should make us cautious in considering rates of visualization reported in studies based on identification of a normal appendix.
Even if coronal reformations improve a reader's confidence in identifying the appendix, either diseased or normal [11], our results show that confidence varies from reader to reader and from reading session to reading session, despite use of MPR. The results of our study show that according to the confidence of our readers, no patient characteristic influences rate of identification. This finding is in accordance with those of previous studies [7, 8] that showed absence of the effect of intraabdominal fat volume. This volume, however, influences the proportion of perfect intrareader and interreader agreement in marking the same organ and the concordance of one of our experienced readers with the experts. Such influence may explain why some investigators [6–8, 10] have concluded that intraabdominal fat volume influences the rate of identification of a normal appendix whereas others have not. In these studies in which an influence was found or not, amount of fat was subjectively ranked by the same readers as those who were asked to identify a normal appendix. To avoid bias, we used an objective computer-assisted method that is used in metabolic studies to measure volume of intraabdominal fat [16–18]. With this method, we observed significant and positive correlation between volume of intraabdominal fat and BMI similar to that reported by Kobayashi et al. [16]. As expected, BMI, appendiceal position, and gas content influence perfect agreement. The highest agreement is achieved on images of heavy patients with an appendix medial to the cecum and containing gas.
One of our readers (reader A) had a negative effect on the observed agreements. The interpretations were less reproducible than those of readers B and C. Reader A disagreed more frequently with each of the other two readers than they disagreed with each other. Reader A also disagreed more frequently with the experts than did either of the other readers. Nevertheless, no patient characteristics were consistently predictive of the agreements, and therefore disagreements, of this particular reader. We can speculate on a possible effect of reader expertise on reader performance, but our study was not designed to investigate such an effect. To investigate this effect, we would have to consider groups of readers with different levels of expertise instead of only three readers. In this study, one of our two experienced readers had less-reproducible findings than did the inexperienced reader. Similar results have been reported [24, 25] in studies including more- and less-experienced readers. We did not elicit a patient characteristic responsible for this poor reproducibility. There might have been other underlying reasons for this observation, but the aim of this study was not to examine the influence of other factors, such as environmental factors, on reader performance. Thus we cannot draw conclusions about the observed positive influence of the use of IV contrast material on the reproducibility of the findings of the reader with the less-reproducible interpretations.
Our study had several limitations. First, only five patients were in the underweight BMI category. Our study group, which consisted of rather heavy patients (average BMI, 26.1, corresponding to the overweight category), reflects a tendency in Western countries. Second, in accordance with our standard protocol, we did not administer an enteric contrast agent, which might have had a positive effect on the rate of identification of an appendix. However, results of previous studies [5, 6, 9, 26–28] with patients with and those without suspected acute appendicitis were based on CT acquisitions without enteric contrast medium and showed high diagnostic performance. Third, we did not compare the performance of general radiologists with that of abdominal imagers because imaging of the appendix is a process frequently managed by general radiologists and residents. It was thus important to not artificially elevate performance by including only abdominal imagers. Fourth, we considered only three readers with different levels of expertise and found unexpectedly lower reproducibility of performance by an experienced reader than by an inexperienced reader. Consequently, no definite conclusion can be drawn from these results; further investigations should be conducted with large groups of readers.
A fifth limitation was that we did not guarantee that our experts identified the appendix correctly. Because patients with a normal appendix generally do not undergo surgery, it was not possible to consider surgical findings as an independent method of reference to verify whether the structure detected with CT was truly the appendix. The experts' readings thus were not considered an absolute method of reference. Agreement between the readers and our experts was not expressed in terms of correctness and thus not used to quantify the readers' diagnostic performance. Finally, with a mean age approximating 63 years, the patients were older than those usually affected by acute appendicitis (mean age, 30 years) [29]. This limitation was inherent to this study, which was designed to investigate visualization of a normal appendix and not acute appendicitis. In addition, it would have been unethical to expose young healthy persons to unnecessary radiation.
The overall proportion of perfect agreement in marking the same organ as the appendix approximates 70% and increases with BMI, intraabdominal fat volume, presence of gas in the appendiceal lumen, and appendiceal position medial to the cecum. The results of this study suggest that IV contrast administration does not influence either the rate of visualization of a normal appendix or reader confidence but may influence the reproducibility of findings by one particular reader. Further studies involving large groups of readers with different levels of expertise are needed to confirm that reproducibility in visualization of a normal appendix depends on the reader and whether any specific patient characteristic determines variations in reproducibility of interpretations.