1 Department of Radiology, Box 1667, University of California School of
Medicine, San Francisco, CA 94143-1667.
2 Present address: Department of Radiology, University of Wisconsin Medical
School, E3/311 Clinical Science Center, 600 Highland Ave., Madison, WI
53792-3252.
3 Department of Radiology, Box 357115, University of Washington Medical Center,
1959 N.E. Pacific St., Seattle, WA 98195.
Received August 17, 2001;
accepted after revision May 13, 2002.
Presented in part at the annual meeting of the American Roentgen Ray
Society, Seattle, April–May 2001.
Abstract
MATERIALS AND METHODS. We analyzed 48,281 consecutive mammography examinations (9825 diagnostic, 38,456 screening) for which previous mammography had been performed between 1997 and 2001, collecting data on demographics, whether comparison actually was made with previous examinations, abnormal findings (recall for screening mammography or biopsy recommendation for diagnostic mammography), biopsy yield of cancer, cancer detection rate, size of invasive cancers, axillary nodal status, and cancer stage.
RESULTS. Comparison with previous examinations in the incidence screening setting decreases the recall rate from 4.9% to 3.8% (p < 0.0001) but does not significantly affect the biopsy yield (40% without vs 44% with comparison, p = 0.56) or the cancer detection rate (5.5/1000 without vs 5.2/1000 with comparison, p = 0.87). In the diagnostic setting, comparison with previous examinations increases the biopsy-recommended rate from 4.3% to 9.4% (p < 0.0001), the biopsy yield from 38% to 51% (p = 0.12), and the overall cancer detection rate from 11/1000 to 39/1000 (p < 0.0001). Comparison with previous examinations is not associated with a significant difference in mean tumor size. However, it is associated with a significant decrease in the frequency of axillary node metastasis and the cancer stage for screening mammography, but not for diagnostic mammography.
CONCLUSION. For screening mammography, comparison with previous examinations significantly decreases false-positive but not true-positive findings and permits detection of cancers at an earlier stage. For diagnostic mammography, comparison with previous examinations increases true-positive findings.
Pooling of prevalence and incidence screening examinations confounds the demonstration of any effects of comparison with previous films for several reasons. First, comparison is pertinent only for incidence screening because, by definition, no previous films exist for prevalence (baseline) screening. Second, there is a higher rate of breast cancers in the prevalence screening round than in the incidence screening round. Third, separate from the effects of actual comparison with previous films, the simple existence of one or more previous examinations itself reduces the number of potentially abnormal mammographic findings (some will have been fully worked up at diagnostic imaging or percutaneous biopsy and found to be benign, and others will have been completely excised), thereby affecting clinical outcomes. In our study, we eliminate these confounding effects by limiting our analysis to incidence screening examinations, analyzing the differences in clinical outcomes for examinations interpreted with versus without comparison films.
It also has been shown that most clinical outcomes differ, sometimes significantly, for screening and diagnostic mammography [10, 11]. For this reason, we also designed this study to determine whether the value of previous comparison mammograms differs between screening and diagnostic examinations. Furthermore, we conducted a subset analysis for diagnostic examinations to explore the differential value of previous comparison mammograms as a function of indication for examination.
In our practice, screening examinations are designed to involve asymptomatic women, consist only of standard mediolateral oblique and craniocaudal mammograms of each breast, are interpreted in batches twice daily, and are compared with two previous examinations (when available). On the other hand, our diagnostic examinations are problem-solving procedures that use the full spectrum of mammographic projections, tailored by the radiologist during the examination for each specific patient to elucidate either unresolved imaging features identified at screening or unexplained clinical signs and symptoms.
During image interpretation, the radiologist recorded the indication for each examination on the basis of information provided by the patient or the referring clinician, according to the classification scheme in Appendix 1. We excluded from study those diagnostic mammography examinations performed as additional workup for screening examinations with abnormal findings, in order to avoid double counting of data already included in the screening cohort.
The radiologist also recorded whether previous mammograms were used for comparison, as well as standard Breast Imaging Reporting and Data System (BI-RADS) assessment categories separately for each breast [13].
We considered findings of a screening examination to be abnormal if either breast was assessed as BI-RADS category 0 (incomplete, need additional imaging), category 4 (suspicious), or category 5 (highly suggestive of malignancy). We considered findings of a diagnostic examination to be abnormal if either breast was assessed as BI-RADS category 4 or 5. For all screening and diagnostic findings interpreted as abnormal, we determined the outcome of biopsy (fine-needle aspiration or core or surgical biopsy) by searching the pathology database at our institution or by obtaining the information from the referring clinician [12]. For biopsies resulting in a diagnosis of malignancy (ductal carcinoma in situ or any invasive carcinoma), we also recorded the lesion size, axillary nodal status, and stage (based on the American Joint Committee on Cancer staging system [14]).
We used S-PLUS programming software for data calculations and statistical analysis (Insightful, Seattle, WA). Student's t test was used for comparison of data having a normal distribution. The chi-square test was used to compare proportional data. For contingency tables with cell counts fewer than five, we used Fisher's exact test, which gives a valid p value with these small sample sizes. We performed these exact computations using StatXact software (Cytel, Cambridge, MA). A p value of less than 0.05 was considered statistically significant.
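For readers who wish to see how the two tests for proportional data operate on a 2 x 2 table, the following is a minimal sketch in standard-library Python. It is not the S-PLUS or StatXact procedure used in the study. The function names are hypothetical, and the use of a continuity (Yates) correction in the chi-square test is an assumption, because the text does not say whether one was applied. The example counts are the biopsy results reported in the Results section.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p value). The continuity correction is an
    assumption of this sketch, not a documented detail of the study.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # In a 2x2 table, |ad - bc| / n equals |observed - expected| in every cell.
    diff = abs(a * d - b * c) / n
    if yates:
        diff = max(0.0, diff - 0.5)
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum(diff ** 2 / e for e in expected)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Rows: with vs without comparison films; columns: cancer vs benign biopsy.
_, p_screening = chi_square_2x2(166, 379 - 166, 37, 93 - 37)
_, p_diagnostic = chi_square_2x2(321, 631 - 321, 17, 45 - 17)
print(f"screening p = {p_screening:.2f}, diagnostic p = {p_diagnostic:.2f}")
```

Under the continuity-correction assumption, the two p values come out close to the reported 0.56 and 0.12. Fisher's exact test can be applied to the same tables but matters mainly when a cell count is small.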
Mammography Interpretation
Results of mammography interpretation are shown in Table 1. For screening examinations, comparison with previous films significantly reduced the recall rate (3.8% with comparison vs 4.9% with no comparison; p < 0.0001), whereas for diagnostic examinations, comparison with previous films significantly increased the rate of abnormal findings (9.4% with comparison vs 4.3% with no comparison; p < 0.0001). Subset analysis of the diagnostic mammography cases by indication for examination, shown in Table 2, reveals that comparison with previous examinations increases the rate of abnormal findings significantly in all subgroups.
Biopsy Results
For screening examinations interpreted with previous films for comparison, the biopsy yield of cancer was 44% (166 cancers from 379 biopsies), similar to that observed when previous films were unavailable for comparison (40%, 37 cancers from 93 biopsies; p = 0.56). In contrast, for diagnostic examinations interpreted with previous comparison films, the biopsy yield of cancer was 51% (321 cancers from 631 biopsies). This yield was higher than when previous films were unavailable for comparison (38%, 17 cancers from 45 biopsies), but this difference was not statistically significant (p = 0.12).
The same trend, also not statistically significant, is seen for subset analysis of diagnostic mammography patients by indication for examination (Table 3). Comparison with previous mammograms increases the biopsy yield in all three subsets for which comparison is possible: surveillance of cancer patients treated with breast preservation surgery (p = 0.66), patients with palpable masses (p = 0.29), and the miscellaneous category (p = 0.44).
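The overall biopsy yields quoted in this section follow directly from the reported counts. As a brief arithmetic check, the sketch below recomputes them. The function name and dictionary labels are illustrative only, and the counts are those given in the text.

```python
def biopsy_yield(cancers, biopsies):
    """Biopsy yield of cancer as a percentage of biopsies performed."""
    return 100.0 * cancers / biopsies


# (cancers, biopsies) as reported in the text for each group.
counts = {
    "screening, with comparison": (166, 379),
    "screening, without comparison": (37, 93),
    "diagnostic, with comparison": (321, 631),
    "diagnostic, without comparison": (17, 45),
}

for label, (cancers, biopsies) in counts.items():
    print(f"{label}: {biopsy_yield(cancers, biopsies):.0f}%")
```

Rounded to whole percentages, these reproduce the 44%, 40%, 51%, and 38% yields stated above.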
Cancer Detection Rate
For screening cases, the availability of previous mammograms did not have a significant effect on the cancer detection rate (5.2/1000 with comparison films vs 5.5/1000 without comparison films; p = 0.87). However, for diagnostic mammography, a significant difference was found. The cancer detection rate was more than three times as high when previous film comparison was made (39/1000 vs 11/1000; p < 0.0001). Furthermore, as shown in Table 4, significant differences were also observed for all three diagnostic mammography subsets for which comparison was possible.
Characteristics of Breast Cancers
Data for mean invasive cancer size, frequency of axillary nodal metastasis, and frequency of stages 0 and I cancer are shown in Table 5. For screening and diagnostic examinations, some statistically significant differences were found between cases interpreted with previous comparison films and those interpreted without previous films. When previous comparison films were available for incidence screening, axillary nodal metastasis was less frequent and early-stage cancer was more frequent. Similar trends were observed for diagnostic examinations, but the magnitudes of the observed differences were smaller and none of the differences was statistically significant, which may relate to the small number of cases in that group.
For diagnostic examinations, our study shows several effects of comparison with previous mammograms that are different from those found for screening mammography. For example, in our diagnostic patient population as a whole, comparison with previous examinations increases the rate of abnormal findings, the biopsy yield, and the cancer detection rate. The significant increase in abnormal findings indicates that more biopsies are recommended as a result of comparison with previous films, and the accompanying smaller magnitude increase in biopsy yield shows that most of these biopsies result in a diagnosis of cancer. This finding is reinforced by the significant increase in the cancer detection rate when previous comparison films were available. Thus, for diagnostic mammography the principal effect of comparison with previous films appears to be an increase in true-positive findings. Our study was not designed to provide data relating to specific mammographic findings, but the fact that our overall results are also observed in each of the subgroups for which comparison was possible (surveillance of cancer patients treated with breast preservation surgery, patients with palpable masses, and the miscellaneous category) suggests that a common mechanism accounts for our results.
We speculate that when cancer truly is present, comparison with previous films permits the visualization of interval progression of certain mammographic findings that otherwise might be judged to be benign or probably benign, leading to biopsy and prompt cancer detection. For example, in the lumpectomy group (patients receiving short-interval follow-up mammography after breast preservation surgery for cancer), it is easy to understand why the abnormal findings and cancer detection rates increase when comparison films are available: the mammographic finding of asymmetric density or architectural distortion will be interpreted as abnormal when it is seen to have increased in comparison with previous films, whereas the same finding likely would be ascribed to postsurgical scarring if previous films were unavailable. More detailed subset analyses, necessarily involving larger patient populations for sufficient statistical power, are needed to elucidate the reasons for this and other as-yet-unexplained observations.
Despite the apparent value of comparison with previous examinations for diagnostic mammography, we identified no significant difference in the prognostic indicators of tumor size, axillary nodal status, and tumor stage. We did observe a trend that early-stage tumors (stages 0 and I) and negative axillary nodes were somewhat more frequent when comparison films were available, although this trend was not statistically significant. Further study of larger patient populations would be useful for better understanding the association between comparison with previous examinations and the detection and diagnosis of less-advanced-stage cancer.
One limitation of our study is that we were unable to randomize the comparison with previous films during image interpretation. Thus, it is possible that unknown confounding factors may be affecting our results. It would be worthwhile to conduct either a prospective or a controlled retrospective randomized analysis despite the logistic difficulties involved in such an undertaking. Another limitation, which relates only to screening mammography, is our inability to exclude from analysis those cases for which comparison films were obtained many years before the current screening examination. Given the long time between such examinations, these cases more closely represent prevalence rather than incidence screening, with the potential for interaction of the confounding factors described earlier. However, we do not believe that this substantially affected our observed results because a retrospective review of the most recent 4000 screening examinations in our study (slightly > 10% of our screening population) shows that only 317 (8%) of these examinations were interpreted in comparison with films obtained more than 2 years previously.
In summary, the effects of comparison with previous examinations are complex. For screening mammography, comparison with previous films is associated with a decrease in the recall rate with no corresponding decrease in the cancer detection rate (fewer false-positive interpretations). However, the magnitude of this effect is more modest than reported in earlier published studies, probably because the earlier studies included substantial numbers of prevalence screening examinations. For diagnostic mammography, comparison with previous films is associated with increases in the rate of abnormal findings, the biopsy yield of cancer, and the cancer detection rate (more true-positive interpretations). We believe the fact that comparison with previous films is associated with the detection and diagnosis of less-advanced-stage cancer in the screening group, but not in the diagnostic group, deserves further study.
APPENDIX I. Indications for Mammography